James Sorokhaibam

Last seen: over 1 year ago | Active since 2024

Followers: 0 Following: 0

Statistics

Feeds

Question

High fluctuation in Q0 value for TD3 agent while training.
I am training a TD3 RL agent for a pick-and-place robot. The reward function is reward = exp(-E/d), where E is the total energy co...

almost 2 years ago | 1 answer | 0

1

answer