No convergence during training using an TD3 RL agent

Question

Gaurav 2024 年 4 月 20 日

0
リンク

この質問への直接リンク

https://jp.mathworks.com/matlabcentral/answers/2109391-no-convergence-during-training-using-an-td3-rl-agent

回答済み: Ayush Anand 2024 年 5 月 22 日

I am trying to train an agent to navigate a multirotor to a particular 3d coordinate. I am using an TD3 agent with the configuration same as the Train Biped Robot to Walk using Reinforcement Leaning agent ( Link: Train Biped Robot to Walk Using Reinforcement Learning Agents - MATLAB & Simulink (mathworks.com) ). In my case i have 16 observation space and 4 action space. I have normalized both my observation space before passing it to my agent and the action output by the agent is also normalized between -1 to 1 which i later scale it up while passing it to the multirotor environment.

While training the agent rewards drop to zero all the time. I have attached the training results as an image. In the image you can see that the rewards are between 1000 and zero and the rewards keep droping to zero and the agent can't maintain a constant high reward.

Also the agent is trained using parallel computing.

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

サインインしてコメントする。

サインインしてこの質問に回答する。

Answer 1

Ayush Anand 2024 年 5 月 22 日

0
リンク

この回答への直接リンク

https://jp.mathworks.com/matlabcentral/answers/2109391-no-convergence-during-training-using-an-td3-rl-agent#answer_1461461

The reward continuously droppng to zero suggests that the agent might be struggling with either the complexity of the task, the design of the reward function, or issues related to the training setup. Here are a few potential reasons behind the same:

Reward Function being inadequate: If the reward is sparse ,i.e, infrequent feedback to the agent or the reward scale is inappropriate, the agent will fail to learn properly. Ensure that the reward is shaped properly.
Exploration Strategy : As TD3 benefits from a noise-based exploration strategy make sure that the exploration noise is appropriately scaled so that exploration is smooth without causing erratic behavior.
Learning Parameters: You could try experimenting with learning rates for the actor and critic networks, as well as with different batch sizes and replay buffer capacities. You could also try adjusting the discount factor (gamma) and target update frequency.

You can refer to the following links to explore different options with the "rlTD3" agent in MATLAB:

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

サインインしてコメントする。

No convergence during training using an TD3 RL agent

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

回答 (1 件)

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

参考

カテゴリ

タグ

製品

リリース

Community Treasure Hunt

No convergence during training using an TD3 RL agent

0 件のコメント -2 件の古いコメントを表示-2 件の古いコメントを非表示

回答 (1 件)

0 件のコメント -2 件の古いコメントを表示-2 件の古いコメントを非表示

参考

カテゴリ

タグ

製品

リリース

Community Treasure Hunt

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示

0 件のコメント
-2 件の古いコメントを表示-2 件の古いコメントを非表示