Oscillating reward in DDPG using MATLAB Reinforcement Learning Toolbox with a Simulink environment

Question

Arman Ali, 1 September 2022

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/1792365-oscillating-reward-in-ddpg-using-matlab-reinforcement-learning-toolbox-with-simulink-environment

Answered: Emmanouil Tzorakoleftherakis, 25 January 2023

I have 3 observations for each agent in a multi-agent environment.

Similarly, each agent has one action, bounded to [-1, 1].

I am training my model with DDPG agents. However, the episode reward starts oscillating after some episodes. At first it looks like the reward is converging to a value slightly below the optimum (an episode reward of approximately 0), but then it oscillates between high and low values. The noise standard deviation is 0.1 and the decay rate is 0.0001. What are the possible causes, and how can I improve the reward and avoid the oscillation? Thank you.
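To make the noise settings above concrete, here is a minimal Python sketch (not MATLAB, and not the toolbox's implementation) of how a per-step decay rate shrinks the exploration noise. It assumes the standard deviation is multiplied by (1 - decay rate) once per agent step and floored at a minimum value, which is how I understand the toolbox's StandardDeviationDecayRate and StandardDeviationMin options to behave; please check the documentation for your release. For simplicity it uses plain Gaussian noise instead of the Ornstein-Uhlenbeck noise that the toolbox's DDPG agent uses by default, and the function names `decayed_std` and `noisy_action` are hypothetical.

```python
import random


def decayed_std(initial_std: float, decay_rate: float, steps: int, min_std: float = 0.0) -> float:
    """Noise std after `steps` agent steps.

    Assumed update rule (hypothetical, mirrors my reading of the toolbox docs):
    std <- max(std * (1 - decay_rate), min_std), applied once per step.
    """
    return max(initial_std * (1.0 - decay_rate) ** steps, min_std)


def noisy_action(actor_output: float, std: float, rng: random.Random) -> float:
    """Add zero-mean Gaussian exploration noise, then clip to the action bounds [-1, 1]."""
    return min(1.0, max(-1.0, actor_output + rng.gauss(0.0, std)))


if __name__ == "__main__":
    rng = random.Random(0)
    # Settings from the question: initial std 0.1, decay rate 0.0001, no minimum.
    for steps in (0, 1_000, 10_000, 50_000):
        std = decayed_std(0.1, 1e-4, steps)
        action = noisy_action(0.9, std, rng)
        print(f"after {steps:>6} steps: std = {std:.4f}, sample action = {action:+.3f}")
```

Under this assumed rule, a standard deviation of 0.1 with a decay rate of 0.0001 falls to roughly 0.037 after 10,000 steps and below 0.001 after 50,000 steps (unless a minimum is set), so late in a long run the agent explores very little.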

Answer 1

Emmanouil Tzorakoleftherakis, 25 January 2023

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/1792365-oscillating-reward-in-ddpg-using-matlab-reinforcement-learning-toolbox-with-simulink-environment#answer_1156405

I think I have mentioned this in another post as well, but you should not expect the episode reward to increase monotonically. Once it "converges" around some value for a few episodes, that is a reasonable time to stop training and check whether you are happy with the result. If you keep training, the optimization may move in a different direction that leads to worse behavior, similar to what you are seeing here.
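The "stop once it converges" advice can be automated. The following minimal Python sketch (a hypothetical helper, not toolbox code) returns the first episode at which the average reward over a sliding window reaches a threshold. In the toolbox, the rlTrainingOptions properties ScoreAveragingWindowLength, StopTrainingCriteria (for example "AverageReward") and StopTrainingValue serve a similar purpose; this sketch only checks once a full window of episodes is available, which may differ from the toolbox's exact behavior. The reward values in the usage example are made up for illustration.

```python
from collections import deque
from typing import Iterable, Optional


def first_stop_episode(episode_rewards: Iterable[float], window: int, threshold: float) -> Optional[int]:
    """Return the 1-based episode at which the mean reward over the last `window`
    episodes first reaches `threshold`, or None if it never does."""
    recent = deque(maxlen=window)  # keeps only the most recent `window` rewards
    for episode, reward in enumerate(episode_rewards, start=1):
        recent.append(reward)
        # Only evaluate the criterion once a full window is available.
        if len(recent) == window and sum(recent) / window >= threshold:
            return episode
    return None


if __name__ == "__main__":
    # Made-up episode rewards: improve, plateau near 0, then drop again.
    rewards = [-50, -30, -12, -5, -2, -1, -1, -3, -20, -4]
    print(first_stop_episode(rewards, window=3, threshold=-3.0))  # 6
```

Pairing a criterion like this with saving intermediate agents (the SaveAgentCriteria and SaveAgentValue options in rlTrainingOptions) lets you return to a well-performing agent if later episodes get worse.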
