Why do the DDPG episode rewards never change during the whole training process?

Question

Guoge Tan, May 25, 2020

https://jp.mathworks.com/matlabcentral/answers/532933-why-is-the-ddpg-episode-rewards-never-change-during-the-whole-training-process

Commented: Shahriar, June 29, 2022

I'm training a DDPG agent with the Reinforcement Learning Toolbox in MATLAB R2020a for a path planning problem. But as you can see, the DDPG episode rewards and average rewards never change over 5000 episodes. I used a simple neural network with three layers and 20 neurons; the learning rate is set to 0.01 and the Gradient Threshold is 1. I then tried setting the weights and biases of the fully connected layers and changing my reward function, but the result is the same.

I have also seen others report a similar problem. Does anyone have advice for my problem? Thank you.

1 Comment

Shahriar, June 29, 2022

@Guoge Tan, were you able to solve this issue? I am in a similar situation.

Answer 1

Emmanouil Tzorakoleftherakis, May 26, 2020

https://jp.mathworks.com/matlabcentral/answers/532933-why-is-the-ddpg-episode-rewards-never-change-during-the-whole-training-process#answer_439593

It looks like Q0 and the episode reward are on very different scales. When both curves share one axis, a much larger Q0 can make the episode reward look flat even if it is changing. Try unchecking "Show Episode Q0" to see if the episode reward changes. I would then simplify the critic network to make sure it outputs values on a scale similar to the episode reward.
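
To illustrate the scale issue, here is a minimal Python sketch (not MATLAB and not the toolbox's plotting code). It measures how much of a shared y-axis the reward curve occupies. All numbers are made up for illustration: the reward improving from about -50 to -10 and Q0 estimates between 5e4 and 1e5 are assumptions, not values from the question, and `visible_fraction` is a hypothetical helper.

```python
import math


def visible_fraction(series, axis_min, axis_max):
    """Fraction of the axis height spanned by the series' own min-max range."""
    span = axis_max - axis_min
    if span == 0:
        return 0.0
    return (max(series) - min(series)) / span


# Hypothetical episode rewards: a slow improvement from about -50 to -10
# with a small oscillation on top.
episode_reward = [-50 + 40 * i / 99 + 2 * math.sin(i) for i in range(100)]

# Hypothetical Q0 estimates from a poorly scaled critic, several orders of
# magnitude larger than the rewards.
episode_q0 = [1e5 * (1 - 0.5 * i / 99) for i in range(100)]

# Shared axis, as when both curves are drawn in the same plot.
shared_min = min(episode_reward + episode_q0)
shared_max = max(episode_reward + episode_q0)
frac_shared = visible_fraction(episode_reward, shared_min, shared_max)

# Axis fitted to the reward alone, as after hiding the Q0 curve.
frac_alone = visible_fraction(
    episode_reward, min(episode_reward), max(episode_reward)
)

print(f"reward spans {frac_shared:.4%} of the shared axis")
print(f"reward spans {frac_alone:.0%} of its own axis")
```

On the shared axis the reward curve takes up only a tiny fraction of the plot height, so it looks like a flat line. Once the axis is fitted to the reward alone, the same data fills the plot. If the reward still looks flat after Q0 is hidden, the agent really is not learning and the problem lies elsewhere.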