- Reward Function: The reward function may not be providing a meaningful learning signal (for example, rewards that are too sparse or flat), leaving the agent with little to improve its policy on.
- Normalization Issues: Lack of normalization for states, actions, or rewards could lead to training instability.
- Gradient Problems: There could be vanishing or exploding gradients in the actor or critic networks.
- Reward Scaling: The rewards may not be scaled properly, leading to insignificant policy updates (a reward-scaling sketch follows this list).
- Learning Rates: The learning rates for the actor and critic might be inappropriate: too low a rate stalls learning, while too high a rate destabilizes it.
- Target Network Update Rate: The target networks for the actor and critic may not be updating at a suitable rate.
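To make the normalization and scaling points above concrete, here is a minimal sketch of a custom step function (in the form expected by "rlFunctionEnv") that rescales the raw reward so typical per-step values stay on the order of 1. The function name "rawEnvStep" and the value of "rewardScale" are assumptions for illustration, not part of any particular environment:

```matlab
% Sketch: rescaling the reward inside a custom environment step function.
% "rawEnvStep" is a hypothetical function returning the unscaled
% transition; rewardScale is an assumed tuning constant.
function [nextObs, reward, isDone, loggedSignals] = scaledStepFcn(action, loggedSignals)
    [nextObs, rawReward, isDone, loggedSignals] = rawEnvStep(action, loggedSignals);
    rewardScale = 0.01;                % tune so typical |reward| is O(1)
    reward = rewardScale * rawReward;  % keeps critic targets in a stable range
end
```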
- Analyze Reward Function: Ensure the reward function provides a clear, informative signal that the agent can learn from effectively.
- Hyperparameter Optimization: Experiment with different hyperparameters, including learning rates and the discount factor.
- Network Architecture Review: Check the actor and critic network architectures to ensure they are suitable for the complexity of the task.
- Agent Options: Try varying the parameters in "rlDDPGAgent" and "rlDDPGAgentOptions" (see the sketch after this list).
- Target Update Method: Change the method of updating the target networks, for example switching between smoothing (soft) and periodic (hard) updates.
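Many of the knobs above live in "rlDDPGAgentOptions". The following is a minimal sketch, assuming an R2022a-or-later release where per-network optimizer settings are passed through "rlOptimizerOptions"; every numeric value is an illustrative starting point to tune, not a recommendation, and obsInfo/actInfo are assumed to come from your environment:

```matlab
% Sketch: tuning the main DDPG hyperparameters. All values are
% illustrative starting points, not recommendations.
actorOpts  = rlOptimizerOptions(LearnRate=1e-4, GradientThreshold=1);  % clip gradients
criticOpts = rlOptimizerOptions(LearnRate=1e-3, GradientThreshold=1);

agentOpts = rlDDPGAgentOptions( ...
    SampleTime=0.1, ...               % match the environment's sample time
    DiscountFactor=0.99, ...
    MiniBatchSize=128, ...
    ExperienceBufferLength=1e6, ...
    TargetSmoothFactor=1e-3, ...      % soft (Polyak-averaged) target updates
    ActorOptimizerOptions=actorOpts, ...
    CriticOptimizerOptions=criticOpts);

% To switch from smoothing to periodic (hard) target updates:
% agentOpts.TargetSmoothFactor    = 1;
% agentOpts.TargetUpdateFrequency = 1000;

agent = rlDDPGAgent(obsInfo, actInfo, agentOpts);  % default networks are generated
```

After construction, "getActor(agent)" and "getCritic(agent)" let you retrieve the generated networks when reviewing whether the architectures suit the task.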
For more information, refer to the following MathWorks documentation:
- https://in.mathworks.com/help/reinforcement-learning/ug/ddpg-agents.html
- https://in.mathworks.com/help/reinforcement-learning/ref/rl.agent.rlddpgagent.html
- https://www.mathworks.com/help/reinforcement-learning/ref/rl.option.rlddpgagentoptions.html