DDPG training curve always remains flat

LUCA MARSEGLIA on 31 January 2022
Edited: Milan Bansal on 25 January 2024
The training curve of my agent always has a shape that looks like this:
I tried to change as many parameters as I could, but nothing changes; the curve always appears flat as in the image, with the Episode Q0 curve tending toward the average reward. I changed the variance so that Variance*sqrt(Ts) falls between 1% and 10% of the action range, as suggested in the documentation, and also lowered the variance decay rate.
I also set a higher learning rate and varied the parameters of the reward function, but nothing ever changed.
What could be the possible reasons for a learning curve to always have this flat shape?
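For reference, a minimal sketch of the kind of changes described above, assuming the default Ornstein-Uhlenbeck noise model and the representation-options API of R2021a; the sample time and all numeric values are illustrative, not the exact ones used:

    % Sketch of the changes tried (illustrative values, assumed sample time)
    agentOpts = rlDDPGAgentOptions('SampleTime',0.05);
    agentOpts.NoiseOptions.Variance = 0.6;           % raised exploration variance
    agentOpts.NoiseOptions.VarianceDecayRate = 1e-6; % lowered decay rate

    % Higher learning rates via the representation options (R2021a API)
    criticOpts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
    actorOpts  = rlRepresentationOptions('LearnRate',1e-4,'GradientThreshold',1);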

Answers (1)

Milan Bansal on 25 January 2024
Edited: Milan Bansal on 25 January 2024
Hi Luca,
I understand that you want to know why the DDPG training curve remains flat despite your changes to the parameters and hyperparameters.
The following are possible reasons for a flat training curve:
  • Reward Function: The reward function may not be providing a meaningful gradient or a sufficient learning signal for the agent to improve its policy.
  • Normalization Issues: Lack of normalization for states, actions, or rewards could lead to training instability (see the sketch after this list).
  • Gradient Problems: There could be vanishing or exploding gradients within the actor or critic networks.
  • Reward Scaling: The rewards may not be scaled properly, leading to insignificant updates to the policy.
  • Learning Rates: The learning rates for the actor and critic might be inappropriate, possibly too low or too high.
  • Target Network Update Rate: The target networks for the actor and critic may not be updating at a suitable rate.
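As a rough illustration of the normalization and scaling points above (the bounds and scale factor below are assumptions for the sketch, not values from the question):

    % Minimal sketch: map observations to [-1, 1] and keep rewards roughly O(1)
    obsLow  = [-10; -5];                 % assumed observation lower bounds
    obsHigh = [ 10;  5];                 % assumed observation upper bounds
    rawObs  = [3; -2];                   % example raw observation
    normObs = 2*(rawObs - obsLow)./(obsHigh - obsLow) - 1;

    rawReward = 250;                     % example raw reward
    reward    = rawReward/100;           % illustrative scale factor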
The following are possible ways to diagnose and resolve the issue:
  • Analyze Reward Function: Ensure the reward function provides a clear gradient for the agent to learn effectively.
  • Hyperparameter Optimization: Experiment with different hyperparameters, including learning rates and the discount factor.
  • Network Architecture Review: Check the actor and critic network architectures to ensure they are suitable for the complexity of the task.
  • Agent Options: Try varying the parameters in "rlDDPGAgent" and "rlDDPGAgentOptions".
  • Target Update Method: Change how the target networks are updated, for example through the "TargetSmoothFactor" and "TargetUpdateFrequency" options (see the sketch after this list).
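For instance, a minimal sketch of these agent options (the numeric values are illustrative, not tuned recommendations):

    % Illustrative rlDDPGAgentOptions settings
    agentOpts = rlDDPGAgentOptions( ...
        'SampleTime',             0.05, ...
        'DiscountFactor',         0.99, ...
        'MiniBatchSize',          64, ...
        'ExperienceBufferLength', 1e6, ...
        'TargetSmoothFactor',     1e-3, ... % soft (smoothed) target updates
        'TargetUpdateFrequency',  1);       % update targets every learning step

    % Setting TargetSmoothFactor to 1 with a larger TargetUpdateFrequency
    % switches to periodic (hard) target updates instead.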
Please refer to the documentation pages for "DDPG Agents", "rlDDPGAgent", and "rlDDPGAgentOptions" to learn more.
Additionally, you can find a related example in the documentation.
Hope this helps.
