Reasons for bad training performance using prioritized experience replay compared to uniform experience replay using DDPG agent

Question

Gaurav on 7 Aug 2024

Direct link to this question

https://jp.mathworks.com/matlabcentral/answers/2143804-reasons-for-bad-training-performance-using-prioritized-experience-replay-compared-to-uniform-experie

Commented: Pavl M. on 13 Oct 2024

I am trying to use prioritized experience replay instead of uniform experience replay while training a DDPG agent on a quadruped robot (Quadruped Robot Locomotion Using DDPG Agent - MATLAB & Simulink (mathworks.com)), with the goal of reducing training time. However, with prioritized experience replay I see considerably more variance and instability than with the uniform replay buffer, visible as frequent spikes and drops in the training monitor. The image below shows training with prioritized experience replay and the following parameters:

% DDPG agent options
agentOptions = rlDDPGAgentOptions();
agentOptions.SampleTime = Ts;
agentOptions.DiscountFactor = 0.99;
agentOptions.MiniBatchSize = 256;
% agentOptions.ExperienceBufferLength = 1e6;
agentOptions.TargetSmoothFactor = 1e-3;
agentOptions.MaxMiniBatchPerEpoch = 200;

% Exploration noise
agentOptions.NoiseOptions.StandardDeviation = 0.1;
agentOptions.NoiseOptions.MeanAttractionConstant = 1.0;

% Actor and critic optimizers
agentOptions.ActorOptimizerOptions.Algorithm = "adam";
agentOptions.ActorOptimizerOptions.LearnRate = 1e-3;
agentOptions.ActorOptimizerOptions.GradientThreshold = 1;
agentOptions.CriticOptimizerOptions.Algorithm = "adam";
agentOptions.CriticOptimizerOptions.LearnRate = 1e-3;
agentOptions.CriticOptimizerOptions.GradientThreshold = 1;

% Create the agent with default networks (256 hidden units per layer)
initOpts = rlAgentInitializationOptions(NumHiddenUnit=256);
agent = rlDDPGAgent(obsInfo,actInfo,initOpts,agentOptions);

% Replace the default uniform buffer with a prioritized replay memory
agent.ExperienceBuffer = rlPrioritizedReplayMemory(obsInfo,actInfo);
resize(agent.ExperienceBuffer,1e6);
agent.ExperienceBuffer.NumAnnealingSteps = 1e4;
agent.ExperienceBuffer.PriorityExponent = 0.6;
agent.ExperienceBuffer.InitialImportanceSamplingExponent = 0.4;

The image below shows training with uniform experience replay, for comparison.

I am not sure why this is happening. I tried different hyperparameters for prioritized experience replay but observed similar training results. Any possible solution would be really helpful.


Answer 1

Kaustab Pal on 8 Aug 2024

Direct link to this answer

https://jp.mathworks.com/matlabcentral/answers/2143804-reasons-for-bad-training-performance-using-prioritized-experience-replay-compared-to-uniform-experie#answer_1496354

Hi @Gaurav

Prioritized Experience Replay (PER) tends to outperform uniform experience replay in environments where rewards are sparse or delayed, or where the dynamics are non-stationary. In quadruped robot locomotion, however, the rewards are dense: the agent receives a positive reward at every time step for avoiding early termination, plus an additional reward for positive forward velocity.

Since rewards are given consistently at every time-step, the variance in the importance of different experiences is lower. This means that each experience contributes relatively equally to the learning process, making uniform sampling sufficient for effective learning. The continuous nature of the rewards also ensures that the agent receives regular feedback about its performance, reducing the need for prioritizing specific experiences.
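As a rough illustration of this point, here is a minimal sketch in base MATLAB (no toolbox calls) that applies the standard PER sampling rule, P(i) = p_i^alpha / sum_k p_k^alpha, to two sets of TD errors. The TD-error values and the small constant `epsPrio` are made up for the example and are not taken from the quadruped model.

```matlab
% Hypothetical absolute TD errors for five stored transitions
% (illustrative values only, not measured from the quadruped example).
tdDense  = [1.0 1.1 0.9 1.0 1.05];     % dense rewards: errors of similar size
tdSparse = [0.01 0.01 5.0 0.01 0.01];  % sparse rewards: one surprising transition

alpha   = 0.6;   % priority exponent, same value as in the question
epsPrio = 1e-6;  % assumed small offset so that no priority is exactly zero

% Standard PER sampling rule: P(i) = p_i^alpha / sum_k p_k^alpha,
% with priority p_i = |TD error_i| + epsPrio.
probs = @(td) (abs(td) + epsPrio).^alpha ./ sum((abs(td) + epsPrio).^alpha);

pDense  = probs(tdDense);
pSparse = probs(tdSparse);

% Ratio between the most and least likely transition in each buffer.
fprintf("dense : max/min sampling probability = %.2f\n", max(pDense)/min(pDense));
fprintf("sparse: max/min sampling probability = %.2f\n", max(pSparse)/min(pSparse));
```

When the TD errors are of similar size, the ratio stays close to 1, so prioritized sampling behaves almost like uniform sampling. In the sparse case the single large error dominates the sampling distribution.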

For these reasons, the reward plot with uniform experience replay is smoother than the one with prioritized experience replay.

I hope this answers your question.

With regards,

Kaustab Pal

2 Comments

Gaurav on 8 Aug 2024

Hi Kaustab,

Thank you for the detailed explanation. I have a further question regarding the use of Prioritized Experience Replay (PER). While I understand that PER may not always outperform uniform sampling in certain cases, I was under the impression that it should at least have a similar training performance. However, in my situation, when I use PER, the training rewards remain consistently low, with sudden drops in performance, which I find difficult to understand. Could you clarify why this might be happening?

Additionally, could you suggest other approaches to accelerate training or potential modifications to the current simulation that might better demonstrate the advantages of PER?

Thank you!

Pavl M. on 13 Oct 2024

Can you explain what the following parameters mean:

agent.ExperienceBuffer.PriorityExponent = 0.6;

agent.ExperienceBuffer.InitialImportanceSamplingExponent = 0.4;

?
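For context, in the standard PER formulation these two values usually play the following roles: the priority exponent (alpha) controls how strongly sampling favors high-priority transitions (alpha = 0 gives uniform sampling), and the importance-sampling exponent (beta) controls how strongly the resulting sampling bias is corrected in the loss, starting from an initial value and growing to 1 during training. The sketch below uses base MATLAB only. The priority values are made up, and the linear annealing schedule is an assumption about how `NumAnnealingSteps` is used, not a statement of the toolbox internals.

```matlab
alpha   = 0.6;  % role of PriorityExponent
beta0   = 0.4;  % role of InitialImportanceSamplingExponent
nAnneal = 1e4;  % role of NumAnnealingSteps

% Hypothetical priorities of four stored transitions (illustrative only).
p = [0.5 1.0 2.0 4.0];
N = numel(p);

% Sampling probabilities: P(i) = p_i^alpha / sum_k p_k^alpha.
P = p.^alpha ./ sum(p.^alpha);

% Assumed schedule: beta grows linearly from beta0 to 1 over nAnneal steps.
betaAt = @(step) min(1, beta0 + (1 - beta0) * step / nAnneal);

% Importance-sampling weights: w_i = (N * P(i))^(-beta), scaled so max(w) = 1.
isWeights = @(beta) (N .* P).^(-beta) ./ max((N .* P).^(-beta));

wStart = isWeights(betaAt(0));        % weak correction at the start of training
wEnd   = isWeights(betaAt(nAnneal));  % full correction once beta reaches 1

% Rows: sampling probabilities, initial weights, final weights.
disp([P; wStart; wEnd]);
```

Transitions that are sampled more often receive smaller weights, so their larger share of the minibatch does not translate into a proportionally larger share of the gradient.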
