RL DDPG Actions have high oscillation


Ahmad Al Ali on 8 Nov 2023


https://jp.mathworks.com/matlabcentral/answers/2044812-rl-ddpg-actions-have-high-oscillation

Commented: Sourabh on 15 Dec 2023

Hello, I am using the DDPG agent from the Reinforcement Learning Toolbox in MATLAB to train a 3-DOF robotic arm to move. The actions are joint torques, and although the actions reach the target, they are highly oscillatory and noisy.

Can anyone help explain where this comes from, e.g. the algorithm itself or the noise options?

I am using the walking robot example to build noise options:

%% DDPG Agent Options
agentOptions = rlDDPGAgentOptions;
agentOptions.SampleTime = 0.025;
agentOptions.DiscountFactor = 0.99;
agentOptions.MiniBatchSize = 128;
agentOptions.ExperienceBufferLength = 5e5;
agentOptions.TargetSmoothFactor = 1e-3;
agentOptions.NoiseOptions.MeanAttractionConstant = 0.5;
agentOptions.NoiseOptions.Variance = 0.3;
agentOptions.NoiseOptions.VarianceDecayRate = 1e-5;

I think it might have something to do with MeanAttractionConstant, Variance, or VarianceDecayRate. (By the way, the joint limits are between -3 and 3.)

The actions I get look like this:


Answers (1)

Emmanouil Tzorakoleftherakis on 9 Nov 2023


Hi,

The noise options you are mentioning are only used during training and are essential for exploration. If the plots you are showing above are from training, you may consider reducing the noise variance a bit.
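To get a feel for how much exploration noise these options inject, here is a minimal sketch that steps an Ornstein-Uhlenbeck noise model using the values from the question. The update rule and the variance decay rule are written as I understand them from the rlDDPGAgentOptions documentation, so treat them as an assumption and check them against your release. The zero noise mean, the 400-step episode length, and the variable names are also assumptions made for illustration.

```matlab
% Sketch only: OU exploration noise with the options from the question.
% The update and decay formulas are assumed from the toolbox documentation.
Ts     = 0.025;   % agentOptions.SampleTime
theta  = 0.5;     % NoiseOptions.MeanAttractionConstant
sigma0 = 0.3;     % NoiseOptions.Variance (initial value)
decay  = 1e-5;    % NoiseOptions.VarianceDecayRate
mu     = 0;       % noise mean (assumed)
nSteps = 400;     % one 10 s episode at Ts = 0.025 s (assumed)

rng(0);                     % fixed seed so the run is repeatable
x      = zeros(1, nSteps);  % noise added to one action channel
sigmaK = sigma0;            % current noise scale
for k = 2:nSteps
    % Pull toward the mean, then add scaled Gaussian noise
    x(k)   = x(k-1) + theta*(mu - x(k-1))*Ts + sigmaK*randn*sqrt(Ts);
    % Shrink the noise scale a little at every agent step
    sigmaK = sigmaK*(1 - decay);
end

% Number of agent steps needed for the noise scale to halve
halfLifeSteps = log(0.5)/log(1 - decay);
fprintf('Noise scale after one episode: %.4f\n', sigmaK);
fprintf('Steps to halve the noise scale: %.0f (about %.0f episodes)\n', ...
    halfLifeSteps, halfLifeSteps/nSteps);
```

Under these assumptions the noise scale barely changes within a single episode, so a decay rate of 1e-5 only has a visible effect over many episodes. Lowering the initial Variance is the more direct way to reduce noise seen during training.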

If the plots you are showing are from the trained agent, you can consider penalizing large action changes in your reward signal. That would help reduce the oscillatory content.
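As a rough illustration of penalizing large action changes, the sketch below subtracts a weighted sum of absolute torque changes from a task reward. The weight w, the torque values, and baseReward are hypothetical placeholders, not values from the thread. In a Simulink model, the previous action could be obtained by passing the agent output through a Unit Delay block, which is again an assumption about the model layout.

```matlab
% Sketch only: reward shaping with an action-rate penalty.
w = 0.1;   % penalty weight (hypothetical, needs tuning)

% Penalty grows with the total absolute change in joint torques
smoothnessPenalty = @(a, aPrev) w * sum(abs(a - aPrev));

aPrev      = [0.5; -1.0; 2.0];   % previous joint torques (example values)
aNow       = [0.7; -1.5; 2.0];   % current joint torques (example values)
baseReward = 1.0;                % stand-in for the task reward

reward = baseReward - smoothnessPenalty(aNow, aPrev);
fprintf('Shaped reward: %.4f\n', reward);
```

If w is too large the agent may learn to barely move, so it is usually worth starting small and comparing the torque plots after retraining.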

Hope this helps

8 Comments (6 older comments hidden)

Ahmad Al Ali on 13 Nov 2023


Thank you, I will try penalizing the absolute value of the difference between the current action and the previous action.

1) Yes, there is an algebraic loop; it still works but takes longer to compute. The robot is modeled as rigid bodies with joints in between; the joint block is actuated by a torque input.

2) The issue he had was with oscillations, not the loops, sorry.

3) I think each single-timestep output from the agent is held for more than one timestep in Simulink.

The agent's output timestep is 0.025 s:

agentOptions.SampleTime = 0.025;

However, when I run the Simulink model with a fixed-step solver it gives me an error, so I change it back to a variable-step solver and it works.

But this causes the number of Simulink timesteps to not be exactly 400 (10 s simulation / 0.025 s per timestep = 400 timesteps). I think this difference is what might cause the oscillations.
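The step-count arithmetic above can be checked with a short sketch. It also illustrates the idea from point 3: if the agent output is held between agent sample hits, the solver can take any number of intermediate steps while the action only changes at most 400 times. The random solver time vector is a made-up stand-in for a variable-step solver, and the hold behavior is an assumption about how the agent block runs at its sample time.

```matlab
% Sketch only: agent steps versus variable-step solver steps.
Tf = 10;       % simulation length in seconds (from the comment)
Ts = 0.025;    % agentOptions.SampleTime

nAgentSteps = round(Tf/Ts);   % expected number of agent decisions

% Hypothetical variable-step solver times: 1000 irregular points in [0, Tf)
rng(0);
tSolver = sort([0, rand(1, 999)*Tf]);

% Index of the agent step whose action is active at each solver time
% (zero-order hold: the action stays constant until the next agent hit)
heldIdx = min(floor(tSolver/Ts), nAgentSteps - 1);

fprintf('Agent steps: %d, solver steps: %d, distinct actions used: %d\n', ...
    nAgentSteps, numel(tSolver), numel(unique(heldIdx)));
```

Under this assumption, a solver step count that differs from 400 does not by itself change how often the torque command updates, so it may be worth checking the reward and noise settings before attributing the oscillations to the solver.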

Either way, I will try a torque penalty and see what happens. Thanks again for answering all my questions, I really appreciate it.

Ahmad Al Ali on 14 Dec 2023

@Sourabh I use a Rate Transition block in Simulink before feeding the observations to the agent:

Sourabh on 15 Dec 2023

Actually, I have a signal that I want to sample at intervals of 4 s to build an array, and then feed that array to my observation. Can I do this using a Rate Transition block?

Release: R2021b
