Steady state error in DDPG control

12 views (last 30 days)
Ari on 25 Dec 2024
Edited: Ari on 24 Jan 2025 at 7:21
I am trying to make some modifications to the Control Water Level in a Tank Using a DDPG Agent example. I want to reduce the sample time from 1.0 to 0.5, so I set Ts = 0.5. Consequently, I had to adjust StopTrainingValue, changing it from 2000 to 4000. The training process completed successfully, as can be seen below.
But something unexpected happened: these modifications introduced a steady-state error (or something similar) that wasn't there in the original example.
How can I overcome this steady-state error? Do I need to make additional adjustments, e.g., change the structure of the observations, the reward function, the actor/critic networks, StopTrainingCriteria, etc.?
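For reference, a minimal sketch of the two changes described above (variable names follow the example script; the Tf value, MaxEpisodes, and the stop criterion are assumptions about the original script, not something verified here):
Ts = 0.5;                                        % sample time, reduced from 1.0
Tf = 200;                                        % simulation time, unchanged (assumed value)
trainOpts = rlTrainingOptions( ...
    'MaxEpisodes',200, ...                       % assumed example default
    'MaxStepsPerEpisode',ceil(Tf/Ts), ...        % recomputed from Tf and Ts
    'StopTrainingCriteria','AverageReward', ...  % assumed as in the example
    'StopTrainingValue',4000);                   % raised from 2000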
Update:
This is the error I get using the pre-trained agent (doTraining = false, no changes to the original example).
This is the error I get using the re-trained agent (doTraining = true, no changes to the original example).
  3 Comments
Ari on 26 Dec 2024
Edited: Ari on 26 Dec 2024
The original reward function is defined as reward = 10*(|e| < 0.1) - 1*(|e| >= 0.1) - 100*(h <= 0 || h >= 20), where e = reference - h is the error and h is the height of the water in the tank. I didn't touch this function. It works well in the original example.
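As a plain MATLAB sketch of that reward (the example builds it from Simulink blocks, so this is only an illustration with sample values for e and h):
e = 0.05;   % tracking error, e = reference - h
h = 10;     % water height in the tank
reward = 10*(abs(e) < 0.1) - 1*(abs(e) >= 0.1) - 100*(h <= 0 || h >= 20);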
Sam Chak on 26 Dec 2024

I see. This probably implies that changing the sampling time affects the learning efficiency of the RL algorithm in tuning the PI Controller gains.

You may manually adjust the tuning parameters, but you might as well use an optimization algorithm such as GA or PSO to auto-tune the other hyperparameters in RL.
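A rough sketch of that idea, assuming the Global Optimization Toolbox is available and that trainAgentWithParams is a hypothetical helper that rebuilds the agent with the given learning rates, trains it, and returns the negative average reward:
lb = [1e-5 1e-5];                                    % lower bounds on [actorLR, criticLR]
ub = [1e-2 1e-2];                                    % upper bounds
objective = @(p) trainAgentWithParams(p(1), p(2));   % hypothetical helper, not in the example
opts = optimoptions('ga', 'PopulationSize',10, 'MaxGenerations',5);
bestLR = ga(objective, 2, [], [], [], [], lb, ub, [], opts);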


Answers (2)

Divyanshu on 26 Dec 2024
To get the same results and avoid the error with sample time 0.5, you might have to change 'Tf' as well and set its value to '100'. This ensures that the 'MaxStepsPerEpisode' parameter of 'rlTrainingOptions' still has the value the example expects.
Since you only modified the sample time, an incorrect value of 'MaxStepsPerEpisode' was computed, and that may be the reason for the error.
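Concretely, what this suggestion keeps invariant (a small sketch using the values discussed in this thread):
Ts = 0.5;                  % the modified sample time
Tf = 100;                  % suggested change
maxSteps = ceil(Tf/Ts);    % = 200, the steps-per-episode count the example expects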
I hope this helps. However, to find the exact root cause, I might need a snapshot of the error message and the reproduction steps.
  1 Comment
Ari on 26 Dec 2024
Edited: Ari on 26 Dec 2024
The maximum number of environment steps to run per episode is set using MaxStepsPerEpisode = ceil(Tf/Ts), so it is adjusted automatically. Also, there is no error message whatsoever. To reproduce my result: download the example -> change Ts to 0.5 -> change StopTrainingValue to 4000 -> change doTraining to true -> run the simulation. You can add an integrator to see the steady-state error.
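If it helps anyone reproducing this: instead of adding an integrator block, the tail of the logged error signal can be inspected after the simulation. This is only a sketch and assumes the error signal is logged to the workspace under the name 'error', which is not part of the original example:
simOut = sim(mdl);                               % mdl holds the example's model name
e = simOut.logsout.get('error').Values.Data;     % hypothetical logged signal name
ssError = mean(e(end-20:end));                   % average error over the final samples
fprintf('Approximate steady-state error: %.4f\n', ssError);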



Ari on 24 Jan 2025 at 6:54
Edited: Ari on 24 Jan 2025 at 7:21
I came up with this solution:
- Remove the output limit from the integrator in the observation block.
- Add another pair of layers to the actor and critic networks, and use different random seeds.
This is the result I obtained:
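A rough sketch of the kind of network change I mean (layer sizes and names are illustrative, not the exact ones from the example):
rng(0);   % try a different random seed before building the networks
% critic observation path with one extra pair of hidden layers appended
statePath = [
    featureInputLayer(3,'Normalization','none','Name','state')
    fullyConnectedLayer(50,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(50,'Name','fc2')    % extra pair of layers
    reluLayer('Name','relu2')
    fullyConnectedLayer(25,'Name','fc3')];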
