Measures to improve computation time with reinforcement learning block in Simulink

Question

Enrico Anderlini, 13 December 2019


Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/496460-measures-to-improve-computation-time-with-reinforcement-learning-block-in-simulink

Edited: Emmanouil Tzorakoleftherakis, 27 January 2020

I am using Reinforcement Learning Toolbox to run control tasks, in particular with the DDPG agent. Each episode lasts 100 seconds with a 0.01 s simulation time step; the control time step is 0.1 s, i.e. the RL control block is called every 0.1 s. The computation time is unmanageably high.

I have tried to reduce the training of the actor and critic neural networks to every 5 episodes by using a periodic TargetUpdateMethod and changing the TargetUpdateFrequency. However, a deeper analysis shows that it is the computation time taken by each episode that is too high, which points to the RL Simulink block as the culprit.

The way I see it, the block should run the neural networks (which is a matrix multiplication) and store the additional experience point in memory (so some more matrix operations if the memory is full). That does not fully explain the large overhead to me.

My code runs (more) efficiently in Python, so it is clear I am not fully exploiting the MATLAB/C++ implementation.

Any advice on how I could try to improve the computational efficiency?


Answer 1

Emmanouil Tzorakoleftherakis, 27 January 2020


Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/496460-measures-to-improve-computation-time-with-reinforcement-learning-block-in-simulink#answer_412231

Edited: Emmanouil Tzorakoleftherakis, 27 January 2020

Hi Enrico,

Changing TargetUpdateMethod and TargetUpdateFrequency does not change how often training happens, only how often the target copies of the actor and critic are synced with the learned networks (remember that DDPG keeps two copies of both the actor and the critic: the networks being trained and their targets).
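To make the distinction concrete, here is a minimal Python sketch (not the toolbox implementation) that counts gradient updates and target syncs separately. The function names, the one-parameter "network" and the default tau are hypothetical, and the sketch assumes the update frequency is counted in agent steps.

```python
def update_target(target, online, step, method="smoothing", tau=1e-3, frequency=1):
    """Return new target parameters; `online` is never modified."""
    if method == "smoothing":
        # Soft update on every step: move a fraction tau toward the online weights.
        return [(1.0 - tau) * t + tau * w for t, w in zip(target, online)]
    if method == "periodic":
        # Hard copy every `frequency` steps; otherwise keep the target as it is.
        return list(online) if step % frequency == 0 else list(target)
    raise ValueError(f"unknown method: {method}")


def run(num_steps, method, frequency):
    """Count gradient updates and target syncs over `num_steps` agent steps."""
    online, target = [0.0], [0.0]
    gradient_updates = 0
    target_syncs = 0
    for step in range(1, num_steps + 1):
        online = [w + 1.0 for w in online]  # stand-in for one gradient step
        gradient_updates += 1
        new_target = update_target(target, online, step, method, frequency=frequency)
        if new_target != target:
            target_syncs += 1
        target = new_target
    return gradient_updates, target_syncs


print(run(1000, "periodic", 1))
print(run(1000, "periodic", 5))
```

In this toy loop, raising the frequency only reduces the number of target syncs; the number of gradient updates, which is where the time goes, stays the same.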

If you look at the algorithm description here, you will see that learning happens at steps 6 and 7, and these are executed at every agent time step (every 0.1 s in your example, i.e. 1000 times per 100 s episode), which is why you see this slowdown. So the quick things to try are 1) increase the agent sample time, 2) reduce the episode duration and 3) reduce the mini-batch size.
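A rough back-of-the-envelope sketch in Python shows how each of the three measures scales the learning work per episode. It assumes exactly one mini-batch update per agent step, as described above; the function name is hypothetical, and the mini-batch size of 64 is an assumed value, since the question does not state one.

```python
def episode_cost(episode_duration, agent_sample_time, mini_batch_size):
    """Estimate learning work per episode, assuming one update per agent step."""
    agent_steps = round(episode_duration / agent_sample_time)
    return {
        "agent_steps": agent_steps,
        "gradient_updates": agent_steps,
        # Each update passes a whole mini-batch through the actor and critic.
        "samples_processed": agent_steps * mini_batch_size,
    }


baseline = episode_cost(100, 0.1, 64)      # setup from the question, batch size assumed
slower_agent = episode_cost(100, 0.2, 64)  # 1) increase the sample time
shorter = episode_cost(50, 0.1, 64)        # 2) reduce the episode duration
small_batch = episode_cost(100, 0.1, 32)   # 3) reduce the mini-batch size

for name, cost in [("baseline", baseline), ("slower_agent", slower_agent),
                   ("shorter", shorter), ("small_batch", small_batch)]:
    print(name, cost)
```

Under these assumptions, each measure halves the number of samples pushed through the networks per episode. The first two also halve the number of updates, while the third keeps the update count and makes each update cheaper.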

One additional thing to try is to parallelize training. You can use Parallel Computing Toolbox for that; to set it up, you essentially only need to set a flag (UseParallel) in the training options (see e.g. here).
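For readers more familiar with Python, one common pattern behind parallel training of this kind is to let several workers simulate episodes at the same time and send their experiences to a central replay buffer. The sketch below illustrates only that structure and is not how the toolbox is implemented: the environment dynamics, the policy and all function names are placeholders. Threads are used to keep the example short, whereas CPU-bound simulations in Python would normally use separate processes.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_episode(seed, num_steps=1000):
    """Simulate one episode and return its experiences (placeholder dynamics)."""
    rng = random.Random(seed)
    state = 0.0
    experiences = []
    for _ in range(num_steps):
        action = rng.uniform(-1.0, 1.0)    # placeholder for the actor's output
        next_state = state + 0.1 * action  # placeholder for the plant model
        reward = -abs(next_state)
        experiences.append((state, action, reward, next_state))
        state = next_state
    return experiences


def collect_parallel(num_workers, num_steps=1000):
    """Run one episode per worker concurrently and merge all experiences."""
    replay_buffer = []
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        seeds = range(num_workers)
        steps = [num_steps] * num_workers
        for experiences in pool.map(run_episode, seeds, steps):
            replay_buffer.extend(experiences)
    return replay_buffer


buffer = collect_parallel(num_workers=4, num_steps=100)
print(len(buffer))
```

The learner would then sample mini-batches from the merged buffer, so the time spent simulating episodes is spread across workers while the networks are still updated in one place.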

We are also working on adding more training algorithms for continuous action spaces that are more sample efficient, so I would check back when R2020a goes live.