Reinforcement Learning with Parallel Computing Query

13 views (last 30 days)
PB75
PB75 on 26 Jul 2022
Commented: Joss Knight on 10 Sep 2022 at 19:31
Hi All,
I am attempting to enable parallel computing when training my RL agent in R2022a. Forgive the basic question about parallel computing, as this is my first attempt. My laptop has an NVIDIA GeForce RTX 3060. I keep hitting an issue where the pool goes idle ("IdleTimeout"), and I have had to restart the pool on several occasions. I left it to run overnight and again it stalled and stopped training. I am not sure what is happening; any help would be great. I have included some screen grabs below: the code in my RL script, the GPU device properties, and the error showing the pool stopped. I did have Episode Manager open during the simulation, but the learning did not seem the same as when I originally ran the training on the CPU. Also, in R2022a I am unable to stop training via Episode Manager.
Thanks in advance,
Patrick
trainingOpts.UseParallel = true;                                           % enable parallel training
trainingOpts.ParallelizationOptions.Mode = 'async';                        % workers run asynchronously
trainingOpts.ParallelizationOptions.StepsUntilDataIsSent = 32;             % send data to the client every 32 steps
trainingOpts.ParallelizationOptions.DataToSendFromWorkers = 'Experiences'; % workers send experiences, not gradients
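For context (a sketch, not from the original post), these options would typically be created with rlTrainingOptions and passed to train; the episode limits here are placeholders, and agent and env are assumed to exist from earlier in the script:

```matlab
% Hedged sketch: surrounding setup assumed, not shown in the post.
trainingOpts = rlTrainingOptions(MaxEpisodes=1000, MaxStepsPerEpisode=500); % placeholder limits
trainingOpts.UseParallel = true;
trainingOpts.ParallelizationOptions.Mode = 'async';
trainingOpts.ParallelizationOptions.StepsUntilDataIsSent = 32;
trainingOpts.ParallelizationOptions.DataToSendFromWorkers = 'Experiences';

% Train the previously created agent in the environment
trainResults = train(agent, env, trainingOpts);
```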
The "IdleTimeout" error is shown below:
Episode Manager:
  1 Comment
Joss Knight
Joss Knight on 27 Jul 2022
Can you try increasing your pool IdleTimeout? Maybe this tool is spending a long time computing on the client and meanwhile your pool is idle.
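One way to raise the timeout programmatically (a sketch using the standard Parallel Computing Toolbox pool object, not code from this thread) is:

```matlab
% Hedged sketch: increase the pool's idle timeout before training.
p = gcp('nocreate');    % get the current pool without starting a new one
if isempty(p)
    p = parpool;        % start a pool with the default cluster profile
end
p.IdleTimeout = 120;    % minutes; use Inf to disable the timeout entirely
```

Setting it on the pool object only affects the current pool; the Parallel Computing Toolbox preferences control the default for future pools.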


Answers (2)

PB75
PB75 on 27 Jul 2022
Hi Joss,
Thanks for taking the time to answer my question. I have unchecked the idle timeout in Preferences; however, I still encounter this error when I run the script.
  2 Comments
Joss Knight
Joss Knight on 27 Jul 2022
By the way, it seems the IdleTimeout is occurring simply because your code errored during execution and stopped running. The error is displayed in the live script; the pool then times out, and the timeout message is displayed in the Command Window.
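To surface the underlying training error rather than the secondary timeout message, one option (a sketch, not from this thread; agent, env, and trainingOpts are assumed from the earlier script) is to wrap the training call in try/catch:

```matlab
% Hedged sketch: capture and display the real error from training.
try
    trainResults = train(agent, env, trainingOpts);
catch trainErr
    % Print the full error report, including any worker-side causes
    disp(getReport(trainErr, 'extended'));
end
```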



PB75
PB75 on 27 Jul 2022
Hi Joss,
I have done as you recommended, but the issue still seems to be there when running the .m script as well: alongside the error, the episode reward does not change during training. See the screen grab below from running the .m script.
  3 Comments
Joss Knight
Joss Knight on 10 Sep 2022 at 19:31
I think this is too hard to debug with this little information. Some sort of error is happening during training, and we would probably need to inspect the logs to determine what happened. Please open a support case.
