MATLAB Answers

Restart a parpool worker

Raghavasimhan Thirunarayanan on 16 Jun 2020
Answered: Edric Ellis on 16 Jun 2020
Hello,
When I run parfor, a worker sometimes terminates with an error and the simulation continues with the remaining workers. Is there a way to automatically restart the failed parpool worker without having to stop and relaunch the simulation? I am at my wits' end as to how to achieve this.
Thanks

  1 Comment

Mohammad Sami on 16 Jun 2020


Answers (1)

Edric Ellis on 16 Jun 2020
Unfortunately, there's no simple way to do this when using parfor with parpool. I can think of a few workarounds that might help, depending very much on how your problem is set up.
First, you could try the "cluster parfor" approach, where you don't launch a parpool at all and instead let the cluster run the loop directly. This is described in the doc here: https://www.mathworks.com/help/parallel-computing/parforoptions.html (see the section "Run parfor on a Cluster Without a Parallel Pool"). This approach launches independent tasks on your cluster rather than a parallel pool, so it only performs well if the time taken to launch the workers for those tasks is small compared to the time taken to run the entire loop. If it works for you, it is very likely to be the simplest approach.
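As a rough sketch of the "cluster parfor" approach: this assumes the default cluster profile returned by parcluster is able to run independent tasks. The function simulateOne and the iteration count are hypothetical stand-ins for the real simulation, not something from the question.

```matlab
% Sketch only: simulateOne and numIterations are illustrative assumptions.
cluster = parcluster();            % default cluster profile (assumption)
opts    = parforOptions(cluster);  % run the loop on the cluster, no parpool

numIterations = 100;
results = zeros(1, numIterations);

% The loop body runs as independent tasks on the cluster rather than on
% the workers of a parallel pool.
parfor (idx = 1:numIterations, opts)
    results(idx) = simulateOne(idx);
end

function out = simulateOne(idx)
    % Hypothetical placeholder for one iteration of the real simulation.
    out = idx^2;
end
```

If task start-up time turns out to dominate, parforOptions also accepts the 'RangePartitionMethod' and 'SubrangeSize' options, which control how many iterations are grouped into each task.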
Second, if you can restructure your code to use parfeval instead of parfor, you could check the NumWorkers property of the parallel pool while consuming results and restart the pool if it decreases. This would be quite a bit more work, because you'd need to keep track of the incomplete work and re-submit it.
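One way the parfeval bookkeeping might look is sketched below. It has not been tested against real worker crashes, and it makes several assumptions: the pool is started with 'SpmdEnabled', false so that it survives losing a worker and the drop in NumWorkers is visible, and poolSize, numTasks, maxRestarts, simulateOne and submitAll are all made-up names for illustration. Any failed task is treated as a reason to restart, which is a simplification.

```matlab
% Sketch only: all sizes and helper names below are illustrative assumptions.
numTasks    = 20;
poolSize    = 4;
maxRestarts = 3;

% With 'SpmdEnabled', false the pool can keep running after losing a worker.
delete(gcp('nocreate'));
pool = parpool(poolSize, 'SpmdEnabled', false);

results  = cell(1, numTasks);
done     = false(1, numTasks);
restarts = 0;

while ~all(done)
    todo    = find(~done);                    % task IDs still outstanding
    futures = submitAll(pool, todo);
    for n = 1:numel(todo)
        try
            [k, value] = fetchNext(futures);  % k indexes into futures and todo
            results{todo(k)} = value;
            done(todo(k))    = true;
        catch err
            warning('Sketch:taskFailed', 'A task failed: %s', err.message);
            break
        end
        if pool.NumWorkers < poolSize
            break                             % a worker has been lost
        end
    end
    if ~all(done)
        restarts = restarts + 1;
        if restarts > maxRestarts
            error('Sketch:tooManyRestarts', ...
                'Giving up after %d restarts.', maxRestarts);
        end
        cancel(futures);                      % drop work tied to the old pool
        delete(gcp('nocreate'));
        pool = parpool(poolSize, 'SpmdEnabled', false);
    end
end

function futures = submitAll(pool, taskIds)
    % Submit one parfeval request per outstanding task ID.
    futures(1:numel(taskIds)) = parallel.FevalFuture;
    for k = 1:numel(taskIds)
        futures(k) = parfeval(pool, @simulateOne, 1, taskIds(k));
    end
end

function out = simulateOne(idx)
    % Hypothetical placeholder for one unit of the real simulation.
    out = idx^2;
end
```

The done vector is the record of incomplete work: after a restart only the tasks still marked false are re-submitted, so finished results are never recomputed.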
A third approach might be to restructure your parfor loop to send its results back to the client using a DataQueue, and to launch the parpool with the 'SpmdEnabled', true option so that the pool automatically shuts down any time a worker crashes. The idea is that the client stores the partial results of your loop as they arrive on the DataQueue. The parfor loop would terminate with an error when a worker crashes, but you'd still have the partial results, so you could start a new pool and run a parfor loop over just the incomplete iterations.
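A minimal sketch of the DataQueue idea follows, under some assumptions: a containers.Map (a handle object) is used as the client-side store, and numIterations, poolSize, maxAttempts, simulateOne and recordResult are hypothetical names chosen for illustration. It has not been exercised against real worker crashes.

```matlab
% Sketch only: sizes and helper names are illustrative assumptions.
numIterations = 100;
poolSize      = 4;
maxAttempts   = 3;

% Handle object, so the afterEach callback can update it on the client.
store     = containers.Map('KeyType', 'double', 'ValueType', 'any');
remaining = 1:numIterations;

for attempt = 1:maxAttempts
    if isempty(remaining)
        break
    end
    % Fresh pool for each attempt; with 'SpmdEnabled', true the pool is
    % expected to shut down if a worker crashes.
    delete(gcp('nocreate'));
    parpool(poolSize, 'SpmdEnabled', true);

    queue = parallel.pool.DataQueue;
    afterEach(queue, @(msg) recordResult(store, msg));

    try
        parfor k = 1:numel(remaining)
            idx = remaining(k);
            send(queue, {idx, simulateOne(idx)});  % result goes to the client
        end
    catch err
        warning('Sketch:parforFailed', 'parfor stopped early: %s', err.message);
    end

    % Iterations with no stored result are retried on the next attempt.
    remaining = setdiff(1:numIterations, cell2mat(keys(store)));
end

assert(isempty(remaining), 'Some iterations never completed.');
results = cellfun(@(k) store(k), num2cell(1:numIterations));

function recordResult(store, msg)
    % Runs on the client: msg is {iterationIndex, value}.
    store(msg{1}) = msg{2};
end

function out = simulateOne(idx)
    % Hypothetical placeholder for one iteration of the real simulation.
    out = idx^2;
end
```

Because each result is keyed by its iteration index, a result that arrives twice simply overwrites itself, and any result that was not recorded before a crash is recomputed on the next attempt.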

  0 Comments
