terminating a parfor loop early

David Spence on 2 Feb 2022
Commented: David Spence on 2 Feb 2022
Like lots of people before me, I'm looking for a way to get something like break functionality in a parfor loop.
My code is an embarrassingly parallel Monte Carlo code. I'm submitting runs with different input parameters for execution in a job queue. When I submit my job I specify a maximum runtime; if I accidentally submit a job that takes longer, my job terminates and I lose everything.
Inside the parfor loop, I'd like to check whether I've exceeded some specified runtime and, if so, get out of the loop quickly so that the code can save the data it has already accumulated to file, rather than losing it all when the job is killed.
My plan was to do something like the code below, so that if the loop takes longer than maxruntime, the loop body is effectively empty and we quickly run through any remaining iterations.
My problem is that using datetime seems to be extremely slow.
Is there a better way to do what I want, or a faster way of checking the time across different cores?
maxruntime = hours(4);   % set maximum runtime to 4 hours
starttime = datetime;    % save the time at which we start the parfor
parfor packets = 1:N
    if (datetime - starttime) < maxruntime
        % the normal loop code goes in here
        % arrays are accumulated
    end
end
% code to write out the accumulated arrays from the parfor loop is here

Accepted Answer

Raymond Norris on 2 Feb 2022
The issue you'll have with parfor is that it can't terminate early. I believe what you're suggesting is checkpointing, i.e. having MATLAB write to one or more files periodically. MATLAB doesn't have this built in, but it looks like you're trying to do it explicitly. This can be useful to requeue the job and start further downstream, or to have a minimal set of output before a job bails, etc.
My suggestion is to use parfeval. This allows for early termination of "futures" (i.e., tasks).
maxruntime = hours(4);   % set maximum runtime to 4 hours
starttime = datetime;    % save the time at which we submit the futures
for packets = 1:N
    f(packets,1) = parfeval(@unit_of_work, ..);
end
for packets = 1:N
    [idx, ..] = fetchNext(f);   % next finished future, in completion order
    if (datetime - starttime) < maxruntime
        continue
    else
        % About to run out of time, cancel all futures and stop fetching
        cancel(f)
        break
    end
end
% code to write out the accumulated arrays is here
Looks like Rik has an improvement with now vs datetime you could use as well.
Obviously, this is a different approach than parfor, but it gives you the flexibility to end the loop early. You'll need to aggregate your results from fetchNext.
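As a rough illustration of aggregating results from fetchNext under a time budget, here is a minimal sketch. It assumes a few things that are not in the answer above: the body of unit_of_work is a made-up stand-in, N is set to a small value, elapsed time is measured with tic/toc on the client instead of datetime, and the optional timeout argument of fetchNext is used so the client does not block past the budget.

```matlab
% Sketch only: unit_of_work, N, and the results array are illustrative.
N = 20;                          % number of tasks (assumed)
maxruntime = 4*3600;             % time budget in seconds (4 hours)
t0 = tic;                        % started and read on the client only

f(1:N) = parallel.FevalFuture;   % preallocate the array of futures
for packets = 1:N
    f(packets) = parfeval(@unit_of_work, 1, packets);   % 1 output per task
end

results = nan(N, 1);             % NaN marks tasks that never finished
for k = 1:N
    remaining = maxruntime - toc(t0);
    if remaining <= 0
        break                    % budget used up before this fetch
    end
    % Wait for the next finished future, but no longer than the budget.
    [idx, value] = fetchNext(f, remaining);
    if isempty(idx)
        break                    % timed out while waiting
    end
    results(idx) = value;        % aggregate by original task index
end
cancel(f);                       % stops queued/running futures; finished ones are unaffected

function out = unit_of_work(packet)
    % Hypothetical stand-in for one unit of Monte Carlo work.
    out = packet^2;
end
```

Whatever has been collected in results when the loop exits can then be written to file; entries that are still NaN correspond to tasks that were cancelled.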
Another thought is to use DataQueue. doSomething will vary quite a bit, depending on what you want to do with your aggregated array, but it's a starting point.
q = parallel.pool.DataQueue;
afterEach(q, @doSomething);
parfor packets = 1:N
    if (datetime - starttime) < maxruntime
        % the normal loop code goes in here
        % arrays are accumulated
        q.send(<the variable you want to accumulate>)
    end
end
function doSomething(D)
    % do something with D (e.g., write to file, etc.)
end
1 Comment
David Spence on 2 Feb 2022
You're right - I need 'checkpointing'! I can actually break the main parfor loop into a sequence of smaller parfors, and write out data to file after each smaller loop. This solves my problem I think.
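The chunked approach described in this comment might look something like the sketch below. It is only an illustration: N, chunkSize, the checkpoint file name, and the rand-based stand-in for the real work are all assumptions, and the elapsed time is checked with tic/toc on the client between chunks rather than inside the parfor.

```matlab
% Sketch only: sizes, file name, and the "work" are placeholders.
N = 1000;                        % total number of packets (assumed)
chunkSize = 100;                 % packets per inner parfor (assumed)
maxruntime = 4*3600;             % time budget in seconds (4 hours)
checkpointFile = 'checkpoint.mat';

total = 0;                       % result accumulated across chunks
lastDone = 0;                    % index of the last completed packet
t0 = tic;                        % started and read on the client only

for lo = 1:chunkSize:N
    if toc(t0) >= maxruntime
        break                    % out of time: do not start another chunk
    end
    hi = min(lo + chunkSize - 1, N);
    chunkTotal = 0;              % reduction variable for this chunk
    parfor packet = lo:hi
        chunkTotal = chunkTotal + rand;   % stand-in for the real loop body
    end
    total = total + chunkTotal;
    lastDone = hi;
    save(checkpointFile, 'total', 'lastDone');   % checkpoint after every chunk
end
```

The time check only happens between chunks, so a chunk that has started always runs to completion; the chunk size needs to be small compared with the safety margin before the job is killed. Saving lastDone alongside the data also makes it possible to resume a requeued job from the next packet.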


More Answers (1)

Rik on 2 Feb 2022
I personally use the now function a lot. The number it returns is in days, so you will have to scale your max time to fractional days for the comparison.
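For illustration, the original loop rewritten around now could look like the sketch below. The variable names, the small N, and the counter standing in for the real loop body are assumptions; the only substantive change is that the 4-hour limit is expressed as a fraction of a day so it can be compared with differences of now values.

```matlab
% Sketch only: N and the counter are placeholders for the real work.
maxruntimeDays = 4/24;           % 4 hours expressed in days
starttime = now;                 % serial date number, in days
N = 8;                           % number of packets (assumed)
count = 0;                       % stand-in for the accumulated arrays

parfor packets = 1:N
    if (now - starttime) < maxruntimeDays
        % the normal loop code goes in here
        count = count + 1;
    end
end
```

Note that now reads the wall clock of whichever machine the worker runs on, so this comparison presumably relies on the client and worker clocks being reasonably in sync.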
2 Comments
Rik on 2 Feb 2022
I doubt using toc with an input would be much faster, but you could try:
maxruntime = seconds(hours(4));   % maximum runtime of 4 hours, as a number of seconds
starttime = tic;                  % save the time at which we start the parfor
parfor packets = 1:N
    if toc(starttime) < maxruntime
        % the normal loop code goes in here
        % arrays are accumulated
    end
end
% code to write out the accumulated arrays from the parfor loop is here
It all depends on how much you're doing in the rest of your loop. If it is fast, then even a low-cost function will result in a drastic performance decrease.
I don't know many more strategies to query the system time. I believe using a mex doesn't beat calling now.
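One way to judge whether the time check matters for a given loop body is to measure the per-call cost of each clock query directly. The sketch below is a crude benchmark, not a rigorous one: the call count is arbitrary, the variable names are made up, and the numbers will differ between machines and MATLAB releases.

```matlab
% Crude per-call cost of three ways to read the clock (results vary by system).
nCalls = 1e4;                    % number of calls per measurement (arbitrary)
ref = tic;                       % reference timer queried by toc(ref)

t = tic;
for k = 1:nCalls
    x = datetime;                % datetime object for the current time
end
costDatetime = toc(t) / nCalls;

t = tic;
for k = 1:nCalls
    x = now;                     % serial date number in days
end
costNow = toc(t) / nCalls;

t = tic;
for k = 1:nCalls
    x = toc(ref);                % elapsed seconds since ref
end
costToc = toc(t) / nCalls;

fprintf('datetime: %.3g s, now: %.3g s, toc: %.3g s per call\n', ...
    costDatetime, costNow, costToc);
```

Comparing these figures with the runtime of one iteration of the real loop body gives a sense of how much the check costs in relative terms.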

Release

R2021b
