Parfor Freezing during computation

Samuel Léveillé on 24 Jun 2019
Commented: John Meluso on 17 Mar 2020
Hi,
I'm running an optimization in which the function being optimized uses a parfor loop to speed up its calculation.
The function looks something like this:
% DATA is a 10-by-10 array, DATA(x,y) with x = 1:10 and y = 1:10 (shown only to indicate the data format)
parfor x = 1:10
    for y = 1:10
        dosomething(DATA(x,y)); % uses quad and fzero, but I don't think that matters
    end
end
The problem is this: the full program takes 3-4 days to compute, and while the code runs (on a 16-core Xeon server) it will sometimes stall and stop iterating. This can happen after 15 minutes or after 40+ hours. CPU usage drops to zero, but there is no error message. (For reference, I have managed to run the entire program a couple of times without any issues, but I need it to be extremely reliable.) I also see a couple of workers popping in and out of the process list, but all of them are at 0.1% load.
At first I thought it was a problem with the optimization routine, but I accidentally discovered that when I kill some of the workers from the command prompt, an error message pops up saying a worker was aborted, and then the program resumes iterating! However, it continues only on the remaining workers that I didn't kill. I found this by trial and error and have not managed to identify the cause.
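One way to make a silent stall like this visible is to submit each row as its own parfeval task and collect the results with a timeout, so the client is told when nothing has finished for too long. The following is only a sketch under assumptions, not a fix for the underlying cause: processRow is a hypothetical helper standing in for the inner loop over y, dosomething is replaced by a placeholder that returns a scalar, DATA is random placeholder data, and the 600-second timeout is arbitrary.

```matlab
% Sketch only: run each row as a parfeval task and collect with a timeout.
DATA = rand(10);                      % placeholder for the real 10-by-10 data
pool = gcp();                         % start or reuse the current parallel pool
nRows = size(DATA, 1);
timeoutSec = 600;                     % arbitrary; set well above a normal row time

futures(1:nRows) = parallel.FevalFuture;   % preallocate the future array
for x = 1:nRows
    futures(x) = parfeval(pool, @processRow, 1, DATA(x, :));
end

results = cell(nRows, 1);
for k = 1:nRows
    % fetchNext returns an empty index if no task finishes within the timeout
    [idx, value] = fetchNext(futures, timeoutSec);
    if isempty(idx)
        pending = find(~strcmp({futures.State}, 'finished'));
        warning('No row finished within %d s; outstanding rows: %s', ...
            timeoutSec, mat2str(pending));
        cancel(futures);              % stop queued and running tasks
        break
    end
    results{idx} = value;
end

function out = processRow(row)
% Hypothetical helper: the inner loop over y for a single row of DATA.
out = zeros(size(row));
for y = 1:numel(row)
    out(y) = dosomething(row(y));
end
end

function out = dosomething(v)
% Placeholder for the real function (which uses quad and fzero).
out = v^2;
end
```

If the warning fires, the listed rows are the ones that never returned, which at least narrows down where the run stopped. Whether resubmitting those rows is safe depends on what dosomething actually does.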
Any advice? I tried passing the dataset in through a parallel.pool.Constant and reading only the values used in each specific parfor iteration. Referring to the example above:
C = parallel.pool.Constant(DATA);
parfor x = 1:10
    data = C.Value(x,:); % only the row needed by this iteration
    for y = 1:10
        dosomething(data(y)); % uses quad and fzero, but I don't think that matters
    end
end
But this approach produced the same stalls, and this time even sooner than usual (although that might be random).
As I said, the problem seems completely random and sometimes does not occur at all in a given test run. I tried both simulated (random) data and a deterministic time series, and both showed the issue. Each time it happened, I stopped the program and restarted it, and it did not stall at the same place as the previous run.
PS: It also happened on my personal laptop (an old 2-core machine), so I'm fairly sure the problem isn't caused by the server I use. In the MATLAB window, the code stalls and the Run button stays on Pause, but there is no CPU load and no error message.
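Because the stall point changes from run to run, it may help to record which iterations have started and which have finished. A minimal sketch, assuming a MATLAB release that provides parallel.pool.DataQueue: each worker sends a small message to the client at the start and end of its row, and the client prints it with a timestamp. DATA and dosomething below are placeholders, and the message format is made up for illustration.

```matlab
% Sketch only: log the start and end of every parfor iteration on the client.
DATA = rand(10);                      % placeholder for the real 10-by-10 data
nRows = size(DATA, 1);
nCols = size(DATA, 2);

q = parallel.pool.DataQueue;
% Runs on the client each time a worker sends a message.
afterEach(q, @(msg) fprintf('%s  row %d  %s\n', ...
    datestr(now, 'HH:MM:SS'), msg{1}, msg{2}));

parfor x = 1:nRows
    send(q, {x, 'started'});
    for y = 1:nCols
        dosomething(DATA(x, y));
    end
    send(q, {x, 'finished'});
end

function out = dosomething(v)
% Placeholder for the real function (which uses quad and fzero).
out = v^2;
end
```

After a stall, any row that logged "started" but never "finished" is a candidate for where a worker stopped responding. This only localizes the problem; it does not explain it.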
Thanks
1 Comment
John Meluso on 17 Mar 2020
Hi Samuel, I'm curious if you ever found a reliable solution to this problem? I'm running into the same issue running a simulation on a computing cluster and -- despite the plethora of people who seem to have the same problem -- I haven't seen anyone else offer a solution. Thanks!


Answers (0)
