matlab -batch jobs are killed on Ubuntu without trace of reason

16 views (last 30 days)
Mads on 2 Mar 2023
Commented: Steven Lord on 3 Mar 2023
I have 28 cores and 384 GB RAM, and I usually run 20-25 MATLAB jobs in parallel, like
matlab -batch "Job1" > output1 &
matlab -batch "Job2" > output2 &
etc.
One thing I don't understand is why these jobs are terminated after several hours, before they finish. The designated output files leave no trace of the reason. How can I find out what went wrong? If I restart, the run continues from where it left off because it checks which results already exist, so it is not due to errors in the code.
The second thing relates to the no-trace issue. Why do error messages print in the terminal and not into the designated output file? For example, a particular bug the code runs into returns an error message, which is shown to me in the terminal window if I haven't closed my terminal session. Other programs put everything into the output file.
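One likely explanation for the split output: `> output1` redirects only stdout, while MATLAB writes error messages to stderr, which stays attached to the terminal. A sketch of capturing both streams in one file, reusing the hypothetical job names from the question:

```shell
# "> output1" captures stdout only; "2>&1" additionally sends stderr
# (where MATLAB prints error messages) into the same file.
matlab -batch "Job1" > output1 2>&1 &
matlab -batch "Job2" > output2 2>&1 &
```

With this pattern, an error raised inside the job should end up in the output file instead of the terminal.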

Answers (1)

Steven Lord on 2 Mar 2023
Are your jobs consuming a lot of memory? I'd check the syslog file to determine if the Out of Memory killer killed MATLAB to try to free up memory for MATLAB. [Yes, I know.]
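A few ways to search the logs for OOM-killer activity on Ubuntu (exact paths and log wording vary by distro and kernel version, so treat these patterns as a starting point):

```shell
# Kernel ring buffer, with human-readable timestamps:
dmesg -T | grep -iE 'out of memory|oom-kill|killed process'

# Ubuntu also writes kernel messages to syslog and the journal:
grep -i 'oom' /var/log/syslog
journalctl -k | grep -i 'oom'
```

A hit typically looks like `Out of memory: Killed process 1234 (MATLAB)` and names the victim process.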
2 Comments
Mads on 3 Mar 2023
Thanks.
I couldn't decipher yesterday's syslog; I found no OOM or "MATLAB" message in there.
The jobs were restarted last night and are still running. I'm currently logging memory usage with vmstat.
So if it was memory, wouldn't MATLAB print that message, or would the OOM killer strike before it gets the chance? Isn't there a way to route error messages into a text file when running these batch jobs?
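The vmstat logging mentioned above, plus a per-job stderr file, can be sketched like this (the 30-second interval and the `errors1` filename are arbitrary choices for illustration):

```shell
# Append a timestamped memory snapshot every 30 s while the jobs run
# (-t adds a timestamp column; requires a procps vmstat):
nohup vmstat -t 30 >> memlog.txt &

# Give each job its own stderr file, so any error message lands
# next to that job's output instead of in the terminal:
matlab -batch "Job1" > output1 2> errors1 &
```

If a job dies with a MATLAB-level error, `errors1` should contain the message; if it is killed by the OOM killer, the file stays empty and memlog.txt shows free memory collapsing just before the kill.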
Steven Lord on 3 Mar 2023
I'm not certain but I don't think the OOM killer gives processes the chance to "speak any last words" before it kills them.


Release

R2021b