Parallel computing, occasionally get Exception message "Message Catalog MATLAB:load was not loaded from the file"

I am running two jobs on a cluster: job1 on node1 and job2 on node2. Job1 starts slightly earlier than job2.
Everything is fine for job1. For job2, however, I sometimes get this exception message on the command line: " Caught "std::exception" Exception message is:
Message Catalog MATLAB:load was not loaded from the file. Please check file location, format or contents ".
When this happens, the job does not stop, but it no longer does any computation, i.e. it hangs.
I suspect one of the following reasons:
  1. It is related to the load() function. I do use load() in my parfor-loop. However, I thought load() is different from fopen(), which needs to be followed by fclose(). So, do I have to take any extra steps when using load() in a parfor-loop?
  2. It is related to the Linux system. This may occur when there are too many open files. However, I did not open any files in my parfor-loop.
  3. It is related to the Linux system and I am using too many resources. When I run only one job, this exception message never appears.
Has anyone run into this?

Accepted Answer

Edric Ellis on 11 Dec 2020
Edited: Edric Ellis on 11 Dec 2020
The probable cause of this is the file handle limit. This page: https://www.mathworks.com/help/parallel-computing/recommended-system-limits-for-machintosh-and-linux.html has some instructions. Basically, I think you need to raise the ulimit values on the system.
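As a rough way to see the limits in question, the sketch below queries them from inside MATLAB. It assumes a Linux or macOS machine with `bash` on the path; the `limits` struct and its field names are made up for this illustration. Limits are inherited per process, so the values seen by a worker inside a cluster job may differ from those in an interactive session, and it is probably most informative to run this from within the job itself.

```matlab
% Sketch: report the shell limits relevant to the answer above.
% Assumes bash is available; "limits" is a hypothetical name for this example.
limits = struct('openFiles', '', 'userProcesses', '');
if isunix
    % ulimit is a shell builtin, so it is called through bash explicitly.
    [status, out] = system('bash -c "ulimit -n"');   % max open file descriptors
    if status == 0
        limits.openFiles = strtrim(out);
    end
    [status, out] = system('bash -c "ulimit -u"');   % max user processes
    if status == 0
        limits.userProcesses = strtrim(out);
    end
end
disp(limits)
```

The reported values can then be compared against the recommendations on the linked page.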
(The other thing to check is that you aren't opening lots of file handles using fopen and not subsequently closing them with fclose.)
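A minimal sketch of that check, assuming a scratch file created with `tempname` purely for illustration: each `fopen` is paired with an `fclose`, and `fopen('all')` is then used to list any identifiers still open in the current MATLAB process. (Newer releases may point to other functions for listing open files; `fopen('all')` is used here because it is available in R2018a.)

```matlab
% Sketch: pair every fopen with an fclose, then look for leaked handles.
% tmpFile is a scratch file made up for this example.
tmpFile = [tempname '.txt'];
fid = fopen(tmpFile, 'w');
assert(fid ~= -1, 'Could not open %s', tmpFile);
fprintf(fid, 'hello\n');
fclose(fid);                      % release the handle as soon as possible

% fopen('all') returns the identifiers of files that are still open.
stillOpen = fopen('all');
fprintf('%d file(s) still open in this session\n', numel(stillOpen));

delete(tmpFile);                  % remove the scratch file
```

The same `fopen('all')` call could be placed inside the parfor body to inspect a worker rather than the client, since each worker is a separate process with its own handles.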
  1 Comment
Xingwang Yong on 11 Dec 2020
Thanks, Edric. I checked the maximum number of user processes on my node: it is 4096, far smaller than the recommended 23741. However, the maximum number of open file descriptors on my node is greater than the recommended value.
I did not open any files using fopen(). I used load() in my parfor-loop.
I'll ask my admin to increase the maximum number of user processes and see whether this still occurs.
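For reference, a minimal sketch of the load()-in-parfor pattern discussed in this thread. The MAT-files, the variable `x`, and the names `files` and `total` are invented for the example and are not the poster's actual code. Unlike fopen(), load() opens and closes the file itself, so there is no matching close call to make. Inside a parfor body the struct-output form `s = load(...)` is used, because loading variables directly into the workspace conflicts with parfor's transparency rules.

```matlab
% Sketch: create a few small MAT-files, then load them inside a parfor-loop.
% All file and variable names here are hypothetical.
nFiles = 4;
files = cell(1, nFiles);
for k = 1:nFiles
    files{k} = [tempname '.mat'];
    x = k * ones(1, 3);
    save(files{k}, 'x');
end

total = zeros(1, nFiles);
parfor k = 1:nFiles
    s = load(files{k}, 'x');      % struct output; no close call is needed
    total(k) = sum(s.x);
end

% Remove the scratch files.
for k = 1:nFiles
    delete(files{k});
end
```

If no parallel pool is available, parfor falls back to running the iterations serially, so the sketch should behave the same either way.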

Release: R2018a
