How do I prevent my Red Hat instance from losing SSH access when running spmd or parpool with more than N workers?
MathWorks Support Team
31 Oct 2023
Answered: MathWorks Support Team
2 Nov 2023
I am trying to run a few simple lines of code that create a local cluster, open a parallel pool, run "spmd" to perform a task inside a "while" loop, and then exit the loop. However, the code fails on all of my compute instances if I use more than 44 workers. This is what my code looks like:
pclus = parcluster('local');
nc = 64;                 % number of workers requested
parpool(pclus, nc);
ii = 0;
while ii < 100
    spmd
        % perform some task here
    end
    ii = ii + 1;         % advance the counter so the loop terminates
end
After "ii = 38" or so, commands such as "top", "htop" and "ls" start to fail in my other shell session. Right after this, I lose SSH access to the instance as well. Rescale, the platform on which my instance runs, reports it as an "unhealthy instance" and shuts it down.
During this time, I observed that MATLAB continues running. Why does this happen?
Accepted Answer
MathWorks Support Team
31 Oct 2023
This issue occurs because the instance reaches the maximum user processes limit ("ulimit -u"). On Linux, this limit counts threads as well as processes, so a large pool of MATLAB workers can exhaust it. You can raise the limit with the command:
ulimit -u 16384
However, this changes the limit only for the current shell session and the processes started from it. To change it for all future shell sessions as well, you will need to change the maximum user processes ("nproc") setting in the file "/etc/security/limits.d/20-nproc.conf".
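As a rough sketch of what the persistent setting could look like: files under "/etc/security/limits.d/" use the pam_limits line format "domain type item value". The snippet below writes two such lines to a scratch file so they can be reviewed before being merged (as root) into the file named above. The account name "matlabuser" is a hypothetical placeholder, and 16384 is simply the value from the command above, not a recommendation for every instance.

```shell
# Sketch only: build pam_limits entries that raise the max user processes limit.
# "matlabuser" is a hypothetical account name; replace it with the real one.
target_user="matlabuser"
nproc_limit=16384

# Write to a scratch file first, so nothing under /etc is touched by this script.
fragment=$(mktemp)
printf '%-12s %-5s %-6s %s\n' "$target_user" soft nproc "$nproc_limit" >  "$fragment"
printf '%-12s %-5s %-6s %s\n' "$target_user" hard nproc "$nproc_limit" >> "$fragment"

# Show the generated lines for review.
cat "$fragment"
```

Limits set this way are applied when a session is opened, so they take effect at the next login rather than in shells that are already running.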
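The default limit on your instance is evidently only high enough to accommodate the MATLAB processes started by about 44 workers. Once the limit is reached, no new processes can be created for that user, so new commands and SSH logins fail, while the MATLAB processes that are already running carry on. This is why MATLAB continues to run even though the instance itself loses SSH access.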
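To see how close an account is to the limit while the pool is running, something like the following could be run from a second shell. It is a minimal sketch that assumes a bash shell and the procps version of "ps" found on typical Linux distributions; the 90% warning threshold is an arbitrary choice for illustration.

```shell
# Sketch only: compare this user's current thread count with the nproc soft limit.
# On Linux the limit counts threads as well as processes.
user_name=$(id -un)
soft_limit=$(ulimit -Su)

# -L lists one row per thread; "-o pid=" prints the PID column without a header.
thread_count=$(ps -L -u "$user_name" -o pid= | wc -l)

echo "user:        $user_name"
echo "threads:     $thread_count"
echo "nproc limit: $soft_limit"

# Warn when usage is within 10% of the limit (threshold chosen arbitrarily).
if [ "$soft_limit" != "unlimited" ] && [ "$thread_count" -ge $(( soft_limit * 9 / 10 )) ]; then
    echo "warning: within 10% of the max user processes limit"
fi
```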