Unable to start parallel pool for more than 12 cores

Xiaofan Cui on 19 Dec 2020
Commented: Raymond Norris on 13 Feb 2021
Hi
My MATLAB version is R2019a and my server has 8 CPUs (Intel(R) Xeon(R) CPU E7-8860 @ 2.27GHz); each CPU has 10 cores with hyperthreading. Hence I thought I could set my "preferred number of workers in a parallel pool" to at most 80. However, whenever I set it higher than 12, MATLAB returns "failed to start parallel pool". This is my cluster profile:
Thanks

Answers (1)

Raymond Norris on 19 Dec 2020
I'm a bit confused how setting the default size of a parallel pool would throw "failed to start parallel pool", since setting the size in the profile doesn't start a pool. I'm gathering that your server has 8 Intel E7-8860 CPUs with 10 cores per socket plus hyperthreading (that is, the 10 cores don't include the hyperthreaded logical cores). Where are you running your MATLAB client: on your local workstation or on one of the server nodes?
Although you can run a local pool on a single node on the server, I'm wondering if you're running MATLAB on your local workstation, where there are fewer cores. Run the following in MATLAB on the machine where you're setting the profile.
feature numcores
The local profile provides the settings for a local pool on the machine where the MATLAB client is running. If you want to run the pool of workers on your 80-core server, you either need to run MATLAB directly on the server (and use the 'local' profile) or create a new profile in your workstation MATLAB. This new profile would instruct MATLAB how to submit to a scheduler (e.g. MJS, Slurm, etc.) on the cluster.
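To make the comparison concrete, here is a minimal sketch that prints the core count MATLAB detects next to the worker limit stored in the 'local' profile. It assumes Parallel Computing Toolbox is installed; note that feature is an undocumented function, so its behavior may differ between releases.

```matlab
% Physical cores MATLAB detects on the machine running this client.
% "feature" is undocumented, so treat the result as a hint only.
nCores = feature('numcores');

% Cluster object built from the 'local' profile of this MATLAB client.
c = parcluster('local');

fprintf('Physical cores detected:  %d\n', nCores);
fprintf('Local profile NumWorkers: %d\n', c.NumWorkers);
```

If both numbers are far below 80, the client is most likely not running on the server (or is not seeing all of its cores).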
If this sounds about right, contact Technical Support (support@mathworks.com) -- they can walk you through the process of submitting parallel jobs on machines other than your local workstation.
2 Comments
Xiaofan Cui on 19 Dec 2020
Edited: Xiaofan Cui on 19 Dec 2020
Thank you so much for your quick reply, Raymond. The problem occurred while I was running my code. The "parfor" in my code triggered the parallel pool to start. MATLAB then keeps trying to start the parallel pool (sometimes for as long as 1 hour) before it fails and returns this error.
I guess I am using MATLAB on a server node.
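One way to narrow this down might be to start the pool explicitly before the parfor loop, so the startup time and the full error report are visible instead of being hidden behind the automatic pool creation. This is only a sketch: nRequested is a hypothetical value to vary while testing, and the request is capped at the NumWorkers limit of the 'local' profile.

```matlab
nRequested = 16;                      % hypothetical size to test; adjust as needed
c = parcluster('local');              % profile of the machine running MATLAB
nWorkers = min(nRequested, c.NumWorkers);

delete(gcp('nocreate'));              % close any pool that is already open

try
    tStart = tic;
    pool = parpool(c, nWorkers);      % start the pool explicitly
    fprintf('Started %d workers in %.1f s\n', pool.NumWorkers, toc(tStart));
catch err
    % Print the complete error report rather than only the summary line.
    fprintf(2, 'Pool failed to start: %s\n', err.message);
    disp(getReport(err, 'extended'));
end
```

Repeating this with increasing values of nRequested should show at which size the startup begins to fail.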
Raymond Norris on 13 Feb 2021
If you're running MATLAB on a server node, how many cores did you allocate to it? That is, I'm going to assume you're running under some scheduled environment (e.g. PBS), and if so, can you post your job script? It's possible that you only requested 1-2 cores, but the local profile sees 80, so the pool is contending with other jobs running on the same node.
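As a rough illustration of sizing the pool to the allocation rather than to the whole node, the sketch below looks for a core count in a few scheduler environment variables. Which variable is set, if any, depends on the scheduler and the site configuration, so the names in envNames are assumptions to check against your own job environment.

```matlab
% Environment variables that some schedulers use for the allocated core
% count. These names are assumptions; confirm them for your cluster.
envNames = {'SLURM_CPUS_PER_TASK', 'PBS_NP', 'NCPUS'};

nAllocated = NaN;
for k = 1:numel(envNames)
    val = str2double(getenv(envNames{k}));   % NaN if unset or not numeric
    if ~isnan(val) && val >= 1
        nAllocated = val;
        break;
    end
end

% Fall back to the detected core count when no variable is set.
if isnan(nAllocated)
    nAllocated = feature('numcores');
end

% Never ask for more workers than the 'local' profile allows.
c = parcluster('local');
nWorkers = min(nAllocated, c.NumWorkers);
fprintf('Pool size to request: %d\n', nWorkers);
```

The resulting nWorkers can then be passed to parpool, so the pool does not compete for cores that belong to other jobs on the node.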

Release

R2019a