parfor errors when using more than 46 workers on AMD nodes
Hello,
I am trying to run a simple parfor script on the nodes of our cluster. The code works fine until I try to use more than 46 CPUs (workers) at once on a single server. Some of our latest nodes have 128 AMD cores. I can run up to 56 cores on our Intel CPU servers (nodes), but on any AMD node I get errors (Java runtime errors and others) when using more than 46 cores. It would be great to use all 128 cores on these new nodes for our MATLAB code. I have tried increasing memory, but I still get these errors when using more than 46 cores.
I will attach the MATLAB crash dump, code and sbatch files.
My sbatch file (I have tried many different parameters):
#!/bin/bash
#SBATCH -J pfor_matlab
#SBATCH -o pfor".%j".out
#SBATCH -e pfor".%j".err
#SBATCH -t 45:00
#SBATCH -N 1
#SBATCH -p normal
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=48
module load matlab
hostname -s
env | egrep SLURM
matlab -nosplash -nodesktop -r "pfor"
The sbatch job produces this output in the SLURM .err file:
Error using parpool (line 145)
Parallel pool failed to start with the following error. For more detailed
information, validate the profile 'local' in the Cluster Profile Manager.
Error in pfor (line 5)
parpool('local', str2num(getenv('SLURM_CPUS_PER_TASK')))
Caused by:
Error using parallel.internal.pool.InteractiveClient>iThrowWithCause (line
670)
Failed to initialize the interactive session.
Error using
parallel.internal.pool.InteractiveClient>iThrowIfBadParallelJobStatus
(line 781)
The interactive communicating job failed with no message
Thank you for any pointers!
Mark
2 Comments
Walter Roberson
9 Feb 2021
The volunteers are not likely to know the solution for this; you should open a support case.
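If a support case is opened, it may help to include the limits that the job actually sees on the failing node. Each pool worker is a separate MATLAB process with its own threads, so per-user process and open-file limits are one thing worth ruling out. That is an assumption, not a confirmed cause of the errors above. Here is a minimal sketch that could be placed in the sbatch script before the `matlab` line. The function name `report_limits` is hypothetical and not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper (not from the original post): print the core count and
# per-user limits seen inside the job, one key=value pair per line.
# Each lookup falls back to "unknown" if the command or option is unavailable.
report_limits() {
    echo "host=$(uname -n)"
    echo "nproc=$(nproc 2>/dev/null || echo unknown)"
    echo "max_user_processes=$(ulimit -u 2>/dev/null || echo unknown)"
    echo "max_open_files=$(ulimit -n 2>/dev/null || echo unknown)"
    # SLURM_CPUS_PER_TASK is only set inside a job that requested --cpus-per-task.
    echo "slurm_cpus_per_task=${SLURM_CPUS_PER_TASK:-unset}"
}

report_limits
```

Comparing this output between an Intel node where the pool starts and an AMD node where it fails would show whether the two environments differ in any of these values.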
Mark PIERCY
9 Feb 2021