Unable to start parpool in multiple MATLAB instances simultaneously on a single machine

I tried launching multiple MATLAB instances:
for i = 1:2
    % Launch a separate MATLAB instance per file so that the runs proceed in parallel.
    % Inside "myFunction" I use "parfeval" to run some operations in parallel.
    % (fileName is assumed to be a cell array of file names, hence fileName{i}.)
    cmd = ['matlab -r "myFunction(''' fileName{i} ''');exit;" &'];
    system(cmd);
end
% Both MATLAB instances opened and ran fine up to the parfeval call,
% then showed an error while creating the parallel pool.
% (This worked perfectly when previously run in a single instance.)
% I closed ALL MATLAB instances and opened a new one.
% Running "parpool(2)" there does not work either and gives the following error:
Starting parallel pool (parpool) using the 'local' profile ...
Error using parpool (line 145)
Parallel pool failed to start with the following error. For more detailed information, validate
the profile 'local' in the Cluster Profile Manager.
Caused by:
Error using parallel.internal.pool.InteractiveClient>iThrowWithCause (line 670)
Failed to locate and destroy old interactive jobs.
Error using parallel.Cluster/findJob (line 74)
Unknown type: concurrentconcurrent.
I restarted Windows 10. There is no "local_scheduler_data" or "local_cluster_jobs" folder in "prefdir". I tried validating the profile in the Cluster Profile Manager: all tests passed except the last one, "Parallel pool test (parpool)". Setting "distcomp.feature( 'LocalUseMpiexec', false )" didn't work. Running in administrator mode didn't work either.
The college workstation has 32 cores and enough RAM to run my model in parallel. I am just trying to run some commands in parallel that are independent of each other.
  1. How do I get "parpool" working again? (Solved by deleting the "R2020a" folder inside the "local_cluster_jobs" folder in the parent directory of "prefdir".)
  2. Is it possible to use parpool in multiple MATLAB instances running simultaneously? If yes, how?
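For reference, the fix from item 1 could be scripted roughly as follows. This is a minimal sketch, assuming the stale job data sits in a release-named folder (e.g. "R2020a") under "local_cluster_jobs" next to the prefdir folder, as described in item 1; the variable names are illustrative. Close all other MATLAB instances before running it.

```matlab
% Remove the stale local cluster job data for the running release.
% Assumption: the layout is <parent of prefdir>/local_cluster_jobs/R<release>.
releaseName = ['R' version('-release')];   % e.g. 'R2020a'
jobsDir = fullfile(fileparts(prefdir), 'local_cluster_jobs', releaseName);

delete(gcp('nocreate'));                   % close any pool open in this session
if exist(jobsDir, 'dir')
    rmdir(jobsDir, 's');                   % delete the folder and everything inside it
end
```

Since this deletes the stored job data for that release, it is only appropriate when no jobs worth keeping are stored there.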
  3 Comments
jessupj on 10 Sep 2020
I think when I did this before (R2014 or thereabouts), I had to define different clusters so that the MATLAB instances (called from the shell using GNU parallel) opened independent pools. I didn't try J.Herzog's delay tactic -- that never occurred to me.
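One way to read "define different clusters" is to give each MATLAB instance its own job storage folder before opening the pool. The following is a minimal sketch under that assumption, not necessarily what was done back in R2014. The temporary folder from tempname is an illustrative choice, and the 'local' profile name is taken from the error message in the question (newer releases may name the default profile differently).

```matlab
% Sketch: give this MATLAB instance its own job storage folder so that
% concurrently started instances do not share local cluster job data.
c = parcluster('local');              % local profile, as in the question
storageDir = tempname;                % unique folder name under tempdir (assumption)
mkdir(storageDir);
c.JobStorageLocation = storageDir;    % redirect job data for this instance only
pool = parpool(c, 2);                 % pool size 2, as in the question

% ... parfeval / parfor work goes here ...

delete(pool);                         % shut the pool down
rmdir(storageDir, 's');               % remove the temporary job storage
```

Each instance would run this at the top of "myFunction" (or its equivalent) instead of calling parpool(2) directly, so no two instances write to the same job storage folder.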


Answers (2)

Moritz Schappler on 5 Nov 2020
I also encountered this problem when running multiple parallel instances on different nodes of a PBS computing cluster.
When running about 20 parallel instances (each with its own parpool) and starting them all at the same time, this happens nearly always.
You can prevent the simultaneous write access that breaks the prefdir by using some kind of synchronization.
I tried a simple implementation using a lockfile; perhaps it is helpful. The commands would look like this:
%% start pool with protection of the prefdir
parpool_writelock('lock', 180, true); % wait at most 3 minutes for other parpools starting simultaneously
parpool(Set.general.parcomp_maxworkers);
parpool_writelock('free', 0, true);
%% parallel computation
% ....
%% end pool with protection of the prefdir (not sure if this is necessary)
parpool_writelock('lock', 300, true); % wait at most 5 minutes for other parpools to start/stop
delete(gcp('nocreate'));
parpool_writelock('free', 0, true);
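To illustrate the lockfile idea, here is a minimal sketch of what such a helper might look like. It is not the actual parpool_writelock: the function name simple_pool_lock, its arguments, and the lock file location are all hypothetical. It relies on java.io.File.createNewFile, which only succeeds if the file did not exist yet, so only one instance at a time obtains the lock.

```matlab
function ok = simple_pool_lock(mode, maxWaitSec)
% SIMPLE_POOL_LOCK  Hypothetical lockfile helper (not the author's parpool_writelock).
%   ok = simple_pool_lock('lock', maxWaitSec) waits up to maxWaitSec seconds
%   for the lock; ok = simple_pool_lock('free') releases it.

% Assumed location: must be on a file system visible to all instances.
lockPath = fullfile(tempdir, 'parpool_start.lock');
lockFile = java.io.File(lockPath);
switch mode
    case 'lock'
        if nargin < 2, maxWaitSec = 0; end
        t0 = tic;
        ok = lockFile.createNewFile();    % true only if this call created the file
        while ~ok && toc(t0) < maxWaitSec
            pause(1 + rand);              % random delay so instances do not retry in lockstep
            ok = lockFile.createNewFile();
        end
    case 'free'
        if exist(lockPath, 'file') == 2
            delete(lockPath);             % remove the lock file
        end
        ok = (exist(lockPath, 'file') == 0);
    otherwise
        error('simple_pool_lock:mode', 'Unknown mode "%s".', mode);
end
end
```

A script would call simple_pool_lock('lock', 180) before parpool and simple_pool_lock('free') right after it. On a cluster the lock file would need to live in a shared directory (for example the shared home directory) rather than tempdir, and file creation may be less reliable as a lock on network file systems. A crashed instance also leaves a stale lock file behind, which a more complete implementation would have to detect.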

Moritz Schappler on 14 Oct 2021
Another solution may be to change the home directory environment variable before starting the parallel MATLAB instances. Depending on the system (Windows/Linux) and configuration (local machine/several cluster nodes), the commands may differ. This is bash code for running MATLAB on a computing cluster:
export HOME=$TMPDIR
matlab -nodesktop < script.m > $LOGFILE 2>&1
Every parallel cluster node gets its own temporary directory ("$TMPDIR") of the form "/scratch/7782473.batch.css.lan". If the variable TMPDIR is not defined, a unique temporary directory should be generated instead. In my case this directory is deleted at the end of the session and only contains a java.log file. When MATLAB is started with the second command, the profile directory is no longer /home/username/.matlab/R2021a, which was previously accessed in parallel and caused the file access problems in my case.
I also included this in my scripting environment to upload parallel computing jobs on a PBS cluster.
