distributed arrays slow with batch jobs

Maria on 28 October 2021
Commented: Maria on 1 November 2021
Hi,
I am working with distributed arrays.
As far as I understood, I can create distributed arrays directly on a cluster. When I want to manipulate what is inside the distributed array, I need to use spmd.
I wanted to avoid any interactive pool. For this reason, I created a function that uses a distributed array and sent it to the cluster as a batch job. The function looks like this:
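function R = my_distributed_function(input)
    % Assumption: N is the matrix size and is taken from the input argument;
    % the original post does not show where N is defined.
    N = input;
    R = eye(N, 'distributed');   % N-by-N identity, spread across the pool workers
    for k = 1:N
        for m = 1:N
            R(k, m) = 1*m;       % element-wise assignment into the distributed array
        end
    end
end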
And I send this to the cluster as a batch job
job_distributed = batch(c,@my_distributed_function,1,{myinput},'Pool',N-1,'CurrentFolder','.','AutoAddClientPath',false);
However, the batch job takes very long, around 64 seconds. The same function without the "distributed" option takes around 2 ms.
If I do not use a batch job but keep the "distributed" option, an interactive pool starts. The computation then takes around 2 seconds, plus the time needed to start the parallel pool.
My question is: why does the batch job take so long when the function uses distributed arrays?

Accepted Answer

Thomas Falch on 29 October 2021
A batch job with the 'Pool' option (a "batch-pool job") ends up starting the equivalent of an interactive pool, but uses one of the workers as a substitute for the MATLAB desktop client. The overall time for such a job is therefore the pool startup time plus the time for the actual work you're doing. In other words, it takes about the same amount of time as an interactive pool.
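One way to see how the total splits into pool startup and actual work is to time the work inside the function and compare it with the wall time measured on the client. The following is only a sketch: `timed_distributed_fill` is a hypothetical variant of the function from the question, and the `gather` call is an addition so that an ordinary array is returned. In a batch-pool job the pool is already running when the function starts, so `workSeconds` should exclude the pool startup.

```matlab
function [R, workSeconds] = timed_distributed_fill(N)
% TIMED_DISTRIBUTED_FILL  Hypothetical variant of the function in the question
% that also reports how long the array work itself took.
% Save as timed_distributed_fill.m. Possible usage from the client
% (pool size 2 and N = 4 are arbitrary choices):
%
%   c = parcluster;                        % default cluster profile
%   tSubmit = tic;
%   job = batch(c, @timed_distributed_fill, 2, {4}, 'Pool', 2);
%   wait(job);
%   totalSeconds = toc(tSubmit);           % queueing + pool startup + work
%   out = fetchOutputs(job);               % out{1} is R, out{2} is workSeconds
%   fprintf('work: %.2f s, everything else: %.2f s\n', ...
%       out{2}, totalSeconds - out{2});

tStart = tic;                      % start timing the work only
R = eye(N, 'distributed');         % same construction as in the question
for k = 1:N
    for m = 1:N
        R(k, m) = m;               % every row ends up as 1:N
    end
end
R = gather(R);                     % addition: return an ordinary array
workSeconds = toc(tStart);         % elapsed time for the work itself
end
```

If the function is called directly in a desktop session with no pool open, the pool may be started inside the timed region, so the two numbers are only cleanly separated when it runs as a batch-pool job.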
The main benefit of a batch-pool job is that you can submit the job to the cluster and then shut down the MATLAB desktop client (and indeed the computer it's running on). Meanwhile, the job keeps running on the cluster, and you can come back much later to get the results. This is useful for long-running jobs that don't require any user input (interactive pools are meant for the cases that do).
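The submit-now, collect-later workflow described above might look roughly like the following sketch. The value of `myinput`, the pool size, and the use of the default cluster profile are placeholders, and the sketch assumes the submitted function returns an ordinary (gathered) array rather than a distributed one.

```matlab
% --- Session 1: submit the job and note its ID ---
c = parcluster;                    % default cluster profile (assumption)
myinput = 4;                       % placeholder input value
poolSize = 2;                      % placeholder; the job uses poolSize + 1 workers
job = batch(c, @my_distributed_function, 1, {myinput}, ...
    'Pool', poolSize, 'CurrentFolder', '.', 'AutoAddClientPath', false);
jobID = job.ID;                    % keep this; the MATLAB client can now be closed

% --- Session 2 (possibly much later): reconnect and collect the result ---
c = parcluster;                    % same cluster profile as before
job = findJob(c, 'ID', jobID);     % look the job up again by its ID
wait(job);                         % returns immediately if the job has finished
out = fetchOutputs(job);           % cell array with one cell per requested output
R = out{1};
delete(job);                       % remove the job's data from the cluster storage
```

Keeping the job ID (or looking through `c.Jobs`) is what allows the second session to find the job again after the first client has been shut down.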
3 Comments
Thomas Falch on 1 November 2021
This happens whenever you use the 'Pool' option to batch() (or equivalently using createCommunicatingJob()).
If you use parfor with batch() without the 'Pool' option (or equivalently using createJob/createTask), it will probably not work as you expect. It will not cause any kind of pool to be opened, and it will basically run as a regular for loop (on a single worker of your cluster).
This is the same behavior you would get if you ran a parfor loop in the MATLAB desktop client without an interactive pool open and with the option to start a parpool automatically disabled (or without the Parallel Computing Toolbox installed).
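A small probe can make this difference visible. The function below is a hypothetical helper, not something from this thread: it records the ID of the task that executed each parfor iteration, so a job submitted without 'Pool' should report a single ID, while a batch-pool job should report the IDs of the pool workers.

```matlab
function ids = which_workers_ran(nIter)
% WHICH_WORKERS_RAN  Hypothetical probe: task ID that ran each parfor iteration.
% Save as which_workers_ran.m. Possible usage from the client
% (8 iterations and a pool of 2 are arbitrary choices):
%
%   c = parcluster;
%   jobNoPool = batch(c, @which_workers_ran, 1, {8});             % no pool
%   jobPool   = batch(c, @which_workers_ran, 1, {8}, 'Pool', 2);  % batch-pool job
%   wait(jobNoPool); wait(jobPool);
%   a = fetchOutputs(jobNoPool); b = fetchOutputs(jobPool);
%   disp(unique(a{1}));   % expected: one ID, the loop ran as a plain for loop
%   disp(unique(b{1}));   % expected: IDs of the pool workers

ids = zeros(1, nIter);
parfor i = 1:nIter
    t = getCurrentTask();          % empty when evaluated in a client session
    if ~isempty(t)
        ids(i) = t.ID;             % ID of the task (worker) running this iteration
    end
end
end
```

Entries that stay at 0 mean the iteration ran in a client session rather than on a worker.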
Maria on 1 November 2021
Thank you for the clarification. I had completely misunderstood this point. I have some tasks created with createTask that contain parfor loops, and I thought the parfor loops were going to be executed in parallel. Now that I think about it, of course they are not, because createTask creates the task at worker level, and there is 1 worker per core.

Release

R2021a
