Simple parfor loop slow
In my code I am running a parfor loop that has to compute many matrix inversions of a moderately sized matrix. Below is some code that captures the essence of my code.
nps = 10; foo = zeros(nps,1);
u = rand(1,6000,'like',1i); v = rand(6000,1,'like',1i);
mat = rand(6000,6000,'like',1i);
tic
parfor (ii = 1:nps,10)
foo(ii) = u * (mat \ v);
end
partime = toc
tic
for ii = 1:nps
foo(ii) = u * (mat \ v);
end
sertime = toc
For some reason the parfor loop is slower than the serial loop. For example, with nps = 10 I get sertime = 11.2867 and partime = 20.7321. If I increase to nps = 100, then sertime = 111.8209 and partime = 126.8961. Note that I am running this code on a cluster using MATLAB Parallel Server with a Slurm profile and 10 workers (making more threads available to each worker didn't help either).
Any thoughts on why the parfor loop doesn't provide the speedup expected?
As a side note, in my actual code the matrix changes every loop iteration, but the above code still captures the behavior I cannot explain.
Accepted Answer
Sam Marshalik
31 August 2024
I don't think ThreadPool will help here. I ran the code in a Process pool and in a Thread pool, and the runtimes were similar. I also double-checked how much data is being sent between the MATLAB client and the workers, and it is not a lot:
Worker    BytesSentToWorkers    BytesReceivedFromWorkers
______    __________________    ________________________
1              576198886.00                      614.00
2              576198886.00                      614.00
3              576198886.00                      614.00
4              576198886.00                      614.00
5              576198886.00                      614.00
6              576199557.00                     1081.00
7              576199557.00                     1081.00
8              576198886.00                      614.00
Total         4609592430.00                     5846.00
ThreadPool can certainly help when working with large data, but I do not think it is the culprit here.
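For anyone who wants to reproduce a table like the one above, here is a minimal sketch using ticBytes and tocBytes. It assumes a parallel pool can be started from your default profile. The size n = 500 is my own reduction from the 6000 in the question so that it runs quickly. For reference, a complex double takes 16 bytes, so one copy of the original 6000-by-6000 mat is 6000*6000*16 = 576,000,000 bytes, which matches the roughly 576 MB sent to each worker.

```matlab
% Sketch: measure client <-> worker traffic for the parfor loop in the question.
% n is reduced from 6000 so the example runs quickly (illustrative assumption).
n = 500; nps = 10;
foo = zeros(nps,1);
u = rand(1,n,'like',1i); v = rand(n,1,'like',1i);
mat = rand(n,n,'like',1i);

pool = gcp;                 % current pool, or start one from the default profile
ticBytes(pool);             % start counting transferred bytes
parfor ii = 1:nps
    foo(ii) = u * (mat \ v);
end
bytes = tocBytes(pool);     % one row per worker: [sent to worker, received from worker]

% A complex double takes 16 bytes, so one copy of mat is n*n*16 bytes.
matBytes = n*n*16;
fprintf('mat: %d bytes, max sent to one worker: %d bytes\n', ...
        matBytes, max(bytes(:,1)));
```

Calling tocBytes(pool) without an output argument prints the per-worker table instead of returning a matrix.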
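I think the culprit is multithreading. Running the code serially took me 33 seconds, while running it on a single worker with no multithreading took 78 seconds. This means the serial run benefits from multithreading behind the scenes: MATLAB multithreads operations such as mat \ v automatically, whereas parallel workers use a single computational thread each by default.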
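One way to see the effect of implicit multithreading without a pool is to time the same solve in the client with the thread count limited. This is only a sketch: n = 2000 is an arbitrary size chosen for illustration, and the timings will depend on your machine.

```matlab
% Sketch: compare a multithreaded and a single-threaded solve in the client session.
n = 2000;                                 % illustrative size, not the 6000 from the question
mat = rand(n,n,'like',1i); v = rand(n,1,'like',1i);

nDefault = maxNumCompThreads;             % threads MATLAB currently uses
tic; x1 = mat \ v; tMulti = toc;

prev = maxNumCompThreads(1);              % limit to one thread; returns the previous setting
tic; x2 = mat \ v; tSingle = toc;
maxNumCompThreads(prev);                  % restore the previous setting

fprintf('%d threads: %.2f s, 1 thread: %.2f s\n', nDefault, tMulti, tSingle);
```

If tSingle is much larger than tMulti, a serial for loop on that machine is already using several cores, so ten single-threaded workers have less to gain than you might expect.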
I think you had the right idea of giving your parallel workers access to more threads. For example, in serial the code took 33 seconds. I then started a single worker and gave it access to 8 threads, and that ran in 38 seconds (5 seconds of overhead is reasonable). I think that as the problem scales up and you can have more workers with more threads, you will get more of a benefit from MATLAB Parallel Server.
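A rough sketch of the single-worker, 8-thread test described above, using the NumThreads property of the cluster object. It assumes your default profile points at the cluster you want to use and that 8 cores are available to the worker. The size n = 2000 is my own choice to keep the run short. With a scheduler such as Slurm, the job also has to be allocated that many cores per worker, and how that happens depends on your cluster's configuration.

```matlab
% Sketch: start one worker with 8 computational threads and time the loop on it.
delete(gcp('nocreate'));          % close any existing pool first
c = parcluster;                   % default cluster profile (assumption: the one you want)
c.NumThreads = 8;                 % threads per worker; 8 matches the test described above
pool = parpool(c, 1);             % pool with a single worker

% Ask the worker how many computational threads it is actually using.
f = parfeval(pool, @maxNumCompThreads, 1);
nThreadsOnWorker = fetchOutputs(f);
fprintf('Worker reports %d computational threads\n', nThreadsOnWorker);

n = 2000; nps = 10;               % illustrative sizes
foo = zeros(nps,1);
u = rand(1,n,'like',1i); v = rand(n,1,'like',1i);
mat = rand(n,n,'like',1i);
tic
parfor ii = 1:nps
    foo(ii) = u * (mat \ v);
end
partime = toc
```

Comparing partime here against the serial time on the same machine gives an estimate of the pool overhead, as in the 33 vs. 38 seconds above.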
P.S. you may want to explore using sliced input variables as your data gets larger, so you can send chunks of data to the workers instead of the entire matrix/array.
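Since the matrix in the real code changes every iteration, here is a minimal sketch of what a sliced input could look like. It assumes the per-iteration matrices can be stacked along the third dimension of an array. The name mats and the sizes are made up for illustration. Because mats is indexed as mats(:,:,ii), parfor treats it as a sliced variable and each worker receives only the pages for the iterations it runs, while u and v are still broadcast in full.

```matlab
% Sketch: one matrix per iteration, stored as pages of a 3-D array (hypothetical layout).
n = 200; nps = 10;                         % illustrative sizes
mats = rand(n,n,nps,'like',1i);            % mats(:,:,ii) is the matrix for iteration ii
u = rand(1,n,'like',1i); v = rand(n,1,'like',1i);

foo = zeros(nps,1);
parfor ii = 1:nps
    % mats is sliced: only page ii is sent for iteration ii.
    foo(ii) = u * (mats(:,:,ii) \ v);
end

% Serial reference for comparison.
ref = zeros(nps,1);
for ii = 1:nps
    ref(ii) = u * (mats(:,:,ii) \ v);
end
```

The fixed mat in the question cannot be sliced this way because every iteration needs all of it, so it is broadcast to every worker.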
More Answers (1)
Ronit
30 August 2024
Hello Manuel,
Since you are working with large complex data and 10 MATLAB workers, the data must be copied to each of the workers, and the results must be copied back. This takes time.
I would suggest that you set up your workers as threads rather than separate processes. That way they use shared memory and the data does not need to be copied. You can do this with parpool("threads"). This will significantly reduce the execution time of the parfor loop.
Please refer to the documentation page "Run MATLAB Functions in Thread-Based Environment" for more information.
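A minimal sketch of trying the thread-based pool, with sizes reduced for illustration (n = 500 is my own choice). Note that a thread-based pool runs on the local machine only, so it is not a drop-in replacement for workers spread across a cluster, and whether it actually speeds up this particular loop is something to measure rather than assume.

```matlab
% Sketch: run the loop from the question on a thread-based pool.
delete(gcp('nocreate'));          % close any existing pool first
pool = parpool("threads");        % workers are threads in the client process

n = 500; nps = 10;                % illustrative sizes, smaller than in the question
foo = zeros(nps,1);
u = rand(1,n,'like',1i); v = rand(n,1,'like',1i);
mat = rand(n,n,'like',1i);

tic
parfor ii = 1:nps
    foo(ii) = u * (mat \ v);      % mat is shared between threads, not copied per worker
end
threadtime = toc
```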
I hope it helps with your query!