MATLAB Answers

Using parfor instead of for

Jeremy, 22 Oct 2014
Commented: Jeremy, 23 Oct 2014
So I'm trying to look at the difference between CPU and GPU program runtimes using a parallel computational scheme. My simple file is attached, but whenever I use parfor on the most computationally heavy section, the runtime increases dramatically (from 1.8 s to 477 s). How else can I speed up the code with parallelization? Is my code too simple to see a performance increase?
  2 Comments
Jeremy, 23 Oct 2014
I make sure that the parpool is up and running before I run the code. Once the code is more or less finalized, I'll probably do as you suggested and add an if statement that makes sure the parpool is running before any of the timers start.
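A minimal sketch of that check, assuming Parallel Computing Toolbox (R2013b or later, where `gcp` and `parpool` exist). The benchmarked code itself is left as a placeholder:

```matlab
% Reuse an existing pool, or start one, BEFORE any timer starts,
% so that pool start-up time is not counted in the measurement.
pool = gcp('nocreate');   % returns empty if no pool is currently open
if isempty(pool)
    pool = parpool;       % start a pool with the default cluster profile
end
fprintf('Pool ready with %d workers\n', pool.NumWorkers);

tic
% ... code being benchmarked goes here ...
elapsed = toc;
fprintf('Elapsed: %.3f s\n', elapsed);
```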



Bruno Pop-Stefanov, 23 Oct 2014
Edited: Bruno Pop-Stefanov, 23 Oct 2014
Dividing the work into chunks and sending them to the workers is expensive. This happens at the parfor line, before the loop body starts. You're right that there is no point in parallelizing the code if this overhead is greater than the time needed to run the loop serially.
I ran the code with for instead of parfor (it took 308 s for x=3 alone, and the parfor version was taking far too long to let it run to completion) and counted 6277 iterations of the while loop enclosing the parfor loop. That means Parallel Computing Toolbox has to divide the loop and send the work to the workers 6277 times, paying the overhead every time. That's a lot...
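The cost described above can be reproduced with a small, hypothetical toy loop (`nSteps`, `N` and the `sin` body are made up, not taken from the attached file). It assumes a pool is already open, so pool start-up is not part of the timing. Because the parfor sits inside the outer loop, the work is split and shipped to the workers once per outer step; with this little work per iteration, the parfor variant will usually be slower than the plain for loop:

```matlab
% Hypothetical stand-in for the original solver structure.
nSteps = 200;   % the original run needed ~6277 outer iterations
N = 100;

% Variant 1: parfor INSIDE the outer loop -> one dispatch per outer step
vPar = zeros(1, N);
tic
for step = 1:nSteps              % plays the role of the outer while loop
    parfor i = 1:N
        vPar(i) = sin(i * step); % very little work per iteration
    end
end
tPar = toc;

% Variant 2: the same computation with an ordinary for loop
vFor = zeros(1, N);
tic
for step = 1:nSteps
    for i = 1:N
        vFor(i) = sin(i * step);
    end
end
tFor = toc;

fprintf('inner parfor: %.3f s, inner for: %.3f s\n', tPar, tFor);
```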
It's better to divide the work at a higher level, i.e. outside the while loop. For example, you could run x=1, x=2, and x=3 on three workers at the same time instead of one after another. Instead of taking about 3 × 308 s, the whole run should then take only slightly more than ~308 s:
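spmd % instead of for x=1:3 (assumes a pool of exactly 3 workers)
    x = labindex; % worker k runs the case x = k
    % ... body of the original for x=1:3 loop ...
end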
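A slightly fuller sketch of the spmd idea, assuming a pool with at least three workers (e.g. started with `parpool(3)`). `runCase` is a hypothetical stand-in for the whole computation for one value of x. Each worker stores its result in a Composite, which the client indexes by worker number; the same split can also be written as a parfor over the three cases. In R2022b and later, `spmdIndex` is the recommended replacement for `labindex`:

```matlab
% Hypothetical stand-in for the full computation of one case x.
runCase = @(x) sum((1:1e6) .^ (1/x));

spmd
    x = labindex;          % worker k handles x = k
    if x <= 3
        r = runCase(x);
    else
        r = NaN;           % extra workers (if any) have nothing to do
    end
end
results = [r{1}, r{2}, r{3}];   % r is a Composite: index it by worker

% Equivalent split with parfor: one iteration per case
results2 = zeros(1, 3);
parfor k = 1:3
    results2(k) = runCase(k);
end
```

Either way, each case runs serially on one worker, so there is only one round of dispatch instead of one per while-loop iteration.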
Also, it would help to get rid of the inner loop for j=2:N. If you can vectorize that loop, you could see a significant speedup.
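What vectorizing such a loop looks like, using a hypothetical 1-D diffusion update as a stand-in (the actual body of the j=2:N loop isn't shown in the thread; `h`, `dt` and `un` here are illustrative). The vectorized form gives the same result only because every iteration reads the old array `u` and writes a separate array; if the loop updated `u` in place, so that iteration j used the new value from iteration j-1, vectorizing would change the result:

```matlab
% Hypothetical update standing in for the inner "for j=2:N" loop.
N = 1000; h = 1/N; dt = 0.4*h^2;
u = rand(1, N+1);

% Loop version: reads old u, writes a separate array
unLoop = u;
for j = 2:N
    unLoop(j) = u(j) + dt*(u(j-1) - 2*u(j) + u(j+1))/h^2;
end

% Vectorized version: the same update for all j = 2:N at once
j = 2:N;
unVec = u;
unVec(j) = u(j) + dt*(u(j-1) - 2*u(j) + u(j+1))/h^2;
```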
Here is more on vectorization: Vectorization
And more about performance in general: Techniques for Improving Performance
  1 Comment
Jeremy, 23 Oct 2014
I should have specified this, but the parallelization is meant to speed up each individual run: the run for x=1, the run for x=2, the run for x=3, etc. So I want the fastest (or most parallel) solution for running x=1. I will definitely look into vectorization (I'm still relatively new to MATLAB), and I've managed to write a function for the A, D, and un calculations, which also saved some time.


More Answers (1)

Matt J, 23 Oct 2014
Edited: Matt J, 23 Oct 2014
There doesn't appear to be any good reason to make u a cell array. It looks like it could be made into a simple matrix with elements u(i,j). The same is true for A, D, and un below.
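A small sketch of the conversion, with a random N-by-N array standing in for u: `cell2mat` turns a cell array of scalars into an ordinary numeric matrix, and brace indexing `u{i,j}` becomes parenthesis indexing `u(i,j)`. Once u is a numeric matrix, whole rows, columns and blocks can be processed at once, which is what the vectorized and convolution forms rely on:

```matlab
N = 4;

% Cell-array storage: one scalar per cell, accessed with braces
uCell = num2cell(rand(N));
val1 = uCell{2,3};

% Equivalent numeric matrix, accessed with parentheses
u = cell2mat(uCell);
val2 = u(2,3);

% Block operations now work directly, e.g. all interior points at once
interiorMean = mean(mean(u(2:end-1, 2:end-1)));
```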
It also doesn't look like you need either parfor or for loops to compute these expressions. They are all either convolutions or can be vectorized, e.g.
D=conv2(u,[0 1 0; 1 -4 1; 0 1 0],'same')/h^2;
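To check the convolution line above against loop code, the sketch below compares it with an explicit five-point stencil on a random test grid (`N` and the grid spacing `h` are assumptions). The two agree on interior points; at the edges, `conv2(...,'same')` treats values outside u as zero, so boundary rows and columns still have to be set according to the problem's own boundary conditions:

```matlab
N = 20; h = 1/(N-1);
u = rand(N);

% Five-point Laplacian via convolution
Dconv = conv2(u, [0 1 0; 1 -4 1; 0 1 0], 'same') / h^2;

% Same stencil written as explicit loops over interior points
Dloop = zeros(N);
for i = 2:N-1
    for j = 2:N-1
        Dloop(i,j) = (u(i-1,j) + u(i+1,j) + u(i,j-1) + u(i,j+1) - 4*u(i,j)) / h^2;
    end
end

% Compare interior points only (edges differ because of zero padding)
in = 2:N-1;
maxDiff = max(max(abs(Dconv(in,in) - Dloop(in,in))));
```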
  1 Comment
Jeremy, 23 Oct 2014
I originally had everything as a simple matrix, but whenever I used parfor I got errors saying the code could not run in parallel with a simple matrix. For some reason, when I switched everything over to cell arrays, the calculations ran through just fine. I will definitely look into vectorization as you suggested. Thanks!
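The exact error message isn't shown in the thread, so this is only a hedged illustration. A common cause of such errors is indexing that parfor cannot classify. The sketch below shows one pattern that parfor does accept with ordinary matrices: the input `u` is read at several row offsets, so it is treated as a broadcast variable, while the output `D` is written only as `D(i,:)`, so it is a sliced output variable. The stencil and random grid are hypothetical test data, not the original code:

```matlab
N = 20; h = 1/(N-1);
u = rand(N);
D = zeros(N);                 % preallocate the output matrix

parfor i = 2:N-1
    % u is read at rows i-1, i, i+1, so parfor treats it as a broadcast
    % variable (each worker receives a full copy of u).
    % D is written only as D(i,:), so it is a sliced output variable.
    row = zeros(1, N);        % temporary variable, private to each iteration
    for j = 2:N-1
        row(j) = (u(i-1,j) + u(i+1,j) + u(i,j-1) + u(i,j+1) - 4*u(i,j)) / h^2;
    end
    D(i,:) = row;
end
```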

