About parallel computation and inter process communication
Hello all!
I have a piece of code that finds patterns in sequences of strings of varying length. Nothing overly complex, except that the main code includes three loops. The basic premise is as follows:
- Load the entire data set (essentially as a cell array) consisting of rows of these sequences.
- Run the main code
- Write the output to a file.
Run sequentially, without any parallel directives, this process takes "x" seconds.
Now: if I change this to:
- Load the entire data set
- Start matlabpool
- invoke spmd(n)
- Run the main code.
- Write the output to file.
The run time is approximately "10x"!!
The machine on which this is being run: 12 GB RAM, 6-core i7.
From my understanding, upon invoking spmd (since I am just interested in letting different workers perform the same job on different sets of data), the total data set is automatically divided. So, logically, the run time should decrease.
However, while trying to figure this out, I also divided the data set into process-specific files, loaded by each worker based on its "labindex". That, too, provided neither relief nor answers.
I have some background with MPI and F90, so I am assuming that the significantly increased run time with more than one worker is probably due to inter-process communication. If so, is there any way to prevent it?
The problem I am trying to solve is a disjoint one. One set of data has no bearing on the other, so there is no real need for one worker to talk to another.
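For reference, the approach I tried looks roughly like this. This is a simplified sketch: `allData` and `processOne` are placeholder names standing in for the actual data set and pattern-search routine, not the real code.

```matlab
% Sketch: partition a disjoint workload by labindex inside spmd so that
% no inter-worker communication is needed. 'allData' and 'processOne'
% are placeholders, not names from the real code.
allData    = num2cell(rand(1, 120));   % stand-in for the sequence data
processOne = @(s) numel(s);            % stand-in for the pattern search

matlabpool open                        % (parpool in later releases)
spmd
    % compute this worker's contiguous slice of indices
    n     = numel(allData);
    edges = round(linspace(0, n, numlabs + 1));
    mine  = (edges(labindex) + 1):edges(labindex + 1);
    % each worker touches only its own slice of the data
    localOut = cellfun(processOne, allData(mine));
end
out = [localOut{:}];                   % gather results from the Composite
matlabpool close
```

Note that even here, the whole of `allData` is copied to every worker when the spmd block starts; the per-worker files keyed by labindex were my attempt to avoid that copy.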
Any insight would be greatly appreciated. This really has me intrigued.
Cheers!
Answers (1)
Edric Ellis
14 Jul 2014
What sort of data are you passing into SPMD? Inside SPMD, only distributed arrays are automatically operated on in parallel. For example:
x = rand(5000);
xd = distributed.rand(5000);
spmd
    x = x * x;    % all workers operate on their own full copy of 'x'
    xd = xd * xd; % each worker has a slice of 'xd', and they collaborate
end
3 Comments
Edric Ellis
15 Jul 2014 (edited)
Unless you need the (MPI-style) communication available within SPMD, you might be better off using PARFOR which can automatically divide up your problem. For example:
% build 'c', a 50x1 cell array where each cell is 100x100
c = mat2cell(rand(5000, 100), 100 * ones(50, 1), 100);
% preallocate the output, then operate on 'c' in parallel
out = cell(numel(c), 1);
parfor idx = 1:numel(c)
    out{idx} = max(abs(eig(c{idx})));
end
The key to getting PARFOR working in this case is that you index into your cell array ("c" in the above example) using the loop variable - this ensures the data is 'sliced', and therefore can be operated on efficiently in parallel.
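To make the slicing requirement concrete, here is a small sketch contrasting the two cases. In the first loop 'c' is indexed directly by the loop variable, so it is sliced and each worker receives only its own cells; in the second, the indirect index defeats the slicing analysis, so the whole of 'c' is broadcast to every worker. (The permutation is just an illustrative way to break slicing.)

```matlab
% Sliced vs broadcast variables in PARFOR.
c   = mat2cell(rand(5000, 100), 100 * ones(50, 1), 100);
out = zeros(numel(c), 1);

parfor idx = 1:numel(c)
    out(idx) = max(abs(eig(c{idx})));       % sliced: only c{idx} is sent
end

perm = randperm(numel(c));
parfor idx = 1:numel(c)
    out(idx) = max(abs(eig(c{perm(idx)}))); % broadcast: all of 'c' is sent
end
```

For large data sets, the communication cost of the broadcast version can easily swamp the computation, which is one way a parallel loop ends up slower than the serial one.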