
I would like to accelerate processing of a large set of radar event data using Parallel Computing. I have a server with 48 cores and 512 GB of RAM, so all of the data I need to process will fit into the local computer's memory, with enough cores to process each independent set of events. The data I want each core to process consists of 8 channels of IQ data, where each channel is a matrix of S samples x P pulses; i.e., I would like to distribute an 8 x S x P matrix to each worker.

Currently the data is loaded from Nx8 files into an Nx8xSxP matrix, which I would like to distribute to N workers. The file reading is actually quite slow since it is done by a single processor, so perhaps the first question is whether I could have each worker load its own set of 8 files.

Otherwise, how do I distribute each 8xSxP matrix to my workers?

Edric Ellis
9 June 2020

The best approach probably depends on the operations you need to perform on this Nx8xSxP array. Are the operations that you wish to perform such that you can consider "slices" of the array independently? I.e. can each 8xSxP slice of the array be operated on independently? If so, you could consider an approach like this:

```matlab
parfor i = 1:N
    myData = zeros(8, S, P);
    for f = 1:8
        % Here, readData reads one file, returning a matrix
        % of size SxP
        myData(f, :, :) = readData(i, f);
    end
    % Here, "compute" operates on the 8xSxP array, giving some result
    result(i) = compute(myData);
end
```
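`readData` above is a placeholder. As a rough illustration only, here is one shape it could take, assuming (hypothetically) that channel `f` of event `i` lives in a file named `event<i>_ch<f>.bin` holding S*P interleaved `int16` I/Q pairs with samples varying fastest. The file naming and format are invented for the example, so adapt them to your data. Because this version needs the sizes, it would be called as `readData(i, f, S, P)` inside the `parfor` loop.

```matlab
function iq = readData(i, f, S, P)
% HYPOTHETICAL reader: assumes 'event<i>_ch<f>.bin' contains S*P
% interleaved int16 I/Q pairs (I1 Q1 I2 Q2 ...), samples varying fastest.
fname = sprintf('event%d_ch%d.bin', i, f);
fid = fopen(fname, 'r');
if fid < 0
    error('readData:open', 'Cannot open %s', fname);
end
cleaner = onCleanup(@() fclose(fid));   % closes the file even on error

% Read into a 2 x (S*P) matrix: row 1 = I, row 2 = Q
[raw, count] = fread(fid, [2, S*P], 'int16=>double');
if count ~= 2*S*P
    error('readData:size', '%s: expected %d values, read %d', fname, 2*S*P, count);
end

% Combine into complex samples and shape as S samples x P pulses
iq = reshape(complex(raw(1, :), raw(2, :)), S, P);
end
```

Each worker opens and closes its own files, so nothing but the file name crosses from the client.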

Even with this approach, be aware that the file reading might be slow because of the limitations of the disk hardware you're reading from. If this is a spinning disk, it might actually be counter-productive to have multiple workers reading different files simultaneously, since the disk head then has to seek back and forth between the files.

If the operations you need to perform are not as easily "sliced" as in the example above, then it might be better to consider using "distributed arrays".
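For reference, a minimal sketch of the distributed-array route, assuming a parallel pool is available. The sizes and random data are illustrative stand-ins, and the per-pulse power reduction is just an example operation. The array is spread across the workers, and supported functions such as `abs` and `sum` run there without you writing any explicit loop:

```matlab
% Illustrative sizes and random stand-in data (not real radar data)
N = 4; C = 2; S = 1200; P = 600;
NCSP = complex(rand(N, C, S, P), rand(N, C, S, P));

% distributed() copies the client array out to the workers of the
% parallel pool (a pool is started automatically if your parallel
% preferences allow it)
D = distributed(NCSP);

% These operations run on the workers; summing over the pulse dimension
% gives an N x C x S array, which is still distributed
pulsePower = sum(abs(D).^2, 4);

% gather() brings the (much smaller) result back to the client
pulsePower = gather(pulsePower);
```

Note that building the full array on the client first still means one full copy to the workers; the benefit is that later non-sliced operations need no further manual data movement.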

Edric Ellis
12 June 2020

> Assuming I have a very large matrix which is NxCxSxP, is there a way to allocate and load it such that each of the N workers gets a CxSxP slice of my overall matrix?

If you read the data on the client, then the parfor machinery knows how to copy only the portions of NCSP needed by each worker. But it is a copy, so there's some additional transitory memory overhead. As Walter so rightly points out, it's more efficient if you "slice" the matrix in the final dimension: MATLAB stores arrays in column-major order, so each slice in the final dimension occupies one contiguous block of memory, and when parfor copies out a bunch of slices to send to the workers, it copies contiguous blocks rather than picking out strided elements.
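A minimal sketch of that reordering, assuming the data has already been read on the client into `NCSP` (random stand-in data here). `permute` moves N to the last dimension, after which each `CSPN(:, :, :, n)` is a contiguous C x S x P block that `parfor` treats as a sliced input. `permute` itself makes a full copy, so peak memory briefly doubles; if you can, allocate the array as C x S x P x N from the start. The sum of magnitudes is only a placeholder computation.

```matlab
% Illustrative sizes and random stand-in data
N = 4; C = 2; S = 1200; P = 600;
NCSP = complex(rand(N, C, S, P), rand(N, C, S, P));

% Move N to the last dimension so each event is one contiguous block
CSPN = permute(NCSP, [2 3 4 1]);   % size C x S x P x N
clear NCSP                         % release the original layout

result = zeros(N, 1);
parfor n = 1:N
    % Sliced input: each iteration receives only its own C x S x P block
    slice = CSPN(:, :, :, n);
    result(n) = sum(abs(slice(:)));   % placeholder computation
end
```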

If your workload is well-balanced (i.e. you can expect each slice operation to take the same amount of time, give or take), then you could try an approach using spmd which gives you more control. The idea here is similar to my original parfor suggestion, but spmd lets you co-ordinate things such that only a single worker is accessing the disk at any time. Here's a rough sketch:

```matlab
N = 4; C = 2; S = 1200; P = 600;
spmd
    % First, divide up the N pieces across the workers
    partition = codistributor1d.defaultPartition(N);
    % partition is now a vector of length numlabs specifying
    % the number of slices each worker takes on.
    % We need an offset for each worker - compute this using
    % cumsum. The result is a vector telling us how many elements
    % the preceding workers own.
    nOffsetByLab = [0, cumsum(partition)];

    % Next, we can force the workers to operate one at a time
    % using labBarrier like so. (From R2022b, spmdIndex, spmdSize
    % and spmdBarrier are the recommended names for labindex,
    % numlabs and labBarrier.)
    for activeLab = 1:numlabs
        if labindex == activeLab
            % My turn to load data.
            % The partition tells me how many slices to load on this worker
            myNumSlices = partition(labindex);
            % Allocate my local piece of NCSP
            myNCSP = complex(zeros(myNumSlices, C, S, P));
            % The offset here tells me what the "global" index is in the
            % first dimension
            myNOffset = nOffsetByLab(labindex);
            % Loop over my slices and "load" the data. As a stand-in for
            % reading files, each slice is filled with its global index.
            for nIdx = 1:myNumSlices
                globalN = nIdx + myNOffset;
                myNCSP(nIdx, :, :, :) = globalN .* ones(C, S, P);
            end
        end
        % Force all workers to wait here
        labBarrier;
    end

    % At this point, each worker has a myNumSlices x C x S x P array
    % myNCSP, and can perform computations.
    myResult = zeros(myNumSlices, 1);
    for nIdx = 1:myNumSlices
        myResult(nIdx) = sum(myNCSP(nIdx, :));
    end
end

% At the end of the spmd block, myResult is a Composite. We
% can simply concatenate the portions of that to get the overall
% result
overallResult = vertcat(myResult{:});
```

This is quite a bit more complex than the simple parfor approach, but it avoids any large data transfer from the client, and it ensures that only one worker at a time is "loading" data.
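If a later processing step does need the whole array (the non-sliced case where distributed arrays were suggested earlier), the per-worker pieces can be wrapped into a single codistributed array without routing anything through the client. A condensed sketch, assuming the same sizes and stand-in data as the spmd sketch above and skipping the one-at-a-time loading: `codistributor1d(1, partition, [N C S P])` describes the existing split along dimension 1, and `codistributed.build` combines the local pieces. The reduction at the end is only an example.

```matlab
N = 4; C = 2; S = 1200; P = 600;
spmd
    partition = codistributor1d.defaultPartition(N);
    nOffset = sum(partition(1:labindex-1));
    % Local piece, filled with each slice's global index as stand-in data
    myNCSP = complex(zeros(partition(labindex), C, S, P));
    for nIdx = 1:partition(labindex)
        myNCSP(nIdx, :, :, :) = (nIdx + nOffset) .* ones(C, S, P);
    end
    % Describe the split (dimension 1, same partition, global size) and
    % combine the local pieces into one codistributed array
    codist = codistributor1d(1, partition, [N C S P]);
    NCSP = codistributed.build(myNCSP, codist);
end

% Outside spmd, NCSP is a distributed array; reduce on the workers
% and gather only the N x C x S result
perEvent = gather(sum(abs(NCSP), 4));
```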
