I need to insert multiple 32x32 arrays into a 1024x1024 array at predetermined random locations. How can I do this efficiently?

I need to insert many (10^6) 32x32 arrays into a single 1024x1024 array and sum them. The insertion locations are random, but predetermined. Simple for loops result in long execution times. How can I optimize this? I do have parallel processing and a GPU available.
Nx = ceil(rand(1e6,1)*(1024-32)); % avoiding partial arrays at the edges
Ny = ceil(rand(1e6,1)*(1024-32));
smallArray = ones(32);
largeArray = zeros(1024);
for ii = 1:1e6
    largeArray(Nx(ii):Nx(ii)+31, Ny(ii):Ny(ii)+31) = ...
        largeArray(Nx(ii):Nx(ii)+31, Ny(ii):Ny(ii)+31) + smallArray;
end
2 Comments
Ahmet Cecen on 29 May 2016
On the other hand, I am not entirely sure why this would take any longer than 2 seconds. Is that slow?


Accepted Answer

Joss Knight on 8 Jun 2016
Use accumarray (and randi):
N = 1024;
Nx = randi([1 (N-32)], 1e5, 1);
Ny = randi([1 (N-32)], 1e5, 1);
linIndStart = sub2ind([N N], Nx, Ny);
smallArray = gpuArray.ones(32);
[I, J] = ndgrid(1:32);
linOffsets = sub2ind([N N], I(:), J(:)) - 1;
linInd = bsxfun(@plus, linIndStart, linOffsets');
values = repmat(smallArray(:)', numel(Nx), 1); % Replace with actual vals
largeArray = accumarray(linInd(:), values(:), [N*N 1]);
largeArray = reshape(largeArray, N, N);
If your submatrices are genuinely all just ones, then just replace the values(:) with 1.
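That last simplification can be sketched as follows (a minimal sketch, assuming every submatrix really is `ones(32)`; the sizes match the code above):

```matlab
N = 1024;
Nx = randi([1 (N-32)], 1e5, 1);
Ny = randi([1 (N-32)], 1e5, 1);
[I, J] = ndgrid(1:32);
linOffsets = sub2ind([N N], I(:), J(:)) - 1;
linInd = bsxfun(@plus, sub2ind([N N], Nx, Ny), linOffsets');
% With all-ones submatrices, accumarray just counts how many boxes
% cover each element -- no values array needs to be materialized.
largeArray = reshape(accumarray(linInd(:), 1, [N*N 1]), N, N);
```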
2 Comments
Kevin Hughes on 8 Jun 2016
Edited: Kevin Hughes on 8 Jun 2016
Joss, thanks. I actually got slightly better results (8.3 vs 9.7) with:
Nx=1024;
Np=100000;
Ns=32;
smallArray=gpuArray.ones(Ns)+1i*gpuArray.ones(Ns);
smallArrayr=reshape(smallArray,1,Ns^2);
smallArrayrr=gpuArray.ones(Np,1)*smallArrayr;
% [x(:) y(:)] will be a list of the Ns x Ns offset coordinates for
% each box element
[x, y] = meshgrid(1:Ns);
x=gpuArray(x);
y=gpuArray(y);
% random insertion locations (top-left corners)
nx = gpuArray(ceil(rand(Np,1)*(Nx-Ns)));
ny = gpuArray(ceil(rand(Np,1)*(Nx-Ns)));
% Add every offset to every coordinate. The result is Np x Ns^2, that
% is one row per box, with Ns^2 elements in each box. Now xx and yy
% identify every element of every box.
xx = bsxfun(@plus, nx(:), x(:)');
yy = bsxfun(@plus, ny(:), y(:)');
% Now we have arrays that identify what to add together and where
% to put the results. accumarray can do the accumulation based
% on the list of coordinates [xx(:) yy(:)] and the data going into
% each element, smallArrayrr(:).
largeArray = accumarray([xx(:) yy(:)], smallArrayrr(:), [Nx Nx], @sum);
Code found on this site at the related link (actually your answer to someone else with a similar issue): http://www.mathworks.com/matlabcentral/answers/264457-why-for-gpu-is-slower-than-cpu-for-this-code-is-it-because-of-sparsity-or-because-of-for-loop It sets up the small arrays a little differently, but uses accumarray.
Joss Knight on 9 Jun 2016
Oh well done - I knew I'd answered basically the same question before but I couldn't unearth the old answer!


More Answers (1)

Ahmet Cecen on 29 May 2016
If your small array will always be the same, you might be able to do this lightning fast with a convolution. Instead of using the random indices as corners, you would use them as centers. Do something like:
LargeArray(randindexvector)=1;
You can separately find repetitions using unique and add them if you happen to get the same center twice. Much smaller for loop.
Then you convolve with a 32x32 ones matrix.
There might be some rough edges in this approach that need some smoothing, but if you can get it to work, it would be milliseconds.
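The convolution idea can be sketched as below. This is a minimal sketch under the assumptions that the small array really is a constant `ones(32)`, that corners are used directly (matching the question's indexing) rather than centers, and that duplicate corners are handled by counting with `accumarray` instead of the `unique`-based fixup described above:

```matlab
N = 1024; Ns = 32; Np = 1e6;
Nx = ceil(rand(Np,1)*(N-Ns));  % top-left corners, as in the question
Ny = ceil(rand(Np,1)*(N-Ns));
% Count how many boxes start at each element, so repeated corners add up
counts   = accumarray(sub2ind([N N], Nx, Ny), 1, [N*N 1]);
impulses = reshape(counts, N, N);
% Convolving each impulse with ones(Ns) stamps an Ns-by-Ns box of ones
largeArray = conv2(impulses, ones(Ns));
largeArray = largeArray(1:N, 1:N);  % crop the full convolution back to 1024x1024
```

With a corner at (r, c), the full convolution fills rows r:r+Ns-1 and columns c:c+Ns-1, which matches the question's loop after cropping.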
2 Comments
Kevin Hughes on 29 May 2016
Thanks. The small array is actually complex-valued. As written I get about 20 seconds of execution time, but 10^6 could be an order of magnitude low, and I'll need to repeat with variations on the insertion locations a few thousand times, which puts the total at 160 hours... I'm not sure the assumption of a constant small array holds, or convolution would be an excellent way to go.
The first related question links to a vectorized answer that also shows promise.
Ahmet Cecen on 29 May 2016
You can cascade over stages of uniqueness and vectorize the fill. Say you created 1000 random values, and say 100 of them are duplicates. You first work on the 900, then the 100, then any duplicates within the 100, etc. You do something like:
for i = 1:BiggestNumberOfRepetitions
    % WORK HERE
end
Hopefully this is an orders-of-magnitude smaller iteration. It also lets you iterate over the dimensions of the smaller matrix instead of the number of random numbers. This may strain memory, but you can work around it. It should look something like:
RandInd = sub2ind([1024 1024], Nx, Ny);
for i = 1:32*32
    offset = floor((i-1)/32)*1024 + mod(i-1, 32); % column-major offset of element i
    largeArray(RandInd + offset) = largeArray(RandInd + offset) + smallArray(i);
end
Now there is some nasty indexing in there, so double check it. Your iteration count becomes MaximumRepetitionCount*numel(smallArray), which in most cases should still be orders of magnitude smaller.
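Combining the two pieces above, the cascade might look like the following sketch (variable names here are illustrative, not from the thread; each pass keeps one copy of every duplicate corner, so the vectorized add never writes the same target twice within a pass):

```matlab
N = 1024; Ns = 32;
RandInd    = sub2ind([N N], Nx, Ny);  % linear indices of top-left corners
largeArray = zeros(N);
remaining  = RandInd;
while ~isempty(remaining)
    [uniq, firstPos] = unique(remaining);  % one copy of each corner value
    for i = 1:Ns*Ns
        off = floor((i-1)/Ns)*N + mod(i-1, Ns);  % column-major offset of element i
        largeArray(uniq + off) = largeArray(uniq + off) + smallArray(i);
    end
    remaining(firstPos) = [];  % leftover duplicates roll over to the next pass
end
```

Overlapping boxes from *distinct* corners are safe here because each small-array element is added in its own assignment; only repeated corners need the cascade.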

