How can I efficiently add multiple arrays generated in a loop?

8 ビュー (過去 30 日間)
zjw z
zjw z 2021 年 3 月 3 日
コメント済み: zjw z 2021 年 3 月 8 日
Hi,
I am trying to visualize a lot of (point, intensity) values as disks and gaussians whose values add up when they overlap.
The following code does what I want, but is very slow, taking several minutes once there are a lot of points. I would like it to work for 1e5 or 1e6 points in a reasonable time:
tic;
Npixels_X = 4096; % image size X
Npixels_Y = 3537; % image size Y
N=10; % number of points
x = randi(Npixels_X,[1,N]); % x coordinates
y = randi(Npixels_Y,[1,N]); % y coordinates
intensity = randi(100, [1,N]); % intensity
r0=100; % radius of circles
[X,Y] = meshgrid(1:Npixels_X, 1:Npixels_Y);
gaussianPoints = zeros(size(X));
circlePoints = zeros(size(X));
for idx=1:N
r_squared = (X-x(idx)).^2 + (Y-y(idx)).^2;
gaussianPoints = gaussianPoints + intensity(idx) * exp(-r_squared./(r0^2));
circlePoints = circlePoints + intensity(idx) * ( r_squared <= r0.^2 );
end
figure;
imagesc(gaussianPoints);
figure;
imagesc(circlePoints);
toc;
Is there a more efficient way of doing this?
I sense that parfor, bsxfun or some array reshaping might help, but am not quite sure how to go about it yet.
  2 件のコメント
weikang zhao
weikang zhao 2021 年 3 月 4 日
This problem is suitable for the use of GPU computing or parallel computing, but parallel computing can not bring a huge improvement, you can also try to use the C language for mixed compilation. If you are not in a hurry for a solution, you can provide me with an email address and I will help you optimize this code when I have free time.
zjw z
zjw z 2021 年 3 月 8 日
I am not in a hurry, but would be happy if you could point me in the right direction to get started with one of the options you mentioned. I prefer getting help here rather than by mail for now.

サインインしてコメントする。

回答 (1 件)

Jan
Jan 2021 年 3 月 4 日
編集済み: Jan 2021 年 3 月 4 日
EXP is very expensive. This is the bottleneck of your code.
Instead of applying it to matrices produced by MESHGRID, provide the vectors instead an use:
exp(a + b) = exp(a) * exp(b)
tic;
r02 = r0^2; % Move repeated squaring out of the loop
X = (1:Npixels_X); % Vectors instead of redundant matrices by MESHGRID
Y = (1:Npixels_Y).'; % Column vector
G = 0; % Slightly cheaper than ZEROS()
C = 0;
for idx = 1:N
RX = (X - x(idx)) .^ 2; % Row vector
RY = (Y - y(idx)) .^ 2; % Column vector
eRX = exp(-RX / r02); % EXP over vectors
eRY = exp(-RY / r02);
G = G + intensity(idx) * eRX .* eRY; % auto-expand: >= Matlab R2016b
C = C + intensity(idx) * ((RX + RY) <= r02); % auto-expand
end
toc
Except for rounding errors the results are equal. On my Core i5 mobile, Matlab 2018b the processing time shrinks from 2.48 seconds to 0.52 seconds.
If the code need 0.5 seconds for N=10, it takes about 14 hours for N=1e6.
You can split the job in parts and run them in parallel.
% Naive PARFOR approach:
N = 100; % Multiple of 10
... as above
r02 = r0^2;
X = (1:Npixels_X);
Y = (1:Npixels_Y).';
pG = cell(1, 10); % Let clients collect the data separately
pC = cell(1, 10);
parfor ip = 1:10
G = 0;
C = 0;
q = 1 + (ip - 1) * (N / 10);
for idx = q:q+(N / 10)-1
RX = (X - x(idx)) .^ 2;
RY = (Y - y(idx)) .^ 2;
eRX = exp(-RX / r02);
eRY = exp(-RY / r02);
G = G + intensity(idx) * eRX .* eRY;
C = C + intensity(idx) * ((RX + RY) <= r02);
end
pG{ip} = G;
pC{ip} = C;
end
G = 0; % Sum up results finally:
C = 0;
for ip = 1:10
G = G + pG{ip};
C = C + pC{ip};
end
This takes the double time on my 2 code i5 - not the half time. Matlab warns, that "intensity", "x" and "y" are expensive broadcast variables. But I do not know how to avoid this.
For a real problem, I'd start as many Matlab instances as I have cores and distribute the data with running the non-PARFOR approach.
To be honest: Maybe others have more success with a parallelization. I did not find any code, which runs faster with PARFOR yet, even on my 4-core i7 desktop. Either PARFOR is not sufficient or my approachs are to naive.
  1 件のコメント
zjw z
zjw z 2021 年 3 月 8 日
Thanks for that. It does save time, although I don't think it will be sufficient for what I had in mind. (<5min rendering) Calculating RX and RY separately indeed makes a lot of sense. :)
I might have a look at GPU options or restrict it to look at a subset of the data.
I also don't need circles and gaussians at the same time (can be a user option), but even with just one of them, it takes too long for 1e5 points.

サインインしてコメントする。

カテゴリ

Help Center および File ExchangeMatrix Indexing についてさらに検索

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by