Accelerating code wont work

1 回表示 (過去 30 日間)
Marc Laub
Marc Laub 2023 年 11 月 8 日
回答済み: Thomas Ibbotson 2023 年 11 月 9 日
Hey guys,
I got a piece of code, that has to run repetitive and I would be glad if I could reduce execution time, since I do not know how complex the whole thing might become in the future...
For those who care about what it does: it solves diffusion equation für spherical particals that additionaly shrink and grow whil some diffuses in or out.
Theoretically, I just thought parfor would already help, but parfor actually tikes like 4 times longer, even though I already reduced broadcast etc...
D = 1e-18;
r_p=200*10^-9;
xmesh = 0:10^-9:r_p;
N = numel(xmesh);
r = xmesh;
dr = xmesh(2)-xmesh(1);
dt = 0.5*dr^2/D;
tspan = [0:dt:100];
% helper
g = D*dt/(2*dr^2);
% vectorized sparse construction of matrices
offdiags = ...
+ sparse(2:(N-1),1:(N-2),+(dr./r(2:(end-1))-1)*g , N,N) ...
+ sparse(2:(N-1),3:(N ),-(dr./r(2:(end-1))+1)*g, N,N);
ML = sparse(2:(N-1),2:(N-1), ones(N-2,1) + 2*g , N,N) + offdiags;
MR = sparse(2:(N-1),2:(N-1), ones(N-2,1) - 2*g , N,N) - offdiags;
% boundary conditions
% ML(1, 1:3 ) = [-3/2 , 2 , -1/2]/dr;
% ML(N,(N-2):N) = [ 1/2 , -2 , 3/2]/dr;
radi=round(15*lognrnd(0,0.4,2000,1));
radi(radi>100)=100;
u=cell(2000,1);
for i=1:2000
u{i}= 0*ones(radi(i),1);
u{i}(end)=1;
end
c_eq=1;
R=cellfun('length',u)*10^-9;
N=numel(u);
dx=zeros(2000,1);
close all
figure
time_id=1;
% ML = parallel.pool.Constant(ML);
% MR = parallel.pool.Constant(MR);
tic
while time_id<(numel(tspan)-1)
time_id=time_id+1;
c_eq=c_eq*0.99;
% parfor k=1:numel(u)
for k=1:numel(u)
dx(k)=dx(k)+2*10^-17*dt*1/R(k);
R(k)=R(k)+2*10^-17*dt*1/R(k);
if dx(k)>=10^-9
dx(k)=dx(k)-10^-9;
u{k}(end+1)=c_eq;
N=numel(u{k});
elseif dx(k)<=-10^-9
dx(k)=dx(k)+10^-9;
u{k}(end)=[];
u{k}(end-1)=c_eq;
N=numel(u{k});
else
N=numel(u{k});
end
% ML_n = spalloc(N,N,3*N-1);
% MR_n = spalloc(N,N,3*N-1);
% ML_n=ML.Value(1:N,1:N);
% MR_n=MR.Value(1:N,1:N);
ML_n=ML(1:N,1:N);
MR_n=MR(1:N,1:N);
ML_n(1, 1:3 ) = [-3/2 , 2 , -1/2]/dr;
ML_n(N,(N-2):N) = [ 1/2 , -2 , 3/2]/dr;
up = u{k};
d=MR_n*up;
u{k}=ML_n\d;
u{k}(end)=c_eq;
end
end
toc
So up to the tic command, just hat to be setup once. Since the particles have different sizes, the matruces ML and MR have different sizes for each particle. To avoid storing many different matrices and changing the size of those matrices as the particles grow or shrink, I calculatet a large matrix (for a particle size that wont be reached within the simulation) nd then just take the matrix I need for each particle, as submatrices.
So that large matrix and the cell_array with the concentration profile for each particle are the onle broadcast variables, and they do not seem very big to me.... I tried to implement constant variables for all workes, try to preallocate the non zero values for the sparse sub-matrices... everything just made it worse...
So is there any idea left to get some more performance out of it?
Many thanks in advance
Best regards

回答 (1 件)

Thomas Ibbotson
Thomas Ibbotson 2023 年 11 月 9 日
The matrix operations you are using (mldivide and matrix multiplication) are already implicitly multi-threaded and will use all the CPUs on your system. This means parfor cannot improve on this on your local machine and in general will slow things down due to the extra communication and process overheads. See https://uk.mathworks.com/discovery/matlab-multicore.html
As for how to make this faster, you could scale up to multiple machines using MATLAB Parallel Server. However, whether or not this will speed things up depends on whether the computation time for each loop is long compared to communication overheads. Also with MATLAB Parallel Server, each worker is restricted to a single computational core so you lose the benefits of implicit multi-threading, which means you need to scale up enough to overcome this limitation.
For alternatives to improving this on your local machine you would probably need to look at algorithmic improvements, which unforunately I can't help with as I'm not a numerical algorithms expert. You could also profile the code using the MATLAB profiler to see where the bottlenecks are. However, I suspect it's unlikely that the bottlenecks will be outside the matrix computations, but you should check just in case.

カテゴリ

Help Center および File ExchangeLoops and Conditional Statements についてさらに検索

製品


リリース

R2022b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by