Accelerating code wont work

Question

0 投票

Hey guys,

I got a piece of code, that has to run repetitive and I would be glad if I could reduce execution time, since I do not know how complex the whole thing might become in the future...

For those who care about what it does: it solves diffusion equation für spherical particals that additionaly shrink and grow whil some diffuses in or out.

Theoretically, I just thought parfor would already help, but parfor actually tikes like 4 times longer, even though I already reduced broadcast etc...

D = 1e-18;
r_p=200*10^-9;
xmesh = 0:10^-9:r_p;
N = numel(xmesh);
r = xmesh;
dr = xmesh(2)-xmesh(1);
dt = 0.5*dr^2/D;
tspan = [0:dt:100];
% helper
g = D*dt/(2*dr^2);
% vectorized sparse construction of matrices
offdiags = ...
    + sparse(2:(N-1),1:(N-2),+(dr./r(2:(end-1))-1)*g , N,N) ...
    + sparse(2:(N-1),3:(N  ),-(dr./r(2:(end-1))+1)*g, N,N);
ML = sparse(2:(N-1),2:(N-1), ones(N-2,1) + 2*g , N,N) + offdiags;
MR = sparse(2:(N-1),2:(N-1), ones(N-2,1) - 2*g , N,N) - offdiags;
% boundary conditions
% ML(1,  1:3  ) = [-3/2 ,  2 , -1/2]/dr;
% ML(N,(N-2):N) = [ 1/2 , -2 ,  3/2]/dr;
radi=round(15*lognrnd(0,0.4,2000,1));
radi(radi>100)=100;
u=cell(2000,1);
for i=1:2000
    u{i}= 0*ones(radi(i),1);
    u{i}(end)=1;
end
c_eq=1;
R=cellfun('length',u)*10^-9;
N=numel(u);
dx=zeros(2000,1);
close all
figure
time_id=1;
% ML = parallel.pool.Constant(ML);
% MR = parallel.pool.Constant(MR);
tic
while time_id<(numel(tspan)-1)
    time_id=time_id+1;
    c_eq=c_eq*0.99;
%  parfor k=1:numel(u)
    for k=1:numel(u)
        dx(k)=dx(k)+2*10^-17*dt*1/R(k);
        R(k)=R(k)+2*10^-17*dt*1/R(k);
        if dx(k)>=10^-9
            dx(k)=dx(k)-10^-9;
            u{k}(end+1)=c_eq;
            N=numel(u{k});
        elseif dx(k)<=-10^-9
            dx(k)=dx(k)+10^-9;
            u{k}(end)=[];
            u{k}(end-1)=c_eq;
            N=numel(u{k});
        else
            N=numel(u{k});
        end
%                       ML_n = spalloc(N,N,3*N-1);
%                       MR_n = spalloc(N,N,3*N-1);
        %               ML_n=ML.Value(1:N,1:N);
        %               MR_n=MR.Value(1:N,1:N);
        ML_n=ML(1:N,1:N);
        MR_n=MR(1:N,1:N);
        ML_n(1,  1:3  ) = [-3/2 ,  2 , -1/2]/dr;
        ML_n(N,(N-2):N) = [ 1/2 , -2 ,  3/2]/dr;
        up = u{k};
        d=MR_n*up;
        u{k}=ML_n\d;
        u{k}(end)=c_eq;
    end
end
toc

So up to the tic command, just hat to be setup once. Since the particles have different sizes, the matruces ML and MR have different sizes for each particle. To avoid storing many different matrices and changing the size of those matrices as the particles grow or shrink, I calculatet a large matrix (for a particle size that wont be reached within the simulation) nd then just take the matrix I need for each particle, as submatrices.

So that large matrix and the cell_array with the concentration profile for each particle are the onle broadcast variables, and they do not seem very big to me.... I tried to implement constant variables for all workes, try to preallocate the non zero values for the sparse sub-matrices... everything just made it worse...

So is there any idea left to get some more performance out of it?

Many thanks in advance

Best regards

0 件のコメント
-2 件の古いコメントを表示 -2 件の古いコメントを非表示

サインインしてコメントする。

サインインしてこの質問に回答する。

Follow Question

Answer 1

Thomas Ibbotson 2023 年 11 月 9 日

0 投票

The matrix operations you are using (mldivide and matrix multiplication) are already implicitly multi-threaded and will use all the CPUs on your system. This means parfor cannot improve on this on your local machine and in general will slow things down due to the extra communication and process overheads. See https://uk.mathworks.com/discovery/matlab-multicore.html

As for how to make this faster, you could scale up to multiple machines using MATLAB Parallel Server. However, whether or not this will speed things up depends on whether the computation time for each loop is long compared to communication overheads. Also with MATLAB Parallel Server, each worker is restricted to a single computational core so you lose the benefits of implicit multi-threading, which means you need to scale up enough to overcome this limitation.

For alternatives to improving this on your local machine you would probably need to look at algorithmic improvements, which unforunately I can't help with as I'm not a numerical algorithms expert. You could also profile the code using the MATLAB profiler to see where the bottlenecks are. However, I suspect it's unlikely that the bottlenecks will be outside the matrix computations, but you should check just in case.

0 件のコメント
-2 件の古いコメントを表示 -2 件の古いコメントを非表示

サインインしてコメントする。

Accelerating code wont work

0 件のコメント
-2 件の古いコメントを表示 -2 件の古いコメントを非表示

回答 (1 件)

0 件のコメント
-2 件の古いコメントを表示 -2 件の古いコメントを非表示

カテゴリ

製品

リリース

タグ

Community Treasure Hunt

Accelerating code wont work

0 件のコメント -2 件の古いコメントを表示 -2 件の古いコメントを非表示

回答 (1 件)

0 件のコメント -2 件の古いコメントを表示 -2 件の古いコメントを非表示

カテゴリ

製品

リリース

タグ

参考

Community Treasure Hunt

0 件のコメント
-2 件の古いコメントを表示 -2 件の古いコメントを非表示

0 件のコメント
-2 件の古いコメントを表示 -2 件の古いコメントを非表示