# Slow computation time of parfor loop

기옥 김 on 7 Sep 2022

Hello,
I need help optimizing the following parallel loop:
```matlab
parfor k = 1:N
    [Laux{k}, Uaux{k}, Paux{k}, Qaux{k}] = lu(Jtot{k});
end
```
Timing the loop above gives:
Elapsed time is 3.569814 seconds.
Each cell of Jtot contains a sparse matrix of roughly 40k x 40k.
As a test, I tried the following code:
```matlab
Jtot2 = Jtot{1};
parfor k = 1:N
    [Laux, Uaux, Paux, Qaux] = lu(Jtot2);   % factors are temporaries, not returned
end
```
Elapsed time is 0.749602 seconds.
Then I also tried this one:
```matlab
Jtot2 = Jtot{1};
parfor k = 1:N
    [Laux{k}, Uaux{k}, Paux{k}, Qaux{k}] = lu(Jtot2);   % factors returned to the client
end
```
Elapsed time is 2.593602 seconds.
It seems that the large size of Jtot and of the resulting LU factors causes the issue.
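One way to check this hypothesis is to compare the memory used by the LU factors with that of the matrix itself. With `parfor`, every factor stored in `Laux{k}`, `Uaux{k}`, `Paux{k}` and `Qaux{k}` has to be copied from a worker back to the client, and sparse LU factors usually carry fill-in. The sketch below uses a random sparse matrix as a hypothetical stand-in for one `Jtot{k}`; its size, density and seed are placeholders, not the poster's data.

```matlab
% Hypothetical stand-in for one Jtot{k}; the real matrices are ~40k x 40k.
n = 5000;
rng(0);                                   % reproducible random data
A = sprandn(n, n, 5/n) + 10*speye(n);     % random sparse matrix with a strong diagonal

% Four-output sparse LU: P*A*Q = L*U
[L, U, P, Q] = lu(A);

% Memory footprint of the input matrix versus the four factors
infoA = whos('A');
infoF = [whos('L'); whos('U'); whos('P'); whos('Q')];
bytesA = infoA.bytes;
bytesFactors = sum([infoF.bytes]);

fprintf('nnz(A) = %d, nnz(L) + nnz(U) = %d\n', nnz(A), nnz(L) + nnz(U));
fprintf('A: %.2f MB, factors: %.2f MB (%.1fx)\n', ...
        bytesA/2^20, bytesFactors/2^20, bytesFactors/bytesA);
```

If the ratio on the real Jacobians is large, copying N sets of factors back to the client could plausibly account for much of the gap between the 0.75 s and 2.59 s runs above; this is a guess to be confirmed by measurement.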
I've also tried spmd, but it was still slow:
```matlab
spmd (N)
    [Laux, Uaux, Paux, Qaux] = lu(Jtot{labindex});
end
```
A sequential matrix-inversion step has to follow the parallel loop, so the LU factors of every cell of Jtot need to be stored.
How can I reduce the computation time? I would like to bring it down to about 1 second or less.
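Storing the factors is enough for the later solve because, with four outputs, `lu` returns permutations such that `P*A*Q = L*U`. Any right-hand side can then be solved with two triangular solves, the same formula the follow-up comment uses for `dX{1}`. A minimal sketch on a hypothetical random matrix, checked against `mldivide`:

```matlab
% A and b are hypothetical stand-ins for Jtot{k} and -res_tot{k}.
n = 2000;
rng(1);
A = sprandn(n, n, 5/n) + 10*speye(n);
b = randn(n, 1);

% Factor once: P*A*Q = L*U
[L, U, P, Q] = lu(A);

% Reuse the stored factors:
%   A = P'*L*U*Q'  =>  x = Q * (U \ (L \ (P*b)))
x = Q * (U \ (L \ (P * b)));

% Compare with a direct mldivide solve
xRef = A \ b;
fprintf('relative difference vs mldivide: %.2e\n', norm(x - xRef) / norm(xRef));
```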
##### 3 Comments
기옥 김 on 13 Sep 2022
Hello, unfortunately I don't think it is possible to provide the whole code for running on another machine, because it is too long.
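```matlab
% Outer iteration loop (bounds not given in the post)
for kiter=1....
    % Assemble and factor the system for each time step in parallel
    parfor k = 1:Num_pool
        X_pp = X_local(:, k+1);
        [J, res{k}] = Jc.eval2(X_pp, mat_fun);             % Jacobian and residual
        Jtot = PT' * (J + Qconst{k}) * PT;                   % transformed system matrix
        res_tot{k,1} = PT' * (res{k} + Qconst{k}*X_pp + FL{k});
        [Laux{k}, Uaux{k}, Paux{k}, Qaux{k}] = lu(Jtot);     % P*Jtot*Q = L*U
    end
    % Sequential sweep: each step depends on the previous increment dX{k-1}
    dX{1} = PT * (Qaux{1} * (Uaux{1} \ (Laux{1} \ (-Paux{1}*res_tot{1}))));
    x_op = 1; % temporary
    for k = 2:Num_pool
        dX{k} = PT * (Qaux{k} * (Uaux{k} \ (Laux{k} \ (Paux{k}*(-res_tot{k} + x_op*PT'*Mtot*dX{k-1})))));
        X_local(:, k+1) = X_local(:, k+1) + x_op*dX{k};
    end
end
```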
I've tried different approaches, and the code above is one of them.
This is code for finite element analysis, and I'm trying to parallelize the loop.
X_pp is the unknown vector to be solved, of size (N_pp,1).
Jc.eval() is the function that evaluates the Jacobian matrix J.
J is the sparse Jacobian matrix of size (N_pp,N_pp), where N_pp is around 40,000.
The variables res and res_tot and the LU factors are used after the parfor loop, so I need to store them in cell arrays.
As can be seen, the code solves X_local(:,[k Num_pool]) in parallel, where k denotes the time step. Without this loop, X_local is solved step by step with increasing k, and in that case mldivide can be used instead of an LU decomposition.
This parallel code is much slower than I expected.
Alvaro on 26 Jan 2023
How long does this take to run in serial? At the moment it is not clear why you need a computing time of under 1 second per parfor loop.
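A benchmark along these lines could answer the question about serial run time. `N`, `n` and the random matrices are placeholders for the real `Jtot` cells. If no pool is open, `parfor` may start one automatically (depending on the parallel preferences), which would then be included in the parallel timing; without Parallel Computing Toolbox it runs serially.

```matlab
% Hypothetical data standing in for the Jtot cells (sizes are placeholders)
N = 4;
n = 5000;
rng(2);
Jtot = cell(N, 1);
for k = 1:N
    Jtot{k} = sprandn(n, n, 5/n) + 10*speye(n);
end

% Serial baseline with preallocated cell arrays for the factors
[Ls, Us, Ps, Qs] = deal(cell(N, 1));
tic;
for k = 1:N
    [Ls{k}, Us{k}, Ps{k}, Qs{k}] = lu(Jtot{k});
end
tSerial = toc;

% parfor version: sliced cell outputs are copied back to the client.
% Pool start-up, if triggered here, is included in this timing.
[Lp, Up, Pp, Qp] = deal(cell(N, 1));
tic;
parfor k = 1:N
    [Lp{k}, Up{k}, Pp{k}, Qp{k}] = lu(Jtot{k});
end
tParallel = toc;

fprintf('serial: %.3f s, parfor: %.3f s\n', tSerial, tParallel);
```

Running the parfor section twice gives a timing without pool start-up.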


### Answers (1)

Alvaro on 26 Jan 2023
If you wish to parallelize, lu already has built-in support for running in thread-based environments.
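A minimal sketch of the thread-based option, assuming Parallel Computing Toolbox R2020a or later for `parpool("threads")`, and assuming `lu` with sparse input runs on thread workers in your release (the Extended Capabilities section of the `lu` reference page lists what is supported). Thread workers share memory with the client, so returning the factors is expected to cost less than with process workers; whether that is enough for the real 40k x 40k systems is untested.

```matlab
% Hypothetical data standing in for the Jtot cells (sizes are placeholders)
N = 4;
n = 5000;
rng(3);
Jtot = cell(N, 1);
for k = 1:N
    Jtot{k} = sprandn(n, n, 5/n) + 10*speye(n);
end

% Try to open a thread-based pool. If a pool of any kind is already open,
% parfor uses that pool instead.
if exist('parpool', 'file') && isempty(gcp('nocreate'))
    try
        parpool("threads");
    catch err
        warning('Thread pool not started (%s); parfor may run serially.', err.message);
    end
end

% Same sliced-output pattern as in the question
[Laux, Uaux, Paux, Qaux] = deal(cell(N, 1));
tic;
parfor k = 1:N
    [Laux{k}, Uaux{k}, Paux{k}, Qaux{k}] = lu(Jtot{k});
end
toc
```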
Alternatively, you could consider slicing your matrix or working with distributed arrays.
Also consider the thresh parameter of lu, which might decrease calculation time at the expense of accuracy.
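A sketch for comparing `thresh` values on a synthetic matrix. For the four-output sparse form, the `lu` documentation describes `thresh` as a two-element vector of pivot thresholds with default `[0.1 0.001]`; larger first entries pivot closer to conventional partial pivoting, smaller ones leave more room to preserve sparsity. The other values tried below are arbitrary illustrations, and whether any of them helps on the real Jacobians is not known.

```matlab
% Hypothetical stand-in for one Jtot{k} and a right-hand side
n = 5000;
rng(4);
A = sprandn(n, n, 5/n) + 10*speye(n);
b = randn(n, 1);

% Candidate [pivot threshold, symmetric pivot threshold] pairs
threshList = {[1.0 0.001], [0.1 0.001], [0.01 0.001]};
for i = 1:numel(threshList)
    th = threshList{i};
    tic;
    [L, U, P, Q] = lu(A, th);             % P*A*Q = L*U with the given thresholds
    t = toc;
    x = Q * (U \ (L \ (P * b)));          % solve with the stored factors
    fprintf('thresh = [%g %g]: %.3f s, nnz(L)+nnz(U) = %d, rel. residual = %.2e\n', ...
            th(1), th(2), t, nnz(L) + nnz(U), norm(A*x - b) / norm(b));
end
```

Checking the residual alongside the time shows whether any speed-up is paid for in accuracy.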


### Categories

Parallel Computing Fundamentals

R2022a
