Preconditioning for iterative solvers on GPU - Performance issues

Paulo Ribeiro, 14 November 2019
Commented: Joss Knight, 25 November 2019
Dear all,
I'm experimenting with several preconditioners for iterative solvers on the GPU, for a linear system [A]{x}={B}. I solve it with this simple command:
sol=pcg(A_gpu,B_gpu,tol,maxit,P)
where A_gpu and B_gpu are gpuArrays and P is the preconditioner.
Some simple tests show that with P = [ ] (no preconditioner), the GPU solve is faster than any iterative CPU solver, with speedups of up to 12x.
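A minimal sketch of the unpreconditioned GPU setup described above, assuming Parallel Computing Toolbox is installed. The poster's matrix is not shown, so a 2-D Poisson matrix (sparse, symmetric positive definite) stands in for it; `n`, `useGPU` and the test matrix are illustrative choices, not from the thread. The code falls back to the CPU when no GPU is available.

```matlab
% Hypothetical stand-in for the poster's A: 2-D Poisson matrix on an
% n-by-n grid (sparse, SPD). The real system matrix is not in the thread.
n = 100;
e = ones(n, 1);
T = spdiags([-e 2*e -e], -1:1, n, n);       % 1-D second-difference matrix
A = kron(speye(n), T) + kron(T, speye(n));  % 2-D Laplacian, n^2 unknowns
B = A * ones(n^2, 1);                       % exact solution is all ones
tol = 1e-8;
maxit = 2000;

% Use the GPU only if Parallel Computing Toolbox and a device are present.
try
    useGPU = gpuDeviceCount > 0;
catch
    useGPU = false;
end

if useGPU
    A_gpu = gpuArray(A);                    % sparse gpuArray
    B_gpu = gpuArray(B);
    [sol, flag] = pcg(A_gpu, B_gpu, tol, maxit);  % P = [ ]: no preconditioner
    sol = gather(sol);                      % copy the result back to the host
else
    [sol, flag] = pcg(A, B, tol, maxit);    % CPU fallback, same call
end
fprintf('flag = %d, GPU used: %d\n', flag, useGPU);
```

Note that pcg's default maxit is small (at most 20), so an explicit maxit matters for comparisons like the 12x figure above.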
However, what I still can't figure out is why performance drops whenever any type of preconditioner is selected. For instance, using incomplete Cholesky factorization:
L=ichol(A)
sol=pcg(A_gpu,B_gpu,tol,maxit,L*L')
This kills performance compared with using no preconditioner at all on the GPU. The solve is even slower than the CPU version, where this same preconditioner improves performance by 1.5x. That's really strange.
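One likely factor: passing `L*L'` hands pcg a single product matrix M, which has more nonzeros than L and is not triangular, so every iteration must solve M*z = r as a general sparse system. pcg also accepts the two factors separately (`pcg(A,b,tol,maxit,M1,M2)` with M = M1*M2), in which case each iteration does one forward and one backward triangular solve. A CPU sketch comparing the two forms on a hypothetical 2-D Poisson stand-in matrix:

```matlab
% Hypothetical stand-in matrix (2-D Poisson, sparse SPD).
n = 100;
e = ones(n, 1);
T = spdiags([-e 2*e -e], -1:1, n, n);
A = kron(speye(n), T) + kron(T, speye(n));
B = A * ones(n^2, 1);
tol = 1e-8;
maxit = 2000;

L = ichol(A);                 % zero-fill incomplete Cholesky, A ~ L*L'

% Form used in the question: one denser matrix M that pcg must solve
% with (via backslash) on every iteration.
M = L * L';
[x1, flag1, ~, it1] = pcg(A, B, tol, maxit, M);

% Factored form: M1 = L and M2 = L' are applied as two sparse
% triangular solves per iteration.
[x2, flag2, ~, it2] = pcg(A, B, tol, maxit, L, L');

fprintf('nnz(L) = %d, nnz(L*L'') = %d\n', nnz(L), nnz(M));
fprintf('iterations: product %d, factored %d\n', it1, it2);
```

Both forms describe the same preconditioner, so iteration counts should be essentially the same; only the cost per iteration differs. The corresponding GPU call would presumably pass gpuArray copies of L and L' as M1 and M2; this sketch does not measure how fast sparse triangular solves are on the GPU.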
I've also tried passing A_gpu as the preconditioner, but the solve takes forever:
sol=pcg(A_gpu,B_gpu,tol,maxit,A_gpu)
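With M = A, applying the preconditioner means solving a system with A itself, which costs as much as solving the original problem directly. In exact arithmetic CG then converges in a single iteration, but that iteration contains a full sparse direct solve. A small CPU sketch on a hypothetical 2-D Poisson stand-in:

```matlab
% Hypothetical stand-in matrix (2-D Poisson, sparse SPD).
n = 60;
e = ones(n, 1);
T = spdiags([-e 2*e -e], -1:1, n, n);
A = kron(speye(n), T) + kron(T, speye(n));
B = A * ones(n^2, 1);

% M = A: the preconditioner step computes A \ r, i.e. a sparse direct
% solve (including its factorization) inside the iteration.
[x, flag, relres, iter] = pcg(A, B, 1e-8, 100, A);
fprintf('flag %d, iterations %d, relres %.2e\n', flag, iter, relres);
```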
The same issue affects other iterative solvers, such as BICG and SYMMLQ.
Am I doing something wrong? It appears that any preconditioner on the GPU is a drawback, even one that is efficient in the CPU version.
Please share your thoughts and experiences. Thanks!
7 comments
Paulo Ribeiro, 21 November 2019 (edited 22 November 2019)
Thanks, Joss. These are really impressive results on a Titan V; it's even faster than the backslash solver A\B on the CPU with an Intel i7-8700:
tic; A\B; toc
Elapsed time is 1.712258 seconds.
For this specific case it appears that the best option is to avoid preconditioning on the GPU.
Regards.
Joss Knight, 25 November 2019
I investigated further and found that applying the preconditioner, not just computing the factorization, does appear to take an unusually long time. This warrants further investigation, since these two triangular solves should be fast and your system matrix is band-diagonal. It does have quite a large bandwidth of 543, however, so that could be the issue.
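A sketch of how one might measure the two quantities mentioned here, the matrix bandwidth and the cost of one preconditioner application compared with one matrix-vector product, on a hypothetical 2-D Poisson stand-in (its bandwidth is n, not the 543 of the poster's matrix). It runs on the CPU with `timeit`; a GPU version would presumably use gpuArray copies and `gputimeit` instead, which is not shown here.

```matlab
% Hypothetical stand-in matrix (2-D Poisson, sparse SPD).
n = 100;
e = ones(n, 1);
T = spdiags([-e 2*e -e], -1:1, n, n);
A = kron(speye(n), T) + kron(T, speye(n));
r = rand(n^2, 1);                 % stand-in for a CG residual vector
L = ichol(A);                     % IC(0) factor, A ~ L*L'
Lt = L';                          % form the transpose once

% Bandwidth: largest distance of any nonzero from the main diagonal.
[rows, cols] = find(A);
bw = max(abs(rows - cols));

% One preconditioner application as pcg does it with M1 = L, M2 = L':
% forward substitution with L, then back substitution with L'.
applyPrec = @() Lt \ (L \ r);
matvec    = @() A * r;

tPrec = timeit(applyPrec);
tMat  = timeit(matvec);
fprintf('bandwidth = %d\n', bw);
fprintf('preconditioner apply: %.3g s, A*r: %.3g s\n', tPrec, tMat);
```

Triangular solves are inherently sequential along the dependency chain, so they tend to parallelize less well than a sparse matrix-vector product; the ratio tPrec/tMat is a quick way to see how much each iteration pays for the preconditioner.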
Iterative solvers are generally faster than direct solves for large sparse matrices (assuming they have reasonable convergence properties). Direct solves are hugely memory-intensive because the factorization produces a lot of fill-in.
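The fill-in point can be illustrated by comparing nonzero counts of the matrix, its exact Cholesky factor and its zero-fill incomplete Cholesky factor, again on a hypothetical 2-D Poisson stand-in. Note that `chol(A)` with a single output applies no fill-reducing reordering, so this exaggerates fill-in compared with backslash, which does reorder; fill-in is typically still substantial.

```matlab
% Hypothetical stand-in matrix (2-D Poisson, sparse SPD).
n = 100;
e = ones(n, 1);
T = spdiags([-e 2*e -e], -1:1, n, n);
A = kron(speye(n), T) + kron(T, speye(n));

R  = chol(A);     % exact factor, A = R'*R, natural ordering
Li = ichol(A);    % IC(0): restricted to the sparsity pattern of tril(A)

fprintf('nnz(tril(A))  = %d\n', nnz(tril(A)));
fprintf('nnz(chol(A))  = %d (with fill-in)\n', nnz(R));
fprintf('nnz(ichol(A)) = %d (no fill-in)\n', nnz(Li));
```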

Category: Sparse Matrices

Release: R2019a