Iterative solver with gpuArray

Question

1 vote

Hi all,

Iterative solvers are sometimes useful with full matrices too, which is my case. I would like to use an iterative solver such as GMRES with a full matrix where both the matrix and the RHS are gpuArrays, but this does not seem to be supported in MATLAB R2013a.

My data are

 >> n = 1024; 
 >> Acpu = rand(n)+100*eye(n);
 >> bcpu = rand(n,1); 
 >> Agpu = gpuArray(Acpu); bgpu = gpuArray(bcpu);

I tried either

 >> x = gmres(Agpu,bgpu,[]);
 Error using iterchk (line 39)
 Argument must be a floating point matrix or a function handle.
 Error in gmres (line 86)
 [atype,afun,afcnstr] = iterchk(A);

and

 >> x = gmres(@(x)(Agpu*x),bgpu,[]);
 The following error occurred converting from gpuArray to double:
 Conversion to double from gpuArray is not possible
 Error in gmres (line 297)
     U(:,1) = u;

The only way I found to make it work is

 >> x = gmres(@(x)gather(Agpu*x),bcpu,[]);
 gmres converged at iteration 7 to a solution with relative residual 2.4e-07.

That is terribly ugly because every matrix-vector product is copied from GPU memory back to system memory. Is there any way to run GMRES on the GPU using MATLAB built-in functions?

Thanks in advance, Fabio

2 Comments

Matt J on 16 Sep 2014

Are you saying you get no acceleration over CPU-gmres? I wouldn't expect the data transfer of Agpu*x to be such a big penalty. It's not like you're transferring all of Agpu, after all.

I also vaguely wonder whether this would continue to be a problem on newer graphics cards and newer versions of CUDA. My understanding was that the newer CUDA versions could share memory with the CPU.

Fabio Freschi on 16 Sep 2014

That is not yet implemented in MATLAB R2013a. I run out of memory as soon as I exceed the GPU memory (12 GB in my case, on a Tesla K40).

Answer 1

Matt J on 16 Sep 2014 (edited 16 Sep 2014)

1 vote

Even for a much larger problem size (n=10240) and a not-so-new graphics card (GTX 580), I see negligible overhead from swapping between CPU and GPU:

    n = 1024*10;
    Acpu = rand(n) + 100*eye(n);
    bcpu = rand(n,1);
    Agpu = gpuArray(Acpu);
    bgpu = gpuArray(bcpu);
    gputimeit(@() Agpu*bgpu)            % all data on GPU
    % 0.0052 sec
    gputimeit(@() gather(Agpu*bcpu))    % requires data transfer
    % 0.0054 sec
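A rough size estimate makes the small gap plausible. This is only a back-of-the-envelope sketch, assuming double precision (8 bytes per element) and that each call moves roughly one input vector to the GPU and one result vector back; the variable names are just for illustration.

```matlab
n = 1024*10;
bytesPerDouble = 8;                          % assumes double precision
vectorBytes = n * bytesPerDouble;            % one n-by-1 vector
matrixBytes = n^2 * bytesPerDouble;          % the full n-by-n matrix
ratio = matrixBytes / vectorBytes;           % equals n

fprintf('vector: %d bytes, matrix: %d bytes, ratio: %d\n', ...
        vectorBytes, matrixBytes, ratio);
```

So each product moves on the order of 80 KB per vector, against roughly 800 MB for the matrix, which stays on the GPU the whole time.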

The speed-up in GMRES also seems pretty good (a factor of 4):

    tic;
    x = gmres(@(x) Acpu*x, bcpu, []);
    toc
    % Elapsed time is 0.391786 seconds.
    tic;
    x = gmres(@(x) gather(Agpu*x), bcpu, []);
    toc
    % Elapsed time is 0.097924 seconds.

5 Comments

Matt J on 16 Sep 2014 (edited 16 Sep 2014)

If you must use tic...toc, the following would be a better set of tests:

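    % Single product, including the transfer back to the CPU
    tic;
    x = gather(Agpu*bcpu); x(:) = 1;
    toc

    % Ten products with all data on the GPU, then one gather at the end
    tic;
    for ii = 1:10
        x = Agpu*bgpu;
    end
    x = gather(x);
    x(:) = 1;
    toc/10

    % CPU only
    tic; x = Acpu*bcpu; x(:) = 1; toc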

Notice that the second test is the most realistic representation of what you would like to do, i.e., many iterations of GPU operations plus a final gather() operation at the end of the iterations.
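For illustration, that "iterate on the GPU, gather once" pattern could look something like the hand-written solver below. This is a minimal sketch, not what the built-in gmres does: gmres_sketch is a hypothetical helper name, it has no restarts and no preconditioning, it starts from a zero initial guess, and it assumes b is nonzero. It also assumes that zeros(...,'like',b) and gather accept both gpuArrays and ordinary arrays in your release, so the same file can be tried on the CPU first.

```matlab
function [x, relres, iter] = gmres_sketch(A, b, tol, maxit)
% GMRES_SKETCH  Unrestarted GMRES with zero initial guess (illustrative only).
%   Hypothetical helper, not a MATLAB built-in. A and b may both be
%   gpuArrays or both be ordinary arrays; x is returned where b lives.
n = size(b, 1);
V = zeros(n, maxit + 1, 'like', b);   % Krylov basis, same type as b
H = zeros(maxit + 1, maxit);          % small Hessenberg matrix, kept on the CPU
beta = gather(norm(b));               % only scalars and short vectors are gathered
V(:, 1) = b / beta;
relres = 1;
y = [];
for iter = 1:maxit
    w = A * V(:, iter);               % matrix-vector product
    for pass = 1:2                    % Gram-Schmidt plus one reorthogonalization pass
        h = V(:, 1:iter)' * w;
        w = w - V(:, 1:iter) * h;
        H(1:iter, iter) = H(1:iter, iter) + gather(h);
    end
    hnext = gather(norm(w));
    H(iter + 1, iter) = hnext;
    rhs = [beta; zeros(iter, 1)];
    Hk = H(1:iter + 1, 1:iter);
    y = Hk \ rhs;                     % small least-squares problem on the CPU
    relres = norm(rhs - Hk * y) / beta;
    if relres <= tol || hnext == 0
        break
    end
    V(:, iter + 1) = w / hnext;
end
x = V(:, 1:iter) * y;                 % assemble the solution from the basis
end
```

Saved as gmres_sketch.m, it could be called as [xgpu, relres, iter] = gmres_sketch(Agpu, bgpu, 1e-6, 20), followed by a single x = gather(xgpu) at the end. The price is memory: the basis V takes n*(maxit+1) doubles on the device.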

Fabio Freschi on 16 Sep 2014 (edited 16 Sep 2014)

Following a suggestion found on the MathWorks website:

 >> gd = gpuDevice;
 >> tic; for i = 1:100, x = Agpu*bgpu; end; wait(gd); toc
 Elapsed time is 0.537721 seconds.
 >> tic; for i = 1:100, x = gather(Agpu*bgpu); end; wait(gd); toc
 Elapsed time is 0.547418 seconds.

These timings agree with your experiments.

EDIT: I now see your comment, which is similar to this implementation.

Answer 2

Joss Knight on 7 Sep 2015

0 votes

If you download the R2015b release of MATLAB (released on 3rd September) you will find that gmres is now supported for sparse gpuArrays, including support for a single sparse matrix preconditioner. See http://www.mathworks.com/help/distcomp/release-notes.html.
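As a rough sketch of what that looks like, assuming R2015b or later, Parallel Computing Toolbox and a supported GPU: the tridiagonal test matrix, tolerance and iteration limit below are arbitrary choices for illustration, and the preconditioner argument is left out.

```matlab
n = 1024;
Acpu = gallery('tridiag', n, -1, 4, -1);   % sparse, diagonally dominant test matrix
bcpu = rand(n, 1);

Agpu = gpuArray(Acpu);                     % sparse gpuArray
bgpu = gpuArray(bcpu);

% Unrestarted GMRES, tolerance 1e-6, at most 20 iterations
[xgpu, flag, relres, iter] = gmres(Agpu, bgpu, [], 1e-6, 20);

x = gather(xgpu);                          % single transfer of the solution
resnorm = norm(Acpu*x - bcpu) / norm(bcpu);
```

Note that this covers sparse matrices only. For the full matrix in the original question, the function-handle workaround from the accepted answer still applies.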
