How do I optimize this code to run efficiently on the GPU?

Jan on 27 Nov 2013
Edited: Joss Knight on 28 Nov 2013
Dear MATLAB users,
I am trying to optimize the function listed below for GPU computing. I have tried many versions of the GPU algorithm, but the GPU always seems to be slower than the CPU. I would really appreciate any suggestions or help.
%% Declaration of variables
K=4;
C11n = rand(K,508032);
[x1, x2] = size(C11n);
C22b = zeros(x1,x1*(x2/2),'double');
C2 = zeros(K,K,x2/2,'double');
E=eye(x1);
A=reshape(C11n,K,2,x2/2);   % x2/2 pages, each a K-by-2 matrix
AT=permute(A,[2 1 3]);      % page-wise transpose: each page is 2-by-K
%% CPU code
tic
for k=1:x2/2
C2(:,:,k)=E-(A(:,:,k)*inv(AT(:,:,k)*A(:,:,k))*AT(:,:,k));
end
toc
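%% GPU code
% Declaration of variables
C22 = gpuArray(zeros(K,K,x2/2,'double'));
E=gpuArray(eye(x1));
A=gpuArray(reshape(C11n,K,2,x2/2));
AT=permute(A,[2 1 3]);   % keep the transposed pages on the GPU as well
tic
for k=1:x2/2
C22(:,:,k)=E-(A(:,:,k)*inv(AT(:,:,k)*A(:,:,k))*AT(:,:,k));
end
wait(gpuDevice);   % let queued GPU work finish before stopping the timer
toc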
With best regards,
Jan

Answers (2)

Ashish Uthama on 27 Nov 2013
A quick 'air' code using pagefun:
tic
M = pagefun(@mtimes, A(:,:,1:x2/2), AT(:,:,1:x2/2));
M = pagefun(@mtimes, M, M);
C22 = repmat(E,[1 1 x2/2])-M;
toc
I would be curious to know if this works for you, and what times you get on your hardware.
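When collecting the timings asked for above, note that GPU calls can return before the work has finished, so a bare tic/toc pair may under-report. The following is a minimal sketch of two ways to time a page-wise product, assuming Parallel Computing Toolbox and a supported GPU are available. The page count N is a hypothetical value chosen to keep the example small; it is not the size used in the question.

```matlab
% Sketch: timing a page-wise product on the GPU.
% Assumes Parallel Computing Toolbox and a supported GPU.
K = 4;
N = 1000;                          % hypothetical page count (smaller than in the question)
A  = gpuArray(rand(K, 2, N));      % N pages of K-by-2 matrices
AT = pagefun(@transpose, A);       % N pages of 2-by-K matrices

% Option 1: gputimeit waits for the GPU to finish before reporting a time.
tPagefun = gputimeit(@() pagefun(@mtimes, A, AT));

% Option 2: with tic/toc, synchronize explicitly before calling toc.
tic
M = pagefun(@mtimes, A, AT);       % each page of M is K-by-K
wait(gpuDevice);
tTicToc = toc;

fprintf('pagefun: %.4g s (gputimeit), %.4g s (tic/toc)\n', tPagefun, tTicToc);
```

The same pattern applies to the loop version: wrap it in a function handle for gputimeit, or add wait(gpuDevice) before toc.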
1 Comment
Jan on 27 Nov 2013
Edited: Jan on 27 Nov 2013
Thank you for the idea. I will try it and let you know.
I apologize: the code I first posted was wrong because I forgot the matrix inversion. The code is now corrected.
BTW, pagefun helped: it gives a 10x speedup (M = pagefun(@mtimes, A(:,:,1:x2/2), AT(:,:,1:x2/2));). Now I need to figure out how to quickly invert every page of a 3-D matrix. I will keep you informed.



Joss Knight on 28 Nov 2013
Edited: Joss Knight on 28 Nov 2013
Are your matrices always 4x2? This results in AT*A being 2x2, so you can just calculate your inverses manually:
function Ainv = batch2x2inv(A)
% Grab each matrix element as a vector
a = A(1,1,:);
b = A(1,2,:);
c = A(2,1,:);
d = A(2,2,:);
% Compute determinants (named detA to avoid shadowing the built-in det)
detA = a.*d - b.*c;
% Construct inverse
Ainv = bsxfun(@rdivide, [d -b; -c a], detA);
end
...and the relevant chunk of your code also uses pagefun as Ashish suggests:
AT = pagefun(@transpose, A);
ATA = pagefun(@mtimes, AT, A);
invATA = batch2x2inv(ATA);
pinvA = pagefun(@mtimes, invATA, AT);
residual = pagefun(@mtimes, A, pinvA);
C22 = bsxfun(@minus, E, residual);
Your code now runs 6x faster than the CPU on my machine.
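Before relying on the closed-form inverse, it may be worth checking it against inv on a small batch. The sketch below runs on the CPU only, since bsxfun and the indexing in the helper also work on ordinary arrays. The batch size N and the fixed random seed are assumptions made for illustration, and the helper is repeated as a local function (which needs a MATLAB release that allows functions at the end of a script) so that the script is self-contained.

```matlab
% Sketch: compare the batched 2x2 inverse with inv, page by page (CPU only).
rng(0);                                 % fixed seed so the check is repeatable
N = 5;                                  % hypothetical small batch size
A = rand(4, 2, N);                      % N pages of 4-by-2 matrices

ATA = zeros(2, 2, N);
for k = 1:N
    ATA(:,:,k) = A(:,:,k)' * A(:,:,k);  % 2-by-2 Gram matrix of page k
end

invATA = batch2x2inv(ATA);

% Largest element-wise deviation from inv over all pages.
maxErr = 0;
for k = 1:N
    pageErr = abs(invATA(:,:,k) - inv(ATA(:,:,k)));
    maxErr = max(maxErr, max(pageErr(:)));
end
fprintf('largest deviation from inv: %g\n', maxErr);

function Ainv = batch2x2inv(A)
% Same logic as the helper above, repeated for a self-contained script.
a = A(1,1,:);  b = A(1,2,:);
c = A(2,1,:);  d = A(2,2,:);
detA = a.*d - b.*c;                     % one determinant per page
Ainv = bsxfun(@rdivide, [d -b; -c a], detA);
end
```

A deviation near machine precision suggests the helper is behaving as intended. Pages whose determinant is close to zero would need separate handling, which this sketch does not attempt.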
