Optimize GPU code with nested pagemtimes

Question

Hello all,

I'm trying to speed up computation using the GPUs that are available to me. Right now I have two arrays, Q and W.

size(W) = (16 1 1000)

size(Q) = (16 16 1 2000)

I want to do a pseudo-matrix multiplication, M = W'*Q*W, to get size(M) = (1000 2000).
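Written out element-wise, the product described above appears to be the following. This assumes W(:,1,i) is the i-th column vector $w_i$ and Q(:,:,1,j) is the j-th matrix $Q_j$; the index names i, j, a, b are introduced here only for illustration.

```latex
M_{ij} = w_i^{H} \, Q_j \, w_i
       = \sum_{a=1}^{16} \sum_{b=1}^{16} \overline{w_{a,i}} \, (Q_j)_{ab} \, w_{b,i},
\qquad i = 1,\dots,1000, \quad j = 1,\dots,2000
```

Here $^{H}$ is the conjugate transpose that MATLAB's ' operator performs; for real W it reduces to the plain transpose.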

To do this I use two calls to pagemtimes, which can run on the GPU. Here's the code:

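%%
tic
% Preallocate the result on the GPU in single precision
Sar_pm_gpu = zeros(num_psar_kept,2,size(shim_pm_gpu,3),'single','gpuArray');
for n = 1:size(W,3)
    % Right-multiply every page of Q_gpu by the n-th shim vector
    inter_calc = pagemtimes(Q_gpu,shim_pm_gpu(:,1,n));
    % Left-multiply by the n-th page of shim_pm_gpu_left, then drop singleton dimensions
    Sar_this_shim = squeeze(pagemtimes(shim_pm_gpu_left(:,:,n),inter_calc)); % in a test, this one is ~15% faster
    [Sar_maxk, index_maxk] = max(Sar_this_shim);
    Sar_pm_gpu(:,:,n) = [Sar_maxk,index_maxk];
end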

With this code I get a ~5x speedup versus running it on the CPU. However, I'd expect it to be quite a bit faster than that. I then checked nvidia-smi, and the power consumption of the GPU was ~35 W. For reference, the resting power consumption is 30 W, so I don't think this code is actually utilizing the GPU. If anyone sees a way to speed this up, it would be much appreciated! (An explanation of why the GPU power consumption is so low with the posted code would also be much appreciated; I assume it has something to do with memory.)

2 Comments

Matt J on 28 Jul 2022

You shouldn't be using tic/toc for timing gpuArray operations; see gputimeit:

https://www.mathworks.com/help/parallel-computing/gputimeit.html#bt2tfto-1
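As a minimal sketch of what the linked function looks like in use: gputimeit takes a zero-argument function handle that returns at least one output, and it handles GPU synchronization itself. The array names and sizes below are hypothetical stand-ins, not the arrays from the question, and the snippet assumes a supported GPU and Parallel Computing Toolbox.

```matlab
% Hypothetical operands; the real arrays from the question are not shown in full
A = rand(16,16,200,'single','gpuArray');
B = rand(16,100,'single','gpuArray');

% Wrap the operation to be timed in a zero-argument function handle
f = @() pagemtimes(A,B);

% gputimeit calls f repeatedly and waits for the GPU before reporting a time
t = gputimeit(f);
fprintf('pagemtimes took %.3g s\n', t);
```

Timing a whole loop works the same way if the loop is moved into a function and the handle calls that function.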

tiwwexx on 29 Jul 2022

I clipped off the end of the code by accident. I make sure to call

gather(output)

before calling toc so it's accurate.
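For readers who want to keep tic/toc, a sketch of the synchronization idea is below. Instead of gathering the output, it calls wait on the GPU device object before toc, which blocks until queued GPU work has finished without copying data back to the host. The operands are hypothetical placeholders, and a supported GPU is assumed.

```matlab
dev = gpuDevice;                          % handle to the currently selected GPU

% Hypothetical operands standing in for the arrays in the question
A = rand(16,16,200,'single','gpuArray');
B = rand(16,100,'single','gpuArray');

tic
C = pagemtimes(A,B);                      % may return before the GPU has finished
wait(dev);                                % block until all queued GPU work is done
elapsed = toc;
fprintf('Elapsed: %.3g s\n', elapsed);
```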


Answer 1

Matt J on 28 Jul 2022

I don't think you need either a loop or a second pagemtimes call.

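Wr = reshape(W,16,1000);            % 16-by-1000: one column per W vector
Qr = reshape(Q,16,16,2000);         % 16-by-16-by-2000: one page per Q matrix
M = sum(pagemtimes(Qr,Wr).*Wr,1);   % 1-by-1000-by-2000: sum over the 16 rows of (Q*W).*W
M = reshape(M,1000,2000);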
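One way to sanity-check this vectorized form is to compare it with an explicit double loop on small random inputs, as sketched below. The sizes n, nW, and nQ are hypothetical and chosen so the check runs quickly on the CPU. Note one deliberate change: because the ' operator in W'*Q*W conjugates, the sketch multiplies by conj(Wr); for real-valued W this is identical to the code above.

```matlab
% Small hypothetical sizes (the thread uses 16, 1000 and 2000)
n = 4; nW = 5; nQ = 6;
W = randn(n,1,nW) + 1i*randn(n,1,nW);
Q = randn(n,n,1,nQ) + 1i*randn(n,n,1,nQ);

% Reference: explicit double loop over W'*Q*W
Mref = zeros(nW,nQ);
for i = 1:nW
    for j = 1:nQ
        Mref(i,j) = W(:,1,i)' * Q(:,:,1,j) * W(:,1,i);
    end
end

% Vectorized form, with conj() so that complex W matches the ' operator
Wr = reshape(W,n,nW);
Qr = reshape(Q,n,n,nQ);
M  = reshape(sum(pagemtimes(Qr,Wr).*conj(Wr),1),nW,nQ);

% Largest absolute difference between the two results
maxErr = max(abs(M(:) - Mref(:)));
fprintf('max abs difference: %.3g\n', maxErr);
```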

4 Comments (2 older comments hidden)

Matt J on 28 Jul 2022

Edited: Matt J on 28 Jul 2022

It seems to be slower only on the GPU. pagemtimes isn't well-optimized for the GPU, it would appear.
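If pagemtimes is the bottleneck, one alternative worth trying is to rewrite the whole computation as a single ordinary matrix product. This is a different technique from the one in the answer, it has not been benchmarked here, and whether it is faster on a given GPU is an open question. The idea is that M(i,j) is a sum over all (a,b) of conj(W(a,i))*W(b,i)*Q(a,b,j), so the W-dependent products can be collected into one matrix K and multiplied by Q reshaped to two dimensions. The names K and Qm and the small sizes are hypothetical.

```matlab
% Small hypothetical sizes (the thread uses 16, 1000 and 2000)
n = 4; nW = 5; nQ = 6;
W = randn(n,1,nW) + 1i*randn(n,1,nW);
Q = randn(n,n,1,nQ) + 1i*randn(n,n,1,nQ);

Wr = reshape(W,n,nW);

% K(:,i) holds the n*n products conj(W(a,i))*W(b,i), stored in the same
% column-major (a,b) order as the elements of each page Q(:,:,1,j)
K  = reshape(reshape(conj(Wr),n,1,nW) .* reshape(Wr,1,n,nW), n*n, nW);
Qm = reshape(Q,n*n,nQ);

% One plain matrix multiply; .' is the non-conjugating transpose
M = K.' * Qm;                      % nW-by-nQ

% Check against the explicit double loop
Mref = zeros(nW,nQ);
for i = 1:nW
    for j = 1:nQ
        Mref(i,j) = W(:,1,i)' * Q(:,:,1,j) * W(:,1,i);
    end
end
maxErr = max(abs(M(:) - Mref(:)));
fprintf('max abs difference: %.3g\n', maxErr);
```

The same lines should work unchanged when W and Q are gpuArray inputs, since reshape, conj, element-wise multiplication with implicit expansion, and mtimes all accept gpuArray data.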

tiwwexx on 28 Jul 2022

Hmm, very interesting indeed. I have a feeling that I'm eventually going to need to learn CUDA since I run into these problems quite often...
