Performance Loss of Matrix-Vector Multiplication on GPU with Array Indexing

Afshin Ahmadi, 29 April 2020
Commented: Afshin Ahmadi, 4 May 2020
Hi,
I have a large matrix A and a vector B. I want to do a partial multiplication on the GPU using array indexing, but the performance is much lower than doing a full A*B. Below is a simple example of what I am trying to do:
A = rand(20000,'gpuArray');
B = rand(20000,1,'gpuArray');
C = A(8001:18000,1:end)*B;
GPU Device: Tesla V100
MATLAB R2020a
Any suggestions on how to improve the performance? Thank you.

Accepted Answer

Edric Ellis, 30 April 2020
Unfortunately, the expression A(8001:18000,:) requires a strided memory copy. Matrices in MATLAB (even on the GPU) are stored in column-major format, so picking out only certain rows is much less efficient than picking out only certain columns.
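To make the column-major point concrete, here is a minimal sketch on a tiny CPU matrix (the sizes and the helper isContiguous are illustrative assumptions, not part of the original answer). Each element stores its own linear index, so you can see which memory locations a column block and a row block touch:

```matlab
% Tiny CPU example; gpuArray data uses the same column-major layout.
m = 4; n = 3;                      % assumed illustrative sizes
A = reshape(1:m*n, m, n);          % A(i,j) holds its linear index i + (j-1)*m

% Column block: the linear indices form one contiguous run.
colIdx = reshape(A(:, 2:3), 1, []);    % 5 6 7 8 9 10 11 12

% Row block: contiguous within a column, then a jump to the next column.
rowIdx = reshape(A(2:3, :), 1, []);    % 2 3 6 7 10 11

% Hypothetical helper: true when the indices are one unbroken run.
isContiguous = @(idx) all(diff(idx) == 1);

fprintf('column block contiguous: %d\n', isContiguous(colIdx));
fprintf('row block contiguous:    %d\n', isContiguous(rowIdx));
```

The row block has gaps between columns, which is why extracting it needs a strided copy, while the column block can be read as a single chunk.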
There's a trick you can use, though, that takes advantage of the fact that gpuArray matrix multiplication is optimised for the transposed-times case. Try pre-transposing A (this is relatively expensive, but perhaps you can do it only once) and then indexing columns of the transposed matrix:
At = A.';  % pre-transpose once
C = At(:, 8001:18000).' * B;
This uses the much faster indexing pattern, and is about 2x faster on my GPU.
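As a sanity check that the two forms compute the same result, here is a small sketch. The sizes, the row range, and the variable names At, C1, C2 are assumptions chosen so it runs quickly; it uses canUseGPU (available from R2019b) to fall back to ordinary CPU arrays when no GPU is present:

```matlab
n = 2000;                 % assumed size, smaller than the 20000 in the question
rows = 801:1800;          % assumed row block, scaled down from 8001:18000
A = rand(n);
B = rand(n, 1);
if canUseGPU()
    A = gpuArray(A);      % move the data to the GPU when one is available
    B = gpuArray(B);
end

At = A.';                 % one-off transpose, reused for every later product
C1 = A(rows, :) * B;      % original form: row indexing (strided copy)
C2 = At(:, rows).' * B;   % column indexing on the transpose, then transposed multiply

% The results should agree up to floating-point rounding.
relErr = gather(max(abs(C1 - C2)) / max(abs(C1)));
fprintf('max relative difference: %.2e\n', relErr);
```

Whether the transposed form is actually faster depends on the GPU and the block size, so it is worth timing both with gputimeit on your own hardware.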
5 Comments
Edric Ellis, 4 May 2020
Strange, I just tried on a WIN64 machine here with a V100, and got the following result:
t1 =
1.6677e-04
t2 =
4.4944e-04
(This was using R2020a).
Afshin Ahmadi, 4 May 2020
I tried again, and your solution is indeed quite fast when the block size is small, which is exactly what I need. Thank you so much for the help! I'll include some timings here for anyone interested in doing the same thing.
A = gpuArray.rand(20000);
B = gpuArray.rand(20000,1);
At = A.';
t1 = gputimeit(@() At(:,500:2000).'*B)
t2 = gputimeit(@() At(:,500:5000).'*B)
t3 = gputimeit(@() At(:,500:10000).'*B)
t4 = gputimeit(@() A(500:2000,:)*B)
t5 = gputimeit(@() A(500:5000,:)*B)
t6 = gputimeit(@() A(500:10000,:)*B)
t7 = gputimeit(@() A*B)
Execution time (in seconds, as returned by gputimeit):
t1 = 4.4423e-04
t2 = 0.0010
t3 = 0.0020
t4 = 0.0035
t5 = 0.0051
t6 = 0.0076
t7 = 0.0044
(MATLAB R2020a, Tesla V100, Linux)
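For readers comparing the numbers, the sketch below computes the speedup of the transposed form for each block size, using only the timings reported above. The variable names are illustrative, and because the reported values are rounded by MATLAB's short display format, the ratios are approximate:

```matlab
% Timings in seconds, copied from the results above.
tTransposed = [4.4423e-04, 0.0010, 0.0020];   % t1, t2, t3: At(:,range).'*B
tDirect     = [0.0035,     0.0051, 0.0076];   % t4, t5, t6: A(range,:)*B
tFull       = 0.0044;                         % t7: full A*B

% Speedup of the transposed form over direct row indexing, per block size.
speedup = tDirect ./ tTransposed;
fprintf('speedup: %.1fx\n', speedup);

% Which variants beat simply computing the full product?
transposedBeatsFull = tTransposed < tFull;
directBeatsFull     = tDirect < tFull;
```

On these figures the advantage shrinks as the block grows (roughly 7.9x, 5.1x and 3.8x), and direct row indexing is slower than the full A*B for the two larger blocks.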
