MATLAB GPU: arrayfun with indexing

Hi
I am new to MATLAB GPU computing and have run some initial tests. Now I am looking to parallelize the following code.
for i=1:n ;where n~1'000'000 and a, b,c of size ~300'000x1
currindices = indices(24,i);
a(currindices ) = a(currindices ) + A(24x24)*(b(currindices )+B(24x24)*c(currindices ));
end
In a test I parallelized this code without any of the indexing by using arrayfun, and it worked well. That is, I just had the following code in a function that was called by arrayfun:
for i=1:n
a=a+A*(b+B*c)
end
I wonder how to deal with the indexing of the vectors and whether arrayfun still makes sense. The matrices A and B are constant. I read that indexing is rather slow on a GPU.
What would be the best way to parallelize the above code?
Thanks for any help. This whole parallelization does not come naturally to me yet.
BR

6 Comments

Walter Roberson on 22 Oct 2017
Edited: Walter Roberson on 24 Oct 2017
? currindices appears to be unused before you assign to it.
Markus Ess on 22 Oct 2017
Sorry, that was a mistake. The indexing should use currindices. I fixed the code in the sample.
Joss Knight on 24 Oct 2017
I'm not sure what language you've written your code in so it's difficult to interpret. What is A(24x24)? And if this were MATLAB code then indices(24,i) would be a scalar. But then your algebra doesn't make sense.
Markus Ess on 24 Oct 2017
Edited: Walter Roberson on 24 Oct 2017
It wasn't meant to be real code. It is just to show that A is of size 24x24 and that for currindices I read 24 values. So currindices is indices(:,i) in MATLAB code, and the multiplication with A and B is an ordinary matrix product.
for i=1:n %;where n~1'000'000 and a, b,c of size ~300'000x1
currindices = indices(:,i);
a(currindices ) = a(currindices ) + A*(b(currindices )+B*c(currindices ));
end
Well, one of the things I learnt anyway is that I have to use pagefun. The problem is still the indexing.
However, my main feeling is that I will have to rewrite the math anyway to get an optimal parallelization.
Joss Knight on 26 Oct 2017
I don't think you need pagefun. Can't you just do this with indexing and matrix multiplication? It seems indices is the correct shape, namely 24-by-n. So b(indices) and c(indices) return 24-by-n, the multiplies return 24-by-n, and the addition works.
a(indices) = a(indices) + A * (b(indices) + B * c(indices));
If the indices repeat this may not work as you intended, because some elements of a will get one of the answers and not another. You might have to use accumarray in that case.
result = A * (b(indices) + B * c(indices));
a = a + accumarray(indices(:), result(:), size(a));
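The repeated-index caveat can be checked at a reduced size. The sketch below is not from the thread: the sizes, the data, and the way `indices` is built are all invented for illustration, and it assumes the entries within each column of `indices` are unique (only different columns overlap). It also relies on implicit expansion (R2016b or later). Note that accumarray takes the subscripts first and the values second, and that only the increments are accumulated and then added to a, since accumulating a(indices) itself would count the old values of a more than once.

```matlab
% Sketch with invented, reduced-size data (the thread uses N ~ 3e5, n ~ 1e6).
k = 24;  N = 300;  n = 1000;

% Hypothetical index table: entries are unique within each column,
% but different columns overlap, so indices repeat across columns.
indices = mod((0:k-1)' + 7*(0:n-1), N) + 1;   % k-by-n

% Made-up constant matrices and vectors.
A  = reshape(sin(1:k*k), k, k) / k;
B  = reshape(cos(1:k*k), k, k) / k;
a0 = zeros(N, 1);
b  = (1:N)' / N;
c  = cos((1:N)');

% Reference: the sequential loop from the thread.
aLoop = a0;
for i = 1:n
    idx = indices(:, i);
    aLoop(idx) = aLoop(idx) + A * (b(idx) + B * c(idx));
end

% Plain indexed assignment: with repeated indices only one update per
% element of a survives, so this generally differs from the loop.
aAssign = a0;
aAssign(indices) = aAssign(indices) + A * (b(indices) + B * c(indices));

% accumarray sums every update that targets the same element of a.
delta  = A * (b(indices) + B * c(indices));   % k-by-n
aAccum = a0 + accumarray(indices(:), delta(:), size(a0));

errAccum  = max(abs(aAccum  - aLoop));
errAssign = max(abs(aAssign - aLoop));
fprintf('accumarray vs loop: %g, plain assignment vs loop: %g\n', ...
        errAccum, errAssign);
```

The accumarray version matches the loop here because the increment depends only on b and c, never on a, so the order of the updates does not matter. If a column of `indices` contained the same entry twice, the loop would keep only one of the two updates while accumarray would add both, and the two results would no longer agree.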
Markus Ess on 31 Oct 2017
Got it. At least on the CPU, the multiplication is 10 times faster than the for loop. Anyway, I now need to rewrite the code and see how that could work on a GPU.
thanks!
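One possible way to try the vectorized form on a GPU is sketched below. It is not from the thread and has not been timed: it assumes Parallel Computing Toolbox and a supported device, uses invented reduced-size data, and falls back to the CPU when no GPU is visible. The accumulate-the-increments step is the same assumption as in the accumarray discussion above (unique entries within each column of `indices`).

```matlab
% Invented, reduced-size data; the thread's sizes are N ~ 3e5 and n ~ 1e6.
k = 24;  N = 300;  n = 1000;
indices = mod((0:k-1)' + 7*(0:n-1), N) + 1;   % hypothetical k-by-n index table
A = reshape(sin(1:k*k), k, k) / k;
B = reshape(cos(1:k*k), k, k) / k;
a = zeros(N, 1);
b = (1:N)' / N;
c = cos((1:N)');

% CPU reference for comparison.
delta = A * (b(indices) + B * c(indices));
aRef  = a + accumarray(indices(:), delta(:), size(a));

% Use the GPU only if one is visible; otherwise stay on the CPU.
try
    useGPU = gpuDeviceCount > 0;
catch
    useGPU = false;   % toolbox missing or no usable device
end

if useGPU
    % Transfer the inputs once; the operations below then run on the device.
    A = gpuArray(A);  B = gpuArray(B);
    a = gpuArray(a);  b = gpuArray(b);  c = gpuArray(c);
    indices = gpuArray(indices);
end

% Same expressions as on the CPU; they dispatch on the array type.
delta = A * (b(indices) + B * c(indices));
a = a + accumarray(indices(:), delta(:), size(a));

if useGPU
    a = gather(a);    % copy the result back to host memory
end

err = max(abs(a - aRef));
fprintf('max difference to CPU reference: %g\n', err);
```

At the full problem size each 24-by-n temporary holds 24e6 doubles (about 192 MB), so if device memory is tight the columns of `indices` could be processed in chunks, accumulating into a after each chunk.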


Answers (0)


Asked: 22 Oct 2017

Commented: 31 Oct 2017
