GPU arrayfun with shared arrays

Ray, 11 Nov 2014
Edited: Matt J, 14 Nov 2014
Hi all,
I'm trying to speed up some code by using the GPU support that comes with arrayfun.
I know arrayfun operates element-wise; however, my function also involves some shared arrays. For example, I have a function like:
f = f(a,b,A,B,C), where a and b are (n x 1) arrays, i.e. the element-wise portion of the function. A, B, and C are arrays that remain constant across every element-wise evaluation over a and b.
I've tried searching for how to implement this, but the results don't look too promising. Is it possible to do this using arrayfun? If not, is there another way I can speed up such a function? I've tried using parfor, but this actually turned out to be slower than a normal for-loop.
Thanks,
Ray

Answers (3)

Matt J, 11 Nov 2014
Edited: Matt J, 11 Nov 2014
The only hope, I think, would be to write your own CUDA kernel implementation of f(), putting A, B, and C in constant memory if they are small enough to fit there. You could manage this through MATLAB using a CUDAKernel object; see the documentation for CUDAKernel and its setConstantMemory method.
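As a rough sketch of what that could look like from the MATLAB side: the kernel file name fKernel.cu, the kernel body shown in the comments, the array sizes, and the particular formula for f are all assumptions made up for illustration, not anything from the original question. The .cu file has to be compiled to PTX separately before this will run.

```matlab
% Sketch only. Assumes a hypothetical file fKernel.cu, compiled beforehand
% with "nvcc -ptx fKernel.cu", containing something like:
%
%   __constant__ double A[64];
%   __constant__ double B[64];
%   __global__ void fKernel(double *out, const double *a,
%                           const double *b, int n, int m)
%   {
%       int i = blockIdx.x * blockDim.x + threadIdx.x;
%       if (i >= n) return;
%       double acc = 0.0;
%       for (int k = 0; k < m; ++k) {
%           double d = a[i] - B[k];
%           acc += A[k] * exp(-d * d);
%       }
%       out[i] = b[i] * acc;
%   }

n = 1e5;                       % number of element-wise evaluations
m = 64;                        % must match the __constant__ array length
a = gpuArray(rand(n, 1));      % element-wise inputs
b = gpuArray(rand(n, 1));
A = rand(m, 1);                % shared arrays (hypothetical data)
B = rand(m, 1);

% Load the compiled kernel and choose a launch configuration.
kern = parallel.gpu.CUDAKernel('fKernel.ptx', 'fKernel.cu');
kern.ThreadBlockSize = [256 1 1];
kern.GridSize = [ceil(n / 256) 1 1];

% Copy the shared arrays into the kernel's constant memory.
setConstantMemory(kern, 'A', A);
setConstantMemory(kern, 'B', B);

% The non-const pointer "out" is returned as an output of feval.
outG = feval(kern, zeros(n, 1, 'gpuArray'), a, b, n, m);
out = gather(outG);
```

Constant memory is small (typically 64 KB in total), so this only applies when A, B, and C together fit within that limit.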

Mikhail, 11 Nov 2014
You can try calling your function without arrayfun. If at least one of the arguments is a gpuArray, the calculations will be performed on the GPU, provided the operations inside the function support gpuArray inputs.
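For instance, here is a minimal sketch of a vectorized version with no arrayfun at all. The formula out(i) = b(i) * sum_k A(k) * exp(-(a(i) - B(k))^2) is only a hypothetical stand-in for f, since the real function wasn't posted, and the sizes are made up.

```matlab
% Hypothetical stand-in for f:
%   out(i) = b(i) * sum_k A(k) * exp(-(a(i) - B(k))^2)
n = 1000;                      % number of element-wise evaluations
m = 16;                        % length of the shared arrays
a = rand(n, 1);  b = rand(n, 1);   % element-wise inputs
A = rand(m, 1);  B = rand(m, 1);   % shared arrays

% Move the data to the GPU once, up front.
aG = gpuArray(a);  bG = gpuArray(b);
AG = gpuArray(A);  BG = gpuArray(B);

% n-by-m matrix of pairwise differences a(i) - B(k).
% This uses implicit expansion (R2016b or later); on older releases
% use bsxfun(@minus, aG, BG.') instead.
D = aG - BG.';

% The sum over k becomes a matrix-vector product, all on the GPU.
outG = bG .* (exp(-D.^2) * AG);    % n-by-1 gpuArray

out = gather(outG);                % copy the result back to host memory
```

The trade-off is memory: the intermediate D is n-by-m, so this works best when the shared arrays are not too large.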

Edric Ellis, 12 Nov 2014
Can you give a more concrete example of what you'd like to do with A, B, and C? You might be able to use a nested function with up-level variables. This example is quite complex, but it shows some of the more advanced things you can do with nested functions and arrayfun. In particular, the nested function updateParentGrid accesses the up-level variable grid and indexes into it to perform the stencil computation.
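To make the nested-function idea concrete, here is a much simpler sketch than the stencil example. The function name applyShared and the formula inside it are hypothetical stand-ins, since the real f wasn't posted. The nested function runs once per element of a and b and reads the shared arrays A and B as up-level variables, using read-only scalar indexing.

```matlab
function out = applyShared(a, b, A, B)
% applyShared  Hypothetical example of GPU arrayfun with shared arrays.
%   a, b : n-by-1 gpuArrays (the element-wise inputs)
%   A, B : gpuArray vectors of equal length, shared by every element
%
% Usage (requires Parallel Computing Toolbox and a supported GPU):
%   out = gather(applyShared(gpuArray(a), gpuArray(b), ...
%                            gpuArray(A), gpuArray(B)));

    numShared = numel(A);

    function r = elementFcn(ai, bi, m)
        % Scalar code executed for each (ai, bi) pair.
        % A and B are up-level variables: they are only read, and each
        % indexing operation returns a single scalar.
        acc = 0;
        for k = 1:m
            d = ai - B(k);
            acc = acc + A(k) * exp(-d * d);
        end
        r = bi * acc;
    end

    % numShared is a scalar, so arrayfun expands it to every element.
    out = arrayfun(@elementFcn, a, b, numShared);
end
```

Unlike a fully vectorized version, this never forms an n-by-m intermediate, at the cost of a loop inside each element's computation.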
  1 Comment
Matt J, 14 Nov 2014
Edited: Matt J, 14 Nov 2014
But can it be efficient to do this? I assume that there are CUDA threads doing each element-wise computation under the hood. If all threads need the variables A, B, and C, then surely those variables would need to be stored in constant memory in order for all threads to access them quickly enough.
