Persist data on GPU between parfor Calls

Jonathan on 3 Apr 2015
Commented: Edric Ellis on 8 Apr 2015
The actual overhead from invoking a parfor is pretty low (~17 ms), so it is fast enough to start up a parfor, do one operation per worker, and then repeat:
for i = 1:N
    parfor j = 1:10
        X{j} = gpuArray(X{j});
        Y{j} = MyFunction(X{j}); % <-- takes about 1 second per worker
    end
end
However, it seems that MATLAB re-copies all of the data in X{j} over to the GPU on each iteration of the outer for loop. I would like X{j} to persist on the GPU between for loop iterations.
One hacky solution is to embed another for loop inside the parfor to reduce the amount of re-copying. This is not ideal for my application (I'm doing gradient descent function optimization).
Hopefully there is a simple way to force each X{j} to remain on its respective GPU.

Accepted Answer

Edric Ellis on 7 Apr 2015
Normally I would recommend the Worker Object Wrapper for this. However, that is more useful for parfor "broadcast" variables; in this case, it looks like you're trying to persist "sliced" variables, which is tricky since each worker can end up with a different slice on each iteration. Can you simply move the for loop inside the parfor loop? (Perhaps your actual code is a little more complicated and precludes that.)
The other option is to use spmd and Composite objects (which behave somewhat like "WorkerObjWrapper" objects, but are specific to spmd). You might do something like this:
spmd
    myX = gpuArray(labindex);
end
% myX is now a "Composite" that persists on the workers
for i = 1:N
    spmd
        myY = MyFunction(myX);
    end
end
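Applied to the original question, a minimal sketch might look like the following. It assumes a pool of 10 workers with one cell of X per worker; MyFunction is the questioner's own function, and the final retrieval loop is illustrative.

spmd
    % Each worker keeps only its own cell and moves it to its GPU once.
    % (X is broadcast into the spmd block a single time, not per iteration.)
    myX = gpuArray(X{labindex});
end
for i = 1:N
    spmd
        myY = MyFunction(myX); % myX stays resident on the worker's GPU
    end
end
% Retrieve results from the Composite back on the client as needed
Y = cell(1, 10);
for w = 1:10
    Y{w} = myY{w};
end

The key point is that myX is created inside one spmd block and reused by later spmd blocks, so the host-to-GPU copy happens once rather than on every iteration.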
1 Comment
Jonathan on 7 Apr 2015
Thanks, sounds very promising. This is pretty much what I had in mind! I will give it a try and let you know if I encounter issues.


More Answers (1)

Jonathan on 7 Apr 2015
This turns out to be by far the most efficient way to use multiple GPUs. The secret is to put the loop inside the spmd block. You can then use commands like gplus(), labSend(), and labBarrier() to communicate synchronously between workers and accumulate results.
Doing it this way, the parallel overhead I experience is very minimal (maybe 10-15% at most).
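For gradient descent specifically, the loop-inside-spmd pattern with gplus() might be sketched as follows. The names LocalGradient, w0, and eta are hypothetical placeholders (a per-shard gradient function, initial parameters, and step size), not from the original post.

spmd
    myX = gpuArray(X{labindex});    % data shard, resident on this worker's GPU
    w   = gpuArray(w0);             % parameter vector, replicated on every worker
    for i = 1:N
        g = LocalGradient(w, myX);  % gradient over this worker's shard
        g = gplus(g);               % sum across workers; implicitly synchronizes
        w = w - eta * g;            % identical update applied on every worker
    end
end
wFinal = gather(w{1});              % w is a Composite outside spmd; fetch one copy

Because gplus() both reduces and synchronizes, no explicit labBarrier() is needed in this particular loop, and the data shards never leave their GPUs.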
Thanks!
1 Comment
Edric Ellis on 8 Apr 2015
Yes, the spmd block has overhead, which you can minimise by moving loops inside it. Note that while labSend and labReceive work correctly with gpuArray data, they are not optimised for it (in the sense of the more advanced technologies NVIDIA has for sending GPU data directly via MPI): effectively, the data has to be gathered back to the CPU before it can be sent.

