Rules of thumb on GPU usage?
I've converted several algorithms to run on a GPU and have always seen a tremendous improvement in execution time. For the first time, this particular trick has failed me: execution time actually lengthens when I use the GPU.
Is there a better way to estimate the performance of a code snippet on a GPU than to alter the code and try it out?
Specifically, when the target code may be executed on different classes of GPUs, are there rules of thumb to predict the improvement or degradation that will result?
0 comments
Answers (1)
Joss Knight
12 Oct 2015
You ought to provide some examples so that we know the kind of thing you're getting at.
The main rule of thumb is that the GPU will generally perform well when your code is highly data-parallel. If you get a speed-up from vectorizing your code, you'll probably get a speed-up on the GPU. This means the same sort of operations are taking place in multiple places on a large dataset. If however, you have small pieces of data, a lot of disparate tasks, dependent operations, and loops, you probably don't have something that will parallelize well.
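A quick way to test this rule of thumb without restructuring your algorithm is to time the same vectorized expression on the CPU and the GPU. The sketch below is a minimal, hypothetical benchmark using `timeit` and `gputimeit` (the latter synchronizes with the device, so it measures GPU work fairly); the array size and operation are placeholders for your own workload.

```matlab
% Hypothetical benchmark: same vectorized operation on CPU vs GPU.
A = rand(4000);             % large, data-parallel workload: a good GPU candidate
G = gpuArray(A);            % transfer once, then keep the data on the device

tCPU = timeit(@() A.^2 + sin(A));
tGPU = gputimeit(@() G.^2 + sin(G));    % gputimeit waits for the GPU to finish
fprintf('CPU: %.4fs  GPU: %.4fs  speed-up: %.1fx\n', tCPU, tGPU, tCPU/tGPU);
```

If the speed-up here is small or negative, the workload is likely too small, or dominated by transfers and dependent operations, and a full port is unlikely to pay off.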
2 comments
Joss Knight
14 Oct 2015
Edited: Joss Knight
14 Oct 2015
gpuArray supports logical indexing so I see no reason why you would need any data transfers (see the blog article I linked above for examples). Can you explain?
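To illustrate the point about logical indexing: the whole operation can stay on the device, with no transfer back to the host until (and unless) you call `gather`. A minimal sketch, with placeholder data:

```matlab
% Sketch: logical indexing on a gpuArray, entirely on the device.
G = gpuArray(randn(1e6, 1));
mask = G > 0;        % mask is itself a gpuArray of logicals
pos = G(mask);       % indexing happens on the GPU, no host transfer
m = mean(pos);       % result stays on the device
% gather(m) only when a host-side value is actually needed
```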
See Also
Categories
Find more on GPU Computing in Help Center and File Exchange