Why is a for loop faster on the CPU than on the GPU?

I am running the following code:
p = zeros(7000000,1); g_p = gpuArray(p);
1. CPU version
for i = 1:size(p,1)
    p(i) = 10; % trivial assignment, just for illustration
end
2. GPU version
for i = 1:size(g_p,1)
    g_p(i) = 10; % trivial assignment, just for illustration
end
It is amazing that the CPU version runs so much faster than the GPU version. Why?

4 Comments

Adam on 21 Jun 2017
Data has to be copied to the GPU, and that copy has an overhead. The GPU is therefore worth using for massively parallel, compute-intensive work, not for operations that are already so fast on the CPU that the cost of copying to the GPU exceeds the time saved.
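To put a rough number on the copy overhead described above, here is a minimal sketch, assuming Parallel Computing Toolbox and a supported GPU are available. It times copying the 7,000,000-element array from the question to the GPU and, for comparison, creating the same array directly on the device. The variable names tTransfer and tCreate are only illustrative.

```matlab
% Sketch: compare a host-to-GPU copy with creating the array on the GPU.
n = 7000000;
p = zeros(n, 1);                                   % array in host (CPU) memory

% gputimeit runs the function several times and waits for the GPU to finish.
tTransfer = gputimeit(@() gpuArray(p));            % copy existing host data to the GPU
tCreate   = gputimeit(@() zeros(n, 1, 'gpuArray')); % allocate directly on the GPU

fprintf('Copy to GPU:        %.6f s\n', tTransfer);
fprintf('Create on GPU:      %.6f s\n', tCreate);
```

The actual timings depend on the hardware, so no particular result is claimed here.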
Jan on 21 Jun 2017
@Khieu: Compare this:
tic; p(:) = 10; toc
tic; g_p(:) = 10; toc
What do you get?
Khieu on 21 Jun 2017
Thanks Adam
Adam on 21 Jun 2017
It is often very difficult to determine where this crossover lies. I have occasionally dabbled with a GPU implementation of some computation in my code, but whether it is faster or slower on a particular run is hit and miss, since data sizes vary. I often just end up commenting it out, with a vague idea that I might come back to it one day and write some fancy code to choose the CPU or GPU version based on input size and any other appropriate metrics. So far I never have.
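The idea of choosing the CPU or GPU version based on input size could be sketched roughly as follows. This is not code from the thread: the helper name scaleOnBestDevice, the example computation, and the threshold of 1e6 elements are all hypothetical placeholders, and the right threshold would have to be found by measuring on the target machine.

```matlab
% Sketch: pick the CPU or GPU path from the input size (hypothetical helper).
x = rand(7000000, 1);
y = scaleOnBestDevice(x, 1e6);     % 1e6 is a placeholder threshold, not a measured value

% Local function in a script (R2016b or later); in older releases,
% save it in its own file named scaleOnBestDevice.m.
function y = scaleOnBestDevice(x, threshold)
    % Use the GPU only for large inputs and only if a GPU is present.
    useGpu = numel(x) >= threshold && gpuDeviceCount > 0;
    if useGpu
        g = gpuArray(x);           % copy the input to the GPU
        y = gather(2 * sqrt(g));   % compute on the GPU, copy the result back
    else
        y = 2 * sqrt(x);           % small input or no GPU: stay on the CPU
    end
end
```

Size alone is a crude metric; the arithmetic intensity of the computation and whether the data already lives on the GPU matter as well.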

Answers (1)

Joss Knight on 22 Jun 2017

This isn't amazing at all. You are launching 7 million kernels, each of which copies one value into GPU memory. The kernels cannot overlap because they operate on the same data, so each must wait for the previous one to finish. This is not at all an efficient way to do parallel programming!
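As a rough illustration of the point above, the following sketch contrasts element-by-element assignment with a single vectorized assignment on the GPU. It assumes Parallel Computing Toolbox and a supported GPU. The loop covers only the first nLoop elements (an arbitrary choice made here to keep the run short), so the two timings are not for the same amount of work.

```matlab
% Sketch: per-element GPU assignment versus one vectorized assignment.
n     = 7000000;
nLoop = 10000;                     % loop over a small slice only (arbitrary choice)
g_p   = zeros(n, 1, 'gpuArray');   % array created directly on the GPU
dev   = gpuDevice;

tic
for i = 1:nLoop
    g_p(i) = 10;                   % one tiny GPU operation per iteration
end
wait(dev);                         % block until all queued GPU work has finished
tLoop = toc;

tic
g_p(:) = 10;                       % a single operation covering all n elements
wait(dev);
tVec = toc;

fprintf('Loop over %d elements: %.4f s\n', nLoop, tLoop);
fprintf('Vectorized, %d elements: %.4f s\n', n, tVec);
```

The calls to wait(dev) matter because GPU operations may be queued asynchronously, so tic and toc alone can stop the clock before the work is done.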

Asked: 21 Jun 2017

Answered: 22 Jun 2017
