Running Code on GPU Seems much Slower than Doing so on CPU
Hi there,
I am using a ThinkPad W550, and my GPU is a Quadro K620M. When I ran the following simple code, the profiler showed that running on the GPU was much slower.
function Test_GPU()
a = [10^8, 18^8];
h = a;
c = conv2(h, a, 'full');
% Running in double precision gave a similar result
aa = single(gpuArray([10^8, 18^8]));
hh = aa;
cc = conv2(hh, aa, 'full');
end

So I ran the official gpuBench().
The result is astonishing! Running on the GPU IS slower, much, much slower.
The first picture shows the result from the GPU, and the second from the CPU.


I will be very grateful if anyone could tell me why. Many thanks
2 Comments
Theron FARRELL
27 May 2019
Jan
27 May 2019
a = [10^8, 18^8] is a [1x2] vector. For a speed comparison, this job is too tiny.
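A comparison on larger inputs might look like the sketch below. The sizes (a 2000x2000 image and a 15x15 kernel) are arbitrary assumptions, not values from the original post. canUseGPU requires R2019b or later, and gputimeit requires Parallel Computing Toolbox.

```matlab
% Sketch: time conv2 on CPU and GPU with inputs large enough to measure.
% n and k are assumed sizes chosen only for illustration.
n = 2000;
k = 15;
a = rand(n, 'single');
h = rand(k, 'single');

% timeit calls the function several times and returns a robust estimate.
tCPU = timeit(@() conv2(a, h, 'full'));

if canUseGPU()              % true only if a supported GPU is available
    aG = gpuArray(a);       % transfer once, outside the timed region
    hG = gpuArray(h);
    % gputimeit waits for the GPU to finish before stopping the clock.
    tGPU = gputimeit(@() conv2(aG, hG, 'full'));
    fprintf('CPU: %.4f s, GPU: %.4f s, ratio CPU/GPU: %.1f\n', ...
        tCPU, tGPU, tCPU / tGPU);
else
    fprintf('CPU: %.4f s (no supported GPU found)\n', tCPU);
end
```

The arrays are moved to the GPU before timing so that the measurement covers only the convolution. To include the transfer cost, put the gpuArray calls inside the timed function.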
Accepted Answer
More Answers (2)
Walter Roberson
27 May 2019
0 votes
The Quadro K620M is a Maxwell-architecture GPU (GM108 chip). That architecture runs double precision at 1/32 of the single-precision rate.
For sufficiently large arrays, MATLAB delegates MTIMES operations to optimized BLAS/LAPACK libraries, which automatically use all available CPU cores.
On my system, the CPU measures as faster than my GTX 780M for double-precision MTIMES and backslash, but the GPU is much faster for single precision, and it is also faster than the CPU for double-precision FFT.
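The precision effect described above can be checked with a sketch like this one. The matrix size is an assumption, and the timings depend entirely on the hardware, so no particular result is implied. canUseGPU requires R2019b or later.

```matlab
% Sketch: compare single- and double-precision matrix multiply on CPU and GPU.
n = 2048;                       % assumed size, large enough to use all cores
Ad = rand(n);                   % double precision
As = single(Ad);                % single-precision copy of the same data

tCPUdouble = timeit(@() Ad * Ad);
tCPUsingle = timeit(@() As * As);
fprintf('CPU  double: %.4f s, single: %.4f s\n', tCPUdouble, tCPUsingle);

if canUseGPU()
    AdG = gpuArray(Ad);
    AsG = gpuArray(As);
    tGPUdouble = gputimeit(@() AdG * AdG);
    tGPUsingle = gputimeit(@() AsG * AsG);
    fprintf('GPU  double: %.4f s, single: %.4f s\n', tGPUdouble, tGPUsingle);
    % On cards with a low double-precision rate, this ratio can be large.
    fprintf('GPU double/single time ratio: %.1f\n', tGPUdouble / tGPUsingle);
end
```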
8 Comments
Jan
27 May 2019
The screenshot posted by the OP seems to show that his GPU works slightly faster on double than on single. Strange.
Theron FARRELL
27 May 2019
Edited: Theron FARRELL
27 May 2019
Andrea Picciau
28 May 2019
Edited: Walter Roberson
29 May 2019
Hi Theron,
I ran your code on my workstation, which has an NVIDIA K40c and an Intel Xeon E-1650 CPU. I wasn't able to reproduce your results, which seems to suggest that your GPU might be the "limiting factor".
What version of MATLAB are you using?
Jan
28 May 2019
@Andrea: This is not my code.
@Theron FARRELL: Using the profiler disables the JIT acceleration. Comparing timings that are displayed as "0.000s" is very fragile. You cannot expect to get a realistic view of the efficiency of the code from such comparisons.
"And now, it seems that 'single' is the fastest. So strange...." - I still think that this is the expected effect. If you observe anything else, there is either a problem in the code, or the transfer of the data to the GPU takes longer than the actual processing, or the total times are too short to be measured reliably by the profiler. Using a few hundred calls in a loop with tic/toc is more accurate, but timeit is even better.
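The two measurement approaches mentioned above could be sketched as follows on the CPU. The array sizes and the repetition count are assumptions chosen for illustration.

```matlab
% Sketch: time one operation with a tic/toc loop and with timeit.
a = rand(500);                  % assumed input size
h = rand(9);                    % assumed kernel size
f = @() conv2(a, h, 'full');

nRep = 200;                     % a few hundred calls, as suggested above
f();                            % warm-up call so first-call overhead is excluded
t0 = tic;
for i = 1:nRep
    f();
end
tLoop = toc(t0) / nRep;         % average time per call

tTimeit = timeit(f);            % timeit handles warm-up and repetition itself

fprintf('tic/toc loop: %.6f s per call\n', tLoop);
fprintf('timeit:       %.6f s per call\n', tTimeit);
```

For gpuArray inputs, GPU calls can return before the computation has finished, so a tic/toc measurement needs wait(gpuDevice) before toc, or gputimeit can be used in place of timeit.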
Theron FARRELL
29 May 2019
Theron FARRELL
29 May 2019
Moved: Walter Roberson
27 Oct 2024
Andrea Picciau
29 May 2019
Edited: Walter Roberson
29 May 2019
@Jan: Sorry, I meant to say "Theron". I changed my previous comment to fix that.
Jan
29 May 2019
@Theron: I do not understand why you expect arrayfun to have a positive effect on the processing speed. The opposite is expected.
Starting the profiler automatically disables the JIT acceleration, because the JIT can reorder commands if that improves the speed, but then there is no longer any relation between the timings and the code lines. This means that running the profiler can affect the run time massively, especially for loops. Of course this sounds counterproductive for a profiler, and in fact it is. Therefore the profiler and tic/toc should both be used, because they have different advantages and disadvantages. For measuring the speed of single commands or elementary loops, the profiler is not a good choice.
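The arrayfun point can be checked on the CPU with timeit instead of the profiler. This is a minimal sketch: the expression and vector length are assumptions, since the code under discussion is not shown in the thread. Note that arrayfun on gpuArray inputs works differently (the element-wise function is compiled to run on the GPU), so this CPU comparison does not carry over to that case.

```matlab
% Sketch: compare a vectorized expression with arrayfun on the CPU.
x = rand(1, 1e5);                               % assumed test vector

vecFun = @() x.^2 + 3*x;                        % vectorized form
arrFun = @() arrayfun(@(v) v^2 + 3*v, x);       % element-by-element form

tVec = timeit(vecFun);
tArr = timeit(arrFun);

% Both forms should agree up to floating-point rounding.
maxDiff = max(abs(vecFun() - arrFun()));

fprintf('vectorized: %.6f s, arrayfun: %.6f s\n', tVec, tArr);
fprintf('max difference between results: %.3g\n', maxDiff);
```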
Miguel
27 Oct 2024
0 votes
I am running a vehicle simulation on the GPU vs. the CPU, and it takes a huge amount of time, even though I have a gaming PC. Why?