GPU time slower than CPU time in Mandelbrot set example?

Question

Dang Manh Truong on 28 Jan 2017

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/322273-gpu-time-slower-than-cpu-time-in-mandrelbolt-set-example

Commented: Walter Roberson on 30 Jan 2017

Hi, I'm following the Mandelbrot set example featured on the MathWorks blog: http://blogs.mathworks.com/loren/2011/07/18/a-mandelbrot-set-on-the-gpu/ I'm using Windows 10 with 16 GB of RAM. Here is my GPU information:

>> gpuDevice
ans = 
    CUDADevice with properties:
                        Name: 'Quadro M1000M'
                       Index: 1
           ComputeCapability: '5.0'
              SupportsDouble: 1
               DriverVersion: 8
              ToolkitVersion: 7.5000
          MaxThreadsPerBlock: 1024
            MaxShmemPerBlock: 49152
          MaxThreadBlockSize: [1024 1024 64]
                 MaxGridSize: [2.1475e+09 65535 65535]
                   SIMDWidth: 32
                 TotalMemory: 2.1475e+09
             AvailableMemory: 1.5948e+09
         MultiprocessorCount: 4
                ClockRateKHz: 1071500
                 ComputeMode: 'Default'
        GPUOverlapsTransfers: 1
      KernelExecutionTimeout: 1
            CanMapHostMemory: 1
             DeviceSupported: 1
              DeviceSelected: 1
Here are the results:

The problem is that the GPU version takes much longer than the plain CPU version (the arrayfun version is fine). Why is that? Please help me, thank you very much :)


Answer 1

Joss Knight on 29 Jan 2017

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/322273-gpu-time-slower-than-cpu-time-in-mandrelbolt-set-example#answer_252490

Your Quadro GPU is not intended for intensive double precision computation (I can't find published figures, but it's going to be something like 50 gigaflops as opposed to 5 teraflops for a proper compute GPU). Try converting the example to single precision. It will probably be about 30 times faster.

See e.g. https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_processing_units#Quadro_Mxxx_series
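A minimal sketch of the single-precision suggestion, assuming the element-wise arrayfun approach from the blog post. The function names mandelbrotSingle and mandelbrotElement, the grid size, the iteration cap, and the view limits are illustrative assumptions, not the blog's values. Single precision resolves only about 7 significant digits, so a deeply zoomed view may not survive the conversion; the sketch therefore uses a wide view of the whole set.

```matlab
function [count, t] = mandelbrotSingle()
% Sketch: element-wise Mandelbrot on the GPU, entirely in single precision.
% All parameter values below are illustrative assumptions.
maxIterations = single(500);
gridSize      = 1000;

% Build the axes in single precision, then move them to the GPU.
x = gpuArray(single(linspace(-2.0, 1.0, gridSize)));
y = gpuArray(single(linspace(-1.5, 1.5, gridSize)));
[xGrid, yGrid] = meshgrid(x, y);

% Time the GPU computation; gputimeit synchronizes with the device.
kernel = @() arrayfun(@mandelbrotElement, xGrid, yGrid, maxIterations);
t = gputimeit(kernel);

% Run once more to get the result and copy it back to the host.
count = gather(kernel());
fprintf('single precision, arrayfun on GPU: %.3f s\n', t);
end

function count = mandelbrotElement(x0, y0, maxIterations)
% Per-pixel escape-time loop; every variable stays single precision.
z0 = complex(x0, y0);
z = z0;
count = single(1);
while (count <= maxIterations) && (abs(z) <= 2)
    count = count + 1;
    z = z*z + z0;
end
count = log(count);
end
```

Save this as mandelbrotSingle.m and call [count, t] = mandelbrotSingle(). If your release does not accept a handle to a local function in arrayfun, put mandelbrotElement in its own file, as the blog does with its element function. Changing the single(...) casts to double(...) gives a like-for-like comparison on the same card.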

7 comments (5 older comments hidden)

Joss Knight on 30 Jan 2017

You can't put mobile GPU chips into TCC mode as far as I'm aware. The basic issue is that you're trying to do high performance computing on a laptop.

Walter Roberson on 30 Jan 2017

Okay, further research says that the M1000M is Maxwell architecture GM107 series, and that the double precision performance is 1/32 of the single precision performance.
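One way to check that ratio on a given card is to time the same operation in both precisions. This is only a sketch: the matrix size is an arbitrary assumption, and the measured ratio will not necessarily match the theoretical 1/32, since memory bandwidth and library behaviour also affect the timings.

```matlab
% Sketch: compare single and double matrix-multiply throughput on the GPU.
n = 2048;                               % assumed matrix size
As = rand(n, 'single', 'gpuArray');     % single-precision data created on the GPU
Ad = double(As);                        % same values in double precision

tSingle = gputimeit(@() As * As);
tDouble = gputimeit(@() Ad * Ad);

flops = 2 * n^3;                        % approximate operation count for an n-by-n multiply
fprintf('single: %.1f GFLOPS\n', flops / tSingle / 1e9);
fprintf('double: %.1f GFLOPS\n', flops / tDouble / 1e9);
fprintf('double is %.1fx slower than single\n', tDouble / tSingle);
```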


Answer 2

Walter Roberson on 28 Jan 2017

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/322273-gpu-time-slower-than-cpu-time-in-mandrelbolt-set-example#answer_252346

This is not uncommon. There is communication overhead with the GPU. The GPU is most effective when you have extensive computation with little data transfer (which does not necessarily mean computing with small matrices). If you do only a little computing on large matrices that have to be transferred, then although the computation itself might be very fast, you still have to wait for the data to transfer in both directions. If you are going to do further computation on the data, leave a copy of it on the GPU even if you also want a CPU copy, so that you do not need to transfer it to the GPU again.
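A small sketch of that pattern, using made-up data and arbitrary operations purely for illustration: the array is uploaded once, a host copy is taken with gather, and later GPU work reuses the device copy.

```matlab
A = rand(4000, 'single');    % host data (assumed example input)
G = gpuArray(A);             % the only host-to-device transfer
G = 2 .* G + 1;              % computed on the GPU
B = gather(G);               % host copy for CPU-side use; G stays on the device
H = sqrt(G);                 % later GPU work reuses G, no second upload
result = gather(H);          % only the final result is copied back
```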

1 comment

Dang Manh Truong on 28 Jan 2017

But there was no data transfer from the CPU to the GPU, because it was created directly on the GPU :( Can you explain this phenomenon? :(
