parallel.gpu.CUDAKernel slow on GTX 1080
I executed this MATLAB command to load a CUDA kernel:
KNNSearchGPU = parallel.gpu.CUDAKernel('Search.ptx','Search.cu');
It took about a minute on a computer with a GTX 1080, but less than a second on one with a GTX TITAN. Both have CUDA 8.0 RC installed on Ubuntu 14.04.
This happens even for an empty kernel like this in Search.cu:
__global__ void Search( float * result, const int * args, const float * pc1, const float * pc2)
{
}
From this discussion, I noticed that MATLAB may not yet support this new card: http://www.mathworks.com/matlabcentral/answers/79275-gpudevice-command-very-slow
If that's the case, when will MATLAB support the GTX 1080? Will it be in R2016b?
1 Comment
Joss Knight
15 June 2016
You need to use the toolkit supported by MATLAB, namely CUDA 7.5. If you still see the problem on your GTX 1080, can you:
- Let us know what commands you are executing on the command line to compile your PTX code.
- Let us know whether the performance problem occurs every time you load the kernel or just once, and whether running another GPU function first (e.g. gpuDevice) resolves it.
Accepted Answer
Other Answers (3)
Bosco Tjan
5 September 2016
1 vote
Thank you, Ritesh, for your timely answer! We installed a Titan X (Pascal) board and are experiencing the same issue. A follow-up question: by a "one-time compilation", do you mean one time per MATLAB session? When I exit and restart MATLAB, the same slowdown recurs. Is there any way to make the compiled code persistent across sessions?
8 Comments
Walter Roberson
6 September 2016
It should not be once per session, it should be once per install of MATLAB.
I also have a 1080, and gpuDevice is slow EVERY time you call it. The "one-time compilation" advantage only seems to apply to other functions, and only per session. Really, this is the only reason the 1080 (and presumably other Pascal cards) is usable at all. Thankfully I still have my old Maxwell Titan X, which I can prototype on. I only use the 1080 in a parfor loop for the real number crunching, where the first-time (again, per session) startup cost of compiling the arrayfun "kernels" on it is much less than the total compute time. So, for me, the only benefit of the "one-time compilation" is that subsequent runs of my parfor loop start much more quickly on the 1080.
D. Plotnick
9 September 2016
Edited: D. Plotnick, 9 September 2016
I too see a "once-per-session" issue. It's also a "once-per-command" issue. So far "gpuDevice", "gpuArray", and "gather" each require an individual multi-minute compilation period. Right now I run a script at the beginning of the session that executes "gather" and "gpuArray" to save time; however, gpuDevice is always slow, so I have to be super careful about not running out of memory. Once initialized, all functions on the Titan are blazingly fast.
I really hope we get some support for the new Pascal chips, been waiting a long time for good double-precision cards to be available again.
I found a solution on another thread: enlarging the CUDA cache makes the one-time compilation more persistent, presumably because the cache doesn't need to be cleared out as often. You can change its size (I made it 1 GB) by adding a CUDA_CACHE_MAXSIZE variable to your (Windows) environment variables and setting its value to the cache size in bytes. After making this change I no longer get the multi-minute compilations for gpuDevice and other GPU functions on my 1080s.
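For anyone else trying this, here is a minimal sketch of the same change on Linux/macOS (the 1 GB value below matches what I used; on Windows, create CUDA_CACHE_MAXSIZE under System Properties → Environment Variables instead, since export only affects the current shell):

```shell
# Enlarge the CUDA JIT compilation cache to 1 GiB (value is in bytes).
# Add this line to ~/.bashrc (or equivalent) so it persists across sessions,
# then restart MATLAB so it picks up the new environment.
export CUDA_CACHE_MAXSIZE=$((1024 * 1024 * 1024))
echo "$CUDA_CACHE_MAXSIZE"   # prints 1073741824
```

The related variable CUDA_CACHE_DISABLE should be unset or 0, otherwise caching is turned off entirely.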
Walter Roberson
10 September 2016
Good find, Nick Chng.
D. Plotnick
12 September 2016
Just tried this on a Titan X, and I can confirm it works. The first run still required ~2 minutes, but it's now down to 2 seconds, even in new MATLAB sessions. Excellent find, Nick Chng. Do you have a link to the original thread so I can thank the original author as well?
Shawn Healey
14 September 2016
Confirmed on a system with a Titan X and a 780.
Nick Chng
17 September 2016
I found it in the second of the threads you linked: the Parallel Forall blog. Glad it's working. Cheers, everyone.
Wajahat Kazmi
2 November 2016
Edited: Wajahat Kazmi, 2 November 2016
1 vote
Hi,
I had the same problem with a GTX 1080 with MATLAB R2016a and R2016b. However, when I used CUDA 8.0 with MATLAB R2014b, the problem was solved (Windows 7 and 10).
Best regards, Wajahat
Alexander K
6 December 2016
0 votes
Dear colleagues and MathWorks professionals,
I have almost the same problem: very long loading times (probably JIT recompilations) in every new MATLAB session, and even occasional crashes when trying to reset the GPU.
My configuration: GTX 1070 (Pascal) on a Core i7-6700 with 64 GB RAM; Windows 10 Pro, MATLAB R2016b, and CUDA 8.0 (installed very recently from the NVIDIA site, after MATLAB was installed).
Many thanks for the above discussion and advice, including the above-mentioned pair of threads, which are also very informative!
My question is: what if the variables CUDA_CACHE_MAXSIZE and CUDA_CACHE_DISABLE do NOT seem to exist in the registry on my workstation (Windows 10)?
How should I find or create them correctly?
Regedit does NOT find them at all! (Although the section HKEY_LOCAL_MACHINE\SOFTWARE\NVIDIA Corporation\GPU Computing Toolkit\CUDA\v8.0 does exist.)
Many thanks to all of you in advance!
Alexander K, PhD.
3 Comments
Nick Chng
11 December 2016
Hi Alexander,
In Windows 10, edit the system "environment variables" (you can Google how to access this) and add the variables and values there. Note that this is not the same as editing the registry.
Cheers, Nick
Alexander K
8 February 2017
Many thanks for your helpful answer!
yingkun yang
3 April 2019
Excuse me, Alexander.
My question is: how do I set the CUDA cache size via an environment variable (Windows 10)?
I created a system variable named CUDA_CACHE_MAXSIZE and set its value to 536870912.
But I think I've done something wrong!
Many thanks to you in advance!
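For reference, the byte values mentioned in this thread can be sanity-checked in a POSIX shell (cache sizes are specified in bytes, so 536870912 corresponds to 512 MiB and the 1 GB value mentioned earlier to 1073741824):

```shell
# Cache sizes are given to CUDA_CACHE_MAXSIZE in bytes.
echo $((512 * 1024 * 1024))    # 512 MiB -> prints 536870912
echo $((1024 * 1024 * 1024))   # 1 GiB  -> prints 1073741824
```

If the variable is set at the system level with one of these values and MATLAB is restarted afterwards, the setting itself should be correct.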