gpuDevice command very slow
I am running CUDA kernels using the Parallel Computing Toolbox in R2012a, and recently upgraded to a 600 series (Kepler) GPU. To set up the CUDA kernel, we extract the maximum threads per block using:
gpu_han = gpuDevice(1);
k = parallel.gpu.CUDAKernel('gpu_tfm_linear_arb.ptx', 'gpu_tfm_linear_arb.cu');
k.ThreadBlockSize = gpu_han.MaxThreadsPerBlock;
This now executes very slowly (on the order of 2 minutes). If I set the thread block size manually to the card's maximum (1024 in this case), it executes in 0.1 s.
This used to run quickly with a 400 series card. Any help gratefully received.
Accepted Answer
More Answers (2)
Andrei Pokrovsky
15 Sep 2016
Edited: Andrei Pokrovsky
15 Sep 2016
3 votes
Try setting these environment variables:
export CUDA_CACHE_MAXSIZE=2147483647
export CUDA_CACHE_DISABLE=0
This cured the problem on my GTX1080.
https://devblogs.nvidia.com/parallelforall/cuda-pro-tip-understand-fat-binaries-jit-caching/
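Note that the variables only take effect if they are set and exported in the shell that launches MATLAB, since a child process inherits its environment from its parent. A minimal sketch for checking the settings before starting MATLAB follows; the function name cuda_cache_status is made up for illustration and is not part of CUDA or MATLAB:

```shell
# Hypothetical helper (not part of CUDA or MATLAB): print the JIT cache
# settings currently visible in this shell.
cuda_cache_status() {
    echo "CUDA_CACHE_DISABLE=${CUDA_CACHE_DISABLE:-<unset>}"
    echo "CUDA_CACHE_MAXSIZE=${CUDA_CACHE_MAXSIZE:-<unset>}"
    # A value of 1 turns the cache off, so any PTX that needs JIT
    # compilation is recompiled on each run.
    if [ "${CUDA_CACHE_DISABLE:-0}" = "1" ]; then
        echo "warning: JIT cache is disabled"
    fi
}

cuda_cache_status
```

If either variable shows as unset, the driver falls back to its own defaults, which may be too small for large kernels.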
Anthony
17 Jun 2013
0 votes
2 comments
Edric Ellis
18 Jun 2013
The cache is not stored where the program lives. This page from NVIDIA has all the gory details, including this:
- on Windows, %APPDATA%\NVIDIA\ComputeCache,
- on MacOS, $HOME/Library/Application\ Support/NVIDIA/ComputeCache,
- on Linux, ~/.nv/ComputeCache
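To see whether a cache has actually been written on Linux, something like the following could be used. It is a sketch only: cuda_cache_dir is a made-up helper name, and it assumes that the CUDA_CACHE_PATH variable described in NVIDIA's JIT caching documentation overrides the default location when set.

```shell
# Hypothetical helper: print the directory the JIT cache is expected to use.
# Assumption: CUDA_CACHE_PATH, if set, overrides the Linux default.
cuda_cache_dir() {
    echo "${CUDA_CACHE_PATH:-$HOME/.nv/ComputeCache}"
}

dir=$(cuda_cache_dir)
if [ -d "$dir" ]; then
    # Show the total size of the cached binaries.
    du -sh "$dir"
else
    echo "no compute cache at $dir yet"
fi
```

If the directory stays empty, or its size stops growing, after a slow first run, the compiled kernels are probably not being cached.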
Anthony
12 Jul 2013