MATLAB Answers

reset(gpuDevice) does not work

Renu Dhadwal on 18 Aug 2016
Commented: giorgio toscana on 7 Apr 2020
The following code runs just fine for values of n<5000.
reset(gpuDevice);
n=5000;
a=gpuArray(rand(n));
b=gpuArray(rand(n));
tic
t=a'*a;
c=t\(a*b');
toc
But when I run it for n=5000, I get the error "Error using \ Call to Double LU on GPU failed with error status: unspecified launch failure."
If I then try running the program again, even for a small value of n, I get the error
"Error using parallel.gpu.CUDADevice/reset
An unexpected error occurred during CUDA execution. The CUDA error was: all CUDA-capable devices are busy or unavailable"
Also, if I execute the following command
g=gpuDevice;
disp(g.FreeMemory)
I get NaN as the result.
I am unable to run the reset(gpuDevice) command. It gives the same error as above.

  2 Comments

Walter Roberson on 18 Aug 2016
Which MATLAB version, operating system, and GPU are you using? Also, which GPU driver version do you have installed?
arnold on 20 Aug 2016
Hi,
I was just looking for this error, as I have a similar problem on a machine at work. I tried using
class(a)
ans =
gpuArray
b = medfilt2(a,[9,9]);
Error using medfilt2gpumex
Failure in GPU implementation.
unspecified launch failure.
Error in gpuArray/medfilt2 (line 37)
b = medfilt2gpumex(varargin{:});
Filter sizes [7,7] and smaller work, but [9,9] and larger give this error. After that, gpuDevice also shows
availableMemory: NaN
After this I can't use the GPU anymore without restarting MATLAB. This is too bad, since the GPU is 20 times faster at this kind of calculation.
Setup:
  • Matlab 2016a
  • Windows 10 Pro 64 (all updates)
  • Intel 5960X
  • 64GB RAM
  • GTX1080 with 372.54 (newest driver).


Accepted Answer

Alison Eele on 25 Aug 2016
I think you are experiencing the symptoms of a kernel execution timeout. If the GPU is connected to a monitor (or, in Windows, the GPU is running in WDDM mode), then the operating system imposes a kill timeout on any operation taking place on the GPU. The intention of this timeout is to allow screen display to continue. When this kill takes place on a MATLAB process using the GPU, it disrupts our connection to the GPU and typically requires a restart of MATLAB to fix.
You can find out if a kernel time-out is in place on your GPU by executing the gpuDevice command in MATLAB. One of the properties listed will be:
KernelExecutionTimeout: 0
If this is 0 then there is no execution timeout being applied to that card. If it is 1 then the operating system is imposing a timeout (the exact timeout varies by operating system).
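A minimal sketch of that check, assuming the Parallel Computing Toolbox is installed and a supported GPU is selected. The property name KernelExecutionTimeout is the one quoted above; the printed messages are only illustrative.

```matlab
% Query the currently selected GPU (requires Parallel Computing Toolbox).
g = gpuDevice;

% KernelExecutionTimeout is reported as 0 or 1:
% 1 means the operating system may kill long-running kernels.
if g.KernelExecutionTimeout
    fprintf('%s: the OS enforces a kernel execution timeout.\n', g.Name);
else
    fprintf('%s: no kernel execution timeout is applied.\n', g.Name);
end
```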
Ways to work around the issue:
  • If possible do computation in smaller pieces to avoid the timeout.
  • If there are multiple GPU cards in the computer and the computer runs Windows, then some NVIDIA cards can be switched from WDDM (display) to TCC (compute) mode using the nvidia-smi utility. TCC cards do not have an execution timeout. You cannot connect a display to a TCC mode card.
  • In Windows it is possible to lengthen the timeout using registry edits, though, as with all registry edits, this should be done with care. https://msdn.microsoft.com/en-us/Library/Windows/Hardware/ff569918(v=vs.85).aspx

  5 Comments

arnold on 27 Aug 2016
Renu,
That is strange, since I can do matrix inversions of large arrays that take much longer than 2 s WITHOUT running into problems. There is no timeout even if the inversion takes a minute, which means the 'inv' function works differently than 'medfilt2', for instance.
Funnily enough, though, matrix inversion on the GPU is slower than on the CPU for large matrices (> 10000x10000) that are already transferred to VRAM, at least on my workstation in the lab, where both GPU and CPU are no slouch.
I have set TdrDelay in the registry to 30 s now and everything runs without hiccups. I'm just not sure if that is going to be enough, so when I run important simulations I'll just set it to 0, meaning no timeout no matter the execution time. But for everyday work, I don't think this is a good idea.
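For reference, the TDR values mentioned here are stored under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers, as described in the Microsoft documentation linked in the answer. The sketch below only reads them from MATLAB on Windows using winqueryreg and changes nothing. The variable names are illustrative, and an empty result is taken to mean the value is not set, in which case Windows falls back to its built-in default.

```matlab
% Read the current TDR settings on Windows (read-only; nothing is modified).
% Key path and value names follow Microsoft's TDR registry documentation.
key   = 'SYSTEM\CurrentControlSet\Control\GraphicsDrivers';
names = {'TdrDelay', 'TdrLevel'};

tdr = struct();
for k = 1:numel(names)
    try
        % Returns the DWORD value when it is present in the registry.
        tdr.(names{k}) = winqueryreg('HKEY_LOCAL_MACHINE', key, names{k});
    catch
        % winqueryreg errors when the value is absent (or when not on Windows).
        tdr.(names{k}) = [];
    end
end

disp(tdr)
```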
Alison Eele on 2 Sep 2016
Hi Arnold, Renu,
The ability to split up larger computations into smaller pieces is very application specific. Some operations can be effectively tiled across a large matrix but others cannot. Tiled or element-wise calculations are something that GPU computing often excels at, and they would fall into the 'use smaller blocks' option for avoiding the timeout.
As Arnold's experiments indicate, the kernel timeout applies to a single kernel-level operation. So whilst the total GPU computation time appears to be above the 2 second limit, the smaller kernels called as part of the GPU computation might still be below the limit, and you see no problem.
TCC driver mode in Windows is, as you identified, limited to a few high-end cards, normally chosen for their 'suitability' for scientific computation. I believe the only GTX cards supported are from the Titan range. I had hoped they had included the GeForce 1080 as standard with the new generation.
arnold on 25 Sep 2016
Hi Alison,
I think most of my computations should be splittable into parallel tasks, as I do a lot of element-wise computations on image stacks. Can you point me in the right direction as to how to split that up, maybe with block functions?
regards Arnold
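One way the 'smaller pieces' idea from this thread might look for element-wise work on an image stack is to send a few slices to the GPU at a time instead of the whole stack. This is only a sketch: the stack size, the batchSize value, and the placeholder arithmetic are assumptions, and it requires the Parallel Computing Toolbox. Whether it avoids the timeout depends on how long each individual kernel runs.

```matlab
% Hypothetical image stack (rows x cols x slices); replace with real data.
stack     = rand(256, 256, 20, 'single');
batchSize = 4;                              % hypothetical; tune for your GPU

result  = zeros(size(stack), 'like', stack);
nSlices = size(stack, 3);

for first = 1:batchSize:nSlices
    % Indices of the slices in this batch (the last batch may be shorter).
    idx = first:min(first + batchSize - 1, nSlices);

    % Transfer only this batch to the GPU.
    block = gpuArray(stack(:, :, idx));

    % Placeholder element-wise computation; substitute the real operation.
    block = sqrt(block) + 0.5 .* block.^2;

    % Copy the processed batch back to host memory.
    result(:, :, idx) = gather(block);
end
```

Smaller batches keep each GPU call short at the cost of more host-to-device transfers, so the batch size is a trade-off to measure rather than a fixed rule.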


More Answers (2)

Yahya Zakaria mohamed on 29 Jun 2017
Thank you. I faced the same problem; after I disconnected the second monitor, no error appeared.

  0 Comments



Ricardo de Azevedo on 19 Nov 2019
Edited: Ricardo de Azevedo on 21 Nov 2019
I am facing the same problem now while training an RNN, and I have tried both setting TdrDelay to a longer value and setting TdrLevel to 0.
Error:
Error using gpuArray/gather
An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_LAUNCH_FAILED
The weird thing is that the network trains for a while and then crashes; I can't really tell what triggers it.
(Using MATLAB 2019b and the latest NVIDIA driver, 441.20, for a GTX 1080 Ti)

  3 Comments

giorgio toscana on 6 Apr 2020
Hi,
I have the same issue as you.
Did you solve it?
Thanks
Ricardo de Azevedo on 6 Apr 2020
I gave up on it, as I had other things to do and couldn't follow up.
MathWorks Support sent me this:
After conferring with colleagues in development, there are a few steps we can take to narrow down the issue.
  • If you are able to get a minimum set of data and code that reproduces the issue, that would be the easiest way to see what is causing this error.
  • Try reducing the 'MiniBatchSize' all the way down to 1 to see if the issue still occurs.
  • Find out where the error actually occurred. One easy way to do this is to run with profiling switched on by calling the following command before running the script:
>> profile on
This should cause the CUDA error to be thrown after the line of code where the issue occurred.
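A rough sketch of how that suggestion could be wrapped around a run. The runTraining handle is a hypothetical stand-in for the actual training script or function, not something from this thread; profile and getReport are standard MATLAB.

```matlab
% Hypothetical stand-in for the real training call; replace with your own.
runTraining = @() sum(sum(rand(200)));

profile on                       % start the profiler before the GPU work
try
    out = runTraining();
catch err
    % Print the full error report, including the call stack.
    disp(getReport(err, 'extended'));
end
profile off                      % stop profiling once the run has finished
```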
giorgio toscana on 7 Apr 2020
Hi Ricardo,
I will try them.
If the problem persists, I'll contact support with that information.
Thank you very much for your quick reply.

