CUDA Unexpected Error for nndata2gpu

LukasR on 30 April 2018
Commented: Harley Edwards on 18 August 2018
Hi, I am currently trying to train a fitnet on a GPU (NVIDIA Titan Xp). However, whenever I try to format my data using nndata2gpu and gpu2nndata, I run into the following error:
Error using gpuArray/gather
An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS
The code used is:
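% Convert inputs and targets to the GPU format
tinput=nndata2gpu(input);
ttarget=nndata2gpu(target);
% Configure the network with the CPU data before training on gpuArrays
fundnet=configure(fundnet,input,target);
tic
fundnet=train(fundnet,tinput,ttarget,'useGPU','yes','showResources','yes');
toc
% Simulate on the GPU and bring the outputs back to the CPU
ty=fundnet(tinput);
y=gpu2nndata(ty);
% Store the performance separately instead of overwriting the network
perf=perform(fundnet,target,y);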
The device is recognized without any problems (gpuDevice loads in less than a second) and the drivers are up to date. I am using MATLAB R2018a. Any idea what could be the source of this issue?
Many thanks in advance!
2 Comments
Joss Knight on 2 May 2018
This doesn't look good. Could you provide a standalone example - i.e. generate some data that triggers the error and include it in your code?
LukasR on 2 May 2018
Edited: LukasR on 2 May 2018
Many thanks for the answer. While generating the sample data, I found what I believe is the source of the issue: For
input=rand(20,6000000);
target=rand(1,6000000);
the error occurs while for
input=rand(20,2000000);
target=rand(1,2000000);
it doesn't. The size of my own dataset amounts to approx. 5400000x25 (inputs) and 5400000x1 (targets).
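For scale, the raw storage behind these sizes can be estimated with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it counts the data arrays themselves and ignores any padding, copies, or temporaries that nndata2gpu and train may allocate, so it does not pin down where the failure originates. The helper name rawBytes is made up for this illustration.

```matlab
% Rough storage estimate for the data arrays alone (no GPU required).
% Assumption: 8 bytes per double element and 4 bytes per single element.
rawBytes = @(rows, cols, bytesPerElem) rows * cols * bytesPerElem;

failsDouble = rawBytes(20, 6e6, 8);   % input size that triggers the error
worksDouble = rawBytes(20, 2e6, 8);   % input size that trains normally
failsSingle = rawBytes(20, 6e6, 4);   % failing input size in single precision

fprintf('failing input, double: %.0f MB\n', failsDouble / 1e6);
fprintf('working input, double: %.0f MB\n', worksDouble / 1e6);
fprintf('failing input, single: %.0f MB\n', failsSingle / 1e6);
```

As raw data the failing input stays below 1 GB, well under the 12 GB of a Titan Xp, so simply running out of device memory for the inputs alone seems unlikely to be the full explanation.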
Here is the entire executable code (which triggers the error):
input=rand(20,6000000);
target=rand(1,6000000);
nneurons=10;
technet=fitnet(nneurons,'trainscg');
technet.trainParam.epochs=10000;
technet.trainParam.goal=0;
technet.trainParam.min_grad=1e-6;
technet.trainParam.max_fail=200;
technet.trainParam.sigma=5.0e-7;
technet.trainParam.lambda=5.0e-7;
technet.trainParam.show=25;
technet.trainParam.showCommandLine=false;
technet.trainParam.showWindow=true;
technet.trainParam.time=inf;
technet.divideParam.trainRatio = 70/100;
technet.divideParam.valRatio = 15/100;
technet.divideParam.testRatio = 15/100;
for i=1:technet.numLayers
    if strcmp(technet.layers{i}.transferFcn,'tansig')
        technet.layers{i}.transferFcn = 'elliotsig';
    end
end
tinput=nndata2gpu(input);
ttarget=nndata2gpu(target);
technet=configure(technet,input,target);
tic
technet=train(technet,tinput,ttarget,'useGPU','yes','showResources','yes');
toc
ty=technet(tinput);
technetout=gpu2nndata(ty);
technetperformance=perform(technet,target,technetout);
Another note: The GPU training DOES work normally without the nndata2gpu command, albeit quite disappointingly (only a 1.5x speedup compared to an i7-7500U for the dataset described above). Furthermore, after the error occurs once, it will also occur for smaller datasets until I restart the whole program (in fact, I am not able to create any gpuArrays before restarting MATLAB).
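The variant that does work, where train moves the data to the GPU itself, might look like the sketch below. It assumes the neural network and Parallel Computing toolboxes are installed, and it uses a deliberately small random data set so that it finishes quickly; the variable names and the reduced epoch count are illustrative choices, not the settings from the code above.

```matlab
% Sketch: GPU training with plain CPU matrices, no nndata2gpu/gpu2nndata.
% Assumption: small random data stands in for the real data set.
x = rand(20, 5000);          % 20 features, 5000 samples (one per column)
t = rand(1, 5000);           % one target value per sample

net = fitnet(10, 'trainscg');
net.trainParam.epochs = 20;          % keep the demo short
net.trainParam.showWindow = false;   % suppress the training GUI

% With 'useGPU','yes', train handles the transfer to the GPU itself.
net = train(net, x, t, 'useGPU', 'yes');

y = net(x, 'useGPU', 'yes');   % outputs are returned as a regular matrix
perf = perform(net, t, y);     % network performance (mse by default for fitnet)
fprintf('Performance on all samples: %g\n', perf);
```

Because the inputs stay ordinary matrices here, configure does not need to be called explicitly before train.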


Accepted Answer

Joss Knight on 2 May 2018
Looks like you found a bug, many thanks. We will investigate. Meanwhile, my best guess is that this is caused by passing more data than the GPU train function can handle. If you can reduce the size of the input without compromising your application, that is the workaround.
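One way to reduce the input size, as suggested above, is to train on a random subset of the samples. The sketch below is only an illustration: maxSamples is a hypothetical cap that would have to be found by experiment (the thread only establishes that 2,000,000 columns worked and 6,000,000 did not for 20 inputs), and the stand-in data is much smaller than the real set.

```matlab
% Sketch: cap the number of samples (columns) before GPU training.
% Assumption: maxSamples is a hypothetical limit chosen by experiment.
input  = rand(20, 50000);    % stand-in data; columns are samples
target = rand(1, 50000);
maxSamples = 20000;

numSamples = size(input, 2);
% Draw unique random column indices, at most maxSamples of them.
keep = randperm(numSamples, min(maxSamples, numSamples));

inputSub  = input(:, keep);   % the same columns are taken from both arrays,
targetSub = target(:, keep);  % so inputs and targets stay aligned

fprintf('Kept %d of %d samples\n', size(inputSub, 2), numSamples);

% The reduced arrays would then replace input and target, for example:
% technet = configure(technet, inputSub, targetSub);
% tinput  = nndata2gpu(inputSub);
% ttarget = nndata2gpu(targetSub);
```

Whether a random subset is acceptable depends on the application; if all samples matter, this only sidesteps the error rather than resolving it.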
1 Comment
Harley Edwards on 18 August 2018
I think I found a similar or related error. I have a data set whose parts all train well separately but will not train together, despite sufficient memory and with the kernel execution timeout turned off. The inputs are 200x844000 and the targets are 6x844000. I can only train on 325000 samples at a time on a GeForce 1080. Please let me know how I can contribute to solving this problem, or if you want my code.


More Answers (0)
