gpuDevice Crashing Matlab

I'm trying to get up and running with GPU computing through the Parallel Computing Toolbox, but I can't get the toolbox to work. When I run "gpuDevice", "gpuDeviceCount", or "gpuArray", Matlab crashes instantly, leaving only a "6573 Floating point exception" error in my shell window (the number changes every time). The crash leaves behind a "matlab_crash_dump" file, but the file is empty. Has anyone had this problem before and worked out what caused it?
I'm on a Linux machine with a Quadro 4000 GPU and NVIDIA's 295.20 drivers. I've had this problem since I got the toolbox a few months ago, but at the time I assumed it was because I was using an old and unsupported set of drivers. Those have been updated now, but I still get the same problem.
Thanks

Answers (2)

Jason Ross on 13 Apr 2012

0 votes

What distro? What version of MATLAB? 64 or 32 bit?
If you run "nvidia-smi --query", do you get usable output? How does the device show up in the nvidia-settings application?
Is the Quadro being used for display and compute, or is it compute only?
FWIW when I've seen odd problems like this, the cause has come down to a defective card. Typical setup is to install the driver and start MATLAB, then it works.
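The questions above can be answered in one pass from a terminal. The following is only a sketch: the function name `collect_gpu_diagnostics` is made up for illustration, and the script assumes a POSIX shell. The MATLAB version is not covered here; it can be read from `ver` inside MATLAB.

```shell
#!/bin/sh
# Collect the basics asked for above. Every check is guarded so the
# script still runs on a machine without the NVIDIA tools installed.
# collect_gpu_diagnostics is a hypothetical helper name.
collect_gpu_diagnostics() {
    echo "kernel: $(uname -sr)"
    echo "arch: $(uname -m)"    # x86_64 indicates a 64-bit system

    # Distribution name: /etc/os-release where present, lsb_release otherwise.
    if [ -r /etc/os-release ]; then
        echo "distro: $(. /etc/os-release; echo "$PRETTY_NAME")"
    elif command -v lsb_release >/dev/null 2>&1; then
        echo "distro: $(lsb_release -ds)"
    else
        echo "distro: unknown"
    fi

    # Full driver/device report, if the NVIDIA utility is installed.
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query || echo "nvidia-smi: exited with status $?"
    else
        echo "nvidia-smi: not found on PATH"
    fi
}

collect_gpu_diagnostics
```

Redirecting the output to a file makes it easy to paste into a reply.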

4 comments

Greg on 13 Apr 2012
Running R2012a now, though I had the same issue before in R2011b (before I got the GPU driver updated). 64-bit Linux. The Quadro is being used for display and compute purposes, but I haven't attempted to run any other compute applications on it.
(I am running two monitors in "Twinview" which has caused Matlab issues before when trying to perform certain graphically-complex tasks on the secondary monitor. That shouldn't play into this, should it?)
The nvidia-smi --query returns:
==============NVSMI LOG==============
Timestamp : Fri Apr 13 11:49:32 2012
Driver Version : 295.20
Attached GPUs : 1
GPU 0000:03:00.0
    Product Name : Quadro 4000
    Display Mode : Enabled
    Persistence Mode : Disabled
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : 0320211009948
    GPU UUID : GPU-66059132-1610-05c0-618b-ad9e5cd80320
    VBIOS Version : 70.00.2F.00.12
    Inforom Version
        OEM Object : 1.0
        ECC Object : N/A
        Power Management Object : N/A
    PCI
        Bus : 0x03
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x06DD10DE
        Bus Id : 0000:03:00.0
        Sub System Id : 0x078010DE
        GPU Link Info
            PCIe Generation
                Max : 2
                Current : 1
            Link Width
                Max : 16x
                Current : 16x
    Fan Speed : 36 %
    Performance State : P12
    Memory Usage
        Total : 2047 MB
        Used : 189 MB
        Free : 1857 MB
    Compute Mode : Default
    Utilization
        Gpu : 0 %
        Memory : 14 %
    Ecc Mode
        Current : N/A
        Pending : N/A
    ECC Errors
        Volatile
            Single Bit
                Device Memory : N/A
                Register File : N/A
                L1 Cache : N/A
                L2 Cache : N/A
                Total : N/A
            Double Bit
                Device Memory : N/A
                Register File : N/A
                L1 Cache : N/A
                L2 Cache : N/A
                Total : N/A
        Aggregate
            Single Bit
                Device Memory : N/A
                Register File : N/A
                L1 Cache : N/A
                L2 Cache : N/A
                Total : N/A
            Double Bit
                Device Memory : N/A
                Register File : N/A
                L1 Cache : N/A
                L2 Cache : N/A
                Total : N/A
    Temperature
        Gpu : 58 C
    Power Readings
        Power Management : N/A
        Power Draw : N/A
        Power Limit : N/A
    Clocks
        Graphics : 50 MHz
        SM : 101 MHz
        Memory : 135 MHz
    Max Clocks
        Graphics : 475 MHz
        SM : 950 MHz
        Memory : 1404 MHz
    Compute Processes : None
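For a report this long, it can help to pull out only the fields relevant to the questions asked (driver version, display mode, compute mode). A minimal sketch, assuming the "Key : Value" layout shown above; `nvsmi_field` is a hypothetical helper name, and it returns only the first match, so it suits unique keys rather than repeated ones such as "Current".

```shell
#!/bin/sh
# Print the value of one "Key : Value" field read from stdin.
# nvsmi_field is a hypothetical helper; it splits each line at the first
# " : " so values that contain colons (e.g. the timestamp) stay intact.
nvsmi_field() {
    awk -v key="$1" '
    {
        pos = index($0, " : ")
        if (pos == 0) next                      # section header, no value
        name = substr($0, 1, pos - 1)
        gsub(/^[ \t]+|[ \t]+$/, "", name)       # trim indentation and padding
        if (name == key) { print substr($0, pos + 3); exit }
    }'
}

# Sample lines copied from the log above; in practice, save the output
# of "nvidia-smi --query" to a file and read from that instead.
sample='Driver Version : 295.20
Product Name : Quadro 4000
Display Mode : Enabled
Compute Mode : Default'

for key in "Driver Version" "Product Name" "Display Mode" "Compute Mode"; do
    printf '%s: %s\n' "$key" "$(printf '%s\n' "$sample" | nvsmi_field "$key")"
done
```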
Jason Ross on 13 Apr 2012
Could you try running it with a single monitor only?
Greg on 13 Apr 2012
Tried that and got the same crash.
Jason Ross on 13 Apr 2012
Huh. I'm rapidly running short on ideas.
Do you by chance have the CUDA toolkit / SDK installed? There is an example in there called deviceQuery. I'm wondering if it would give you a response, or crash?
Also, do you have the capability to put this card in another machine and/or use it in Windows? It would be interesting to see if the crash would follow it.
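To compare how deviceQuery and MATLAB fail, a small wrapper can report whether a program exited normally or was killed by a signal. This is a sketch, not a definitive tool: `run_and_classify` is a made-up name and the paths in the usage comments are placeholders. It relies on the shell convention that a process killed by signal N is reported with exit status 128 + N; a "Floating point exception" message corresponds to SIGFPE, which is signal 8 on Linux.

```shell
#!/bin/sh
# Run a command and report whether it exited normally or died from a signal.
# run_and_classify is a hypothetical helper name.
run_and_classify() {
    status=0
    "$@" || status=$?
    if [ "$status" -gt 128 ]; then
        # kill -l maps an exit status above 128 back to a signal name.
        echo "killed by signal $((status - 128)) ($(kill -l "$status" 2>/dev/null))"
    else
        echo "exited with status $status"
    fi
    return "$status"
}

# Usage (both paths are placeholders, adjust to the local install):
#   run_and_classify /path/to/deviceQuery
#   run_and_classify matlab -nodisplay -r "gpuDevice; exit"
```

If both programs die from the same signal, that would point away from MATLAB itself and toward the driver, CUDA install, or card.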


Yair Carmon on 13 Aug 2015

0 votes

I had a similar issue on a remote server that ran Ubuntu 12.04, Matlab 2015a, CUDA 7.0, and a GeForce GTX 960. During a routine run of my application, the nvidia-smi utility (which was open using watch nvidia-smi, to monitor GPU utilization) suddenly printed "Error" instead of things like temperature and available memory. A complete system crash followed immediately, and it was necessary to power cycle the machine before it started responding to ping again.
When the system came back online I had the problems reported above: any attempt to run nvidia-smi or gpuDevice/gpuArray would result in a crash. It was not a problem with the card: we swapped GPUs and the issue persisted. Uninstalling and reinstalling the CUDA toolkit using apt-get did not help either. The problem was finally resolved by reinstalling the entire OS, Matlab and CUDA 7.0, in that order. I suspect that using the CUDA 7.0 .run installer might have solved the problem without having to go through an OS installation. I hope to never have a chance to check that :).

Asked: 13 Apr 2012

Answered: 13 Aug 2015
