gpuArray sparse memory usage
I have a gpu with about 2GB of available memory:
CUDADevice with properties:
Name: 'Quadro K1100M'
Index: 1
ComputeCapability: '3.0'
SupportsDouble: 1
DriverVersion: 6.5000
ToolkitVersion: 6.5000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 2.1475e+09
AvailableMemory: 2.0154e+09
MultiprocessorCount: 2
ClockRateKHz: 705500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
However, I'd like to load a sparse array into it (R2015a, which supports sparse gpuArray):
whos('pxe')
Name Size Bytes Class Attributes
pxe 5282400x5282400 1182580904 double sparse, complex
I get an error upon trying to copy it to GPU though:
gpxe = gpuArray(pxe);
Error using gpuArray
An unexpected error occurred on the device. The error code was: UNKNOWN_ERROR.
I'm not sure what the problem is here. Smaller sparse arrays copy over fine, and I'm still well within the memory limits. Is there some kind of hidden maximum size, or are we simply not allowed to use most of the GPU memory? This array should theoretically occupy less than 60% of it.
Edit: trying smaller arrays and loading multiple ones into GPU memory:
Trial>> gpu = gpuDevice;
Trial>> mem1 = gpu.FreeMemory;
Trial>> gpxe = gpuArray(pxet.');
Trial>> mem2 = gpu.FreeMemory;
Trial>> gpye = gpuArray(pyet.');
Trial>> mem3 = gpu.FreeMemory;
Trial>> gpxi = gpuArray(pxit.');
Trial>> mem4 = gpu.FreeMemory;
Trial>> gpyi = gpuArray(pyit.');
Trial>> mem5 = gpu.FreeMemory;
The host-side sizes of these arrays are:
whos('pxet','pyet','pxit','pyit')
Name Size Bytes Class Attributes
pxet 211600x211600 47266024 double sparse, complex
pxit 211600x211600 47266024 double sparse, complex
pyet 211600x211600 47266024 double sparse, complex
pyit 211600x211600 47266024 double sparse, complex
Sequential memory footprint on the GPU:
Trial>> mem1-mem2
ans =
147456000
Trial>> mem2-mem3
ans =
39059456
Trial>> mem3-mem4
ans =
39059456
Trial>> mem4-mem5
ans =
39059456
So the very first transfer consumes a huge chunk of memory, while subsequent ones take up less space than their host sizes suggest. It seems I need enough free GPU memory to cover an initial allocation roughly three times larger than the array itself.
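The measured deltas can be separated arithmetically. A quick Python check (the split into a one-time cost plus a steady per-array cost is inferred from these measurements, not from any documentation):

```python
# Observed FreeMemory deltas from the trial above
deltas = [147456000, 39059456, 39059456, 39059456]

# Arrays 2-4 settle at a constant per-array cost; the first transfer
# additionally pays some fixed one-time cost.
per_array = deltas[1]
one_time = deltas[0] - per_array
print(per_array)             # 39059456 bytes (~37 MB) per sparse array on the GPU
print(one_time)              # 108396544 bytes (~103 MB) consumed only on first use
print(per_array < 47266024)  # True: the GPU copy is smaller than the host copy
```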
Accepted Answer
Edric Ellis
12 August 2015
The first time you start up any of the GPU support within MATLAB, a series of libraries are loaded, and these consume memory on the GPU. Sparse gpuArray uses a different representation compared to the CPU (it uses CSR layout, and 4-byte integers for indices) which explains why the number of bytes consumed by a given sparse matrix is different on the GPU and the CPU. Converting between these formats requires additional storage on the GPU, which almost certainly explains why you cannot create the large sparse matrix on the GPU.
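These figures can be checked numerically. A hedged Python sketch, using the byte counts from the question (the 16/8/8-byte CSC and 16/4/4-byte CSR accounting below is inferred from this answer, not taken from MathWorks documentation):

```python
def csc_nnz(host_bytes, ncols):
    """Back out nnz from MATLAB's whos() byte count, assuming CSC layout for
    complex double data: 16 bytes/value + 8 bytes/row index per nonzero,
    plus 8 bytes per column pointer."""
    return (host_bytes - 8 * (ncols + 1)) // 24

def csr_gpu_bytes(nnz, nrows):
    """Estimated device footprint, assuming CSR with 4-byte integer indices:
    16 bytes/value + 4 bytes/column index per nonzero, plus 4 bytes per
    row pointer."""
    return 20 * nnz + 4 * (nrows + 1)

# Small matrices from the edit: 211600^2, 47266024 host bytes each
nnz_small = csc_nnz(47266024, 211600)
print(csr_gpu_bytes(nnz_small, 211600))  # 38824084, close to the measured 39059456

# Large matrix pxe: 5282400^2, 1182580904 host bytes
nnz_big = csc_nnz(1182580904, 5282400)
print(csr_gpu_bytes(nnz_big, 5282400))   # 971397684, ~48% of the available 2 GB
```

Under this accounting the large matrix alone would fit, but adding the ~100 MB library load plus whatever transient storage the CSC-to-CSR conversion needs could plausibly exhaust a 2 GB device.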
3 comments
Edric Ellis
12 August 2015
You're quite right, sorry for not spelling that out. On the CPU, MATLAB uses Compressed Sparse Column format; on the GPU, gpuArray uses Compressed Sparse Row since it generally has better parallel performance, and better library support. Unfortunately, this means we need to perform the (relatively expensive) format conversion when sending/gathering sparse data.
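The CSC-to-CSR round trip can be illustrated on the CPU with SciPy (an analogy only; MATLAB's GPU internals are not exposed, and the matrix here is illustrative):

```python
import numpy as np
import scipy.sparse as sp

# Build a small complex sparse matrix in CSC, the host-side layout
n = 1000
rng = np.random.default_rng(0)
rows = rng.integers(0, n, size=5000)
cols = rng.integers(0, n, size=5000)
vals = rng.standard_normal(5000) + 1j * rng.standard_normal(5000)
csc = sp.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsc()

# Converting to CSR (the device-side layout) builds a brand-new set of
# value/index arrays -- for a moment both copies are alive, which is the
# kind of transient extra storage the answer describes.
csr = csc.tocsr()
print(csc.indices.dtype, csr.indices.dtype)       # index width SciPy chose
print(np.allclose(csc.toarray(), csr.toarray()))  # True: same matrix, new layout
```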
More Answers (0)