gpuArray sparse memory usage
I have a gpu with about 2GB of available memory:
CUDADevice with properties:
Name: 'Quadro K1100M'
Index: 1
ComputeCapability: '3.0'
SupportsDouble: 1
DriverVersion: 6.5000
ToolkitVersion: 6.5000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 2.1475e+09
AvailableMemory: 2.0154e+09
MultiprocessorCount: 2
ClockRateKHz: 705500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
However, I'd like to load a sparse array into it (R2015a, which supports sparse gpuArray):
whos('pxe')
Name Size Bytes Class Attributes
pxe 5282400x5282400 1182580904 double sparse, complex
I get an error upon trying to copy it to GPU though:
gpxe = gpuArray(pxe);
Error using gpuArray
An unexpected error occurred on the device. The error code was: UNKNOWN_ERROR.
I'm not sure what the problem is here. Smaller sparse arrays copy over fine, and I'm still well within the memory limits. Is there some kind of hidden maximum size, or are we simply not allowed to use most of the GPU memory? This array should theoretically occupy less than 60% of it.
Edit: trying smaller arrays and loading multiple ones into GPU memory:
Trial>> gpu = gpuDevice;
Trial>> mem1 = gpu.FreeMemory;
Trial>> gpxe = gpuArray(pxet.');
Trial>> mem2 = gpu.FreeMemory;
Trial>> gpye = gpuArray(pyet.');
Trial>> mem3 = gpu.FreeMemory;
Trial>> gpxi = gpuArray(pxit.');
Trial>> mem4 = gpu.FreeMemory;
Trial>> gpyi = gpuArray(pyit.');
Trial>> mem5 = gpu.FreeMemory;
The host-side sizes of these arrays are:
whos('pxet','pyet','pxit','pyit')
Name Size Bytes Class Attributes
pxet 211600x211600 47266024 double sparse, complex
pxit 211600x211600 47266024 double sparse, complex
pyet 211600x211600 47266024 double sparse, complex
pyit 211600x211600 47266024 double sparse, complex
Sequential memory footprint on the GPU:
Trial>> mem1-mem2
ans =
147456000
Trial>> mem2-mem3
ans =
39059456
Trial>> mem3-mem4
ans =
39059456
Trial>> mem4-mem5
ans =
39059456
So the very first transfer consumes a huge chunk of memory, while subsequent ones take up less space than their host sizes suggest. It seems I need enough free GPU memory to cover an initial allocation roughly three times larger than the array itself.
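The measured deltas can be separated arithmetically. A quick Python check (the split into a one-time cost plus a steady per-array cost is inferred from these measurements, not from any documentation):

```python
# Observed FreeMemory deltas from the trial above
deltas = [147456000, 39059456, 39059456, 39059456]

# Arrays 2-4 settle at a constant per-array cost; the first transfer
# additionally pays some fixed one-time cost.
per_array = deltas[1]
one_time = deltas[0] - per_array
print(per_array)             # 39059456 bytes (~37 MB) per sparse array on the GPU
print(one_time)              # 108396544 bytes (~103 MB) consumed only on first use
print(per_array < 47266024)  # True: the GPU copy is smaller than the host copy
```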
Accepted Answer
Edric Ellis
12 August 2015
The first time you start up any of the GPU support within MATLAB, a series of libraries are loaded, and these consume memory on the GPU. Sparse gpuArray uses a different representation compared to the CPU (it uses CSR layout, and 4-byte integers for indices) which explains why the number of bytes consumed by a given sparse matrix is different on the GPU and the CPU. Converting between these formats requires additional storage on the GPU, which almost certainly explains why you cannot create the large sparse matrix on the GPU.
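These figures can be checked numerically. A hedged Python sketch, using the byte counts from the question (the 16/8/8-byte CSC and 16/4/4-byte CSR accounting below is inferred from this answer, not taken from MathWorks documentation):

```python
def csc_nnz(host_bytes, ncols):
    """Back out nnz from MATLAB's whos() byte count, assuming CSC layout for
    complex double data: 16 bytes/value + 8 bytes/row index per nonzero,
    plus 8 bytes per column pointer."""
    return (host_bytes - 8 * (ncols + 1)) // 24

def csr_gpu_bytes(nnz, nrows):
    """Estimated device footprint, assuming CSR with 4-byte integer indices:
    16 bytes/value + 4 bytes/column index per nonzero, plus 4 bytes per
    row pointer."""
    return 20 * nnz + 4 * (nrows + 1)

# Small matrices from the edit: 211600^2, 47266024 host bytes each
nnz_small = csc_nnz(47266024, 211600)
print(csr_gpu_bytes(nnz_small, 211600))  # 38824084, close to the measured 39059456

# Large matrix pxe: 5282400^2, 1182580904 host bytes
nnz_big = csc_nnz(1182580904, 5282400)
print(csr_gpu_bytes(nnz_big, 5282400))   # 971397684, ~48% of the available 2 GB
```

Under this accounting the large matrix alone would fit, but adding the ~100 MB library load plus whatever transient storage the CSC-to-CSR conversion needs could plausibly exhaust a 2 GB device.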
3 comments
Edric Ellis
12 August 2015
You're quite right, sorry for not spelling that out. On the CPU, MATLAB uses Compressed Sparse Column format; on the GPU, gpuArray uses Compressed Sparse Row since it generally has better parallel performance, and better library support. Unfortunately, this means we need to perform the (relatively expensive) format conversion when sending/gathering sparse data.
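The CSC-to-CSR round trip can be illustrated on the CPU with SciPy (an analogy only; MATLAB's GPU internals are not exposed, and the matrix here is illustrative):

```python
import numpy as np
import scipy.sparse as sp

# Build a small complex sparse matrix in CSC, the host-side layout
n = 1000
rng = np.random.default_rng(0)
rows = rng.integers(0, n, size=5000)
cols = rng.integers(0, n, size=5000)
vals = rng.standard_normal(5000) + 1j * rng.standard_normal(5000)
csc = sp.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsc()

# Converting to CSR (the device-side layout) builds a brand-new set of
# value/index arrays -- for a moment both copies are alive, which is the
# kind of transient extra storage the answer describes.
csr = csc.tocsr()
print(csc.indices.dtype, csr.indices.dtype)       # index width SciPy chose
print(np.allclose(csc.toarray(), csr.toarray()))  # True: same matrix, new layout
```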
More Answers (0)