How do I know how large an array can fit on the GPU?
Hi,
I am running some analysis on the GPU with functions like fft().
But the array is too large to calculate on my GPU (TITAN Xp).
So I thought of slicing the array, putting each slice on the GPU, and then collecting and reshaping the results after the calculation.
But I don't know what size will fit on my GPU.
How can I find out what array size fits on my GPU?
Thank you.
Jae-Hee Park
Accepted Answer
Mike Croucher
26 Aug 2022
Edited: Mike Croucher
26 Aug 2022
As you've seen, gpuDevice() gives you information about your GPU. This is what I get for mine:
>> gpuDevice()
ans =
CUDADevice with properties:
Name: 'NVIDIA GeForce RTX 3070'
Index: 1
ComputeCapability: '8.6'
SupportsDouble: 1
DriverVersion: 11.6000
ToolkitVersion: 11.2000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 8.5894e+09
AvailableMemory: 7.2955e+09
MultiprocessorCount: 46
ClockRateKHz: 1725000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
The important parameter here is AvailableMemory. I have 7.2955e+09 bytes (you have rather more!). What does this mean in terms of matrix size?
A double-precision number is 8 bytes, so in theory I can have 7.2955e+09/8 = 911937500 doubles on the card. This is my hard limit, and there is nothing I can do about it: there simply isn't the capacity on my GPU to hold more than that. Consider this an upper bound. In terms of a square matrix, it's roughly 30,000 x 30,000 since
sqrt(911937500)
ans =
3.0198e+04
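The same back-of-the-envelope arithmetic can be repeated for other data types. Below is a minimal sketch; the anonymous function maxSquareSide is a hypothetical name, and availBytes is simply the AvailableMemory figure quoted above. It counts storage only and leaves no room for computation. Note that fft on a real input returns a complex result, which takes 16 bytes per element in double precision.

```matlab
% Free memory reported by gpuDevice() above; on your own machine you could
% use  d = gpuDevice; availBytes = d.AvailableMemory;  instead.
availBytes = 7.2955e9;

% Hypothetical helper: side length n of the largest n-by-n array that fits
% in availBytes when each element takes bytesPerElem bytes (storage only).
maxSquareSide = @(availBytes, bytesPerElem) floor(sqrt(floor(availBytes / bytesPerElem)));

nDouble        = maxSquareSide(availBytes, 8)    % real double
nSingle        = maxSquareSide(availBytes, 4)    % real single
nComplexDouble = maxSquareSide(availBytes, 16)   % complex double, e.g. fft output
```

For the figures above this gives 30198 for real doubles, 42706 for singles and 21353 for complex doubles.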
Let's transfer a matrix that big to my GPU and see whether it works:
>> a = zeros(3.0198e+04);
>> gpuA = gpuArray(a);
>> gpuDevice()
ans =
CUDADevice with properties:
Name: 'NVIDIA GeForce RTX 3070'
Index: 1
ComputeCapability: '8.6'
SupportsDouble: 1
DriverVersion: 11.6000
ToolkitVersion: 11.2000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 8.5894e+09
AvailableMemory: 110592
MultiprocessorCount: 46
ClockRateKHz: 1725000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
It worked, and I had 110592 bytes left over.
However, the useful limit will be rather lower than this. If I stuff my card full of data, there's no room left for any GPU algorithm to do any computation. Even adding 1 to all the elements of a GPU array this big is too much. Clearly, matrix addition isn't done completely in place:
>> gpuA = gpuA + 1;
Error using +
Out of memory on device. To view more detail about available memory on the GPU,
use 'gpuDevice()'. If the problem persists, reset the GPU by calling
'gpuDevice(1)'.
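A rough check before an out-of-place elementwise operation like gpuA + 1 is whether one more array of the same size would fit in the free memory. This is a necessary condition only, not a guarantee, since some functions need extra workspace beyond their result. The sketch below (fitsCopy is a hypothetical name) plugs in the numbers from the session above.

```matlab
% Hypothetical helper: true if a result array with numElems elements of
% bytesPerElem bytes each could be allocated from availBytes of free memory.
fitsCopy = @(numElems, bytesPerElem, availBytes) numElems * bytesPerElem <= availBytes;

% Numbers from the session above: a 30198-by-30198 double matrix is already
% on the card and 110592 bytes are left, so gpuA + 1 has nowhere to go.
numElems = 30198^2;
canAdd = fitsCopy(numElems, 8, 110592)

% With a live GPU array the inputs could come from the array and device, e.g.
%   d = gpuDevice; canAdd = fitsCopy(numel(gpuA), 8, d.AvailableMemory);
```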
I can at least do something though. The sum command works, for example, even though the answer isn't very interesting in this case.
>> sum(gpuA,'all')
ans =
0
How much memory you need to do computations depends on the algorithms involved but hopefully you can use this thinking as a starting point for what you can expect to squeeze onto your GPU.
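Coming back to the original plan of slicing the data, here is a minimal sketch of a column-wise fft processed one slice of columns at a time. It assumes the transform runs along the first dimension (the fft default), so each column is independent and each slice can be written straight back into place without a reshape. The 0.5 safety margin, the 16e6-byte CPU budget and the stand-in data are arbitrary assumptions, and useGPU is left false so the sketch also runs without a GPU.

```matlab
% Stand-in for the large data set: 1000 signals of length 4096, one per column.
X = rand(4096, 1000);

useGPU = false;   % set to true (for example useGPU = canUseGPU()) on a machine with a supported GPU

if useGPU
    d = gpuDevice;
    availBytes = d.AvailableMemory;
else
    availBytes = 16e6;   % assumed memory budget so the slicing is visible on the CPU
end

% Per column the GPU holds the real input (8 bytes per element) and the
% complex double result (16 bytes per element). Only half of the free
% memory is used, as an assumed margin for the FFT's own workspace.
bytesPerCol  = size(X, 1) * (8 + 16);
colsPerChunk = max(1, floor(0.5 * availBytes / bytesPerCol));

Y = complex(zeros(size(X)));   % results are collected in host memory
for c0 = 1:colsPerChunk:size(X, 2)
    cols  = c0:min(c0 + colsPerChunk - 1, size(X, 2));
    chunk = X(:, cols);
    if useGPU
        chunk = gpuArray(chunk);   % copy only this slice to the GPU
    end
    F = fft(chunk);                % fft along dimension 1, i.e. per column
    if useGPU
        F = gather(F);             % copy the slice result back to the host
    end
    Y(:, cols) = F;
end
```

With these numbers each slice is 81 columns; on a real GPU the slice size would follow from AvailableMemory instead.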
1 Comment
Joss Knight
1 Sep 2022
Just FYI, MATLAB won't allow in-place computation on a workspace variable because it needs to hold onto the original array in case of error (or user Ctrl-C). Computation inside a function on local variables will be more optimized.
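Following that note, one way to give MATLAB this freedom is to create and update the large array as a local variable inside a function rather than in the command-line workspace. A minimal sketch, saved as addOneAndSum.m (a hypothetical name). Whether the update is actually done in place is up to MATLAB's optimizer and has not been measured here; the useGPU flag lets the same code run on the CPU.

```matlab
function total = addOneAndSum(n, useGPU)
% Hypothetical example: A only exists inside this function, so MATLAB does
% not have to keep the original A around for the workspace in case of an
% error, and may reuse its memory for A = A + 1.
    if useGPU
        A = zeros(n, 'gpuArray');   % create the array directly on the GPU
    else
        A = zeros(n);
    end
    A = A + 1;                      % candidate for an in-place update
    total = sum(A(:));
    if useGPU
        total = gather(total);      % bring the scalar result back to the host
    end
end
```

On the CPU, addOneAndSum(100, false) returns 10000; on a GPU you would call it with useGPU set to true and a size chosen from AvailableMemory.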