1. MATLAB uses a memory re-use scheme on the GPU, which makes the values returned in FreeMemory misleading. FreeMemory shows the number of bytes actually free on the GPU, but it doesn't reflect how much memory is available to MATLAB for creating new gpuArrays. This is why later releases have an AvailableMemory property, which reports how many bytes are available for new gpuArrays.
2. Transferring an array from the CPU to the GPU requires a format conversion. MATLAB on the CPU stores complex data as two separate allocations: the real part and the imaginary part. On the GPU, the data is stored in a single interleaved allocation. The transformation from split-complex to interleaved-complex is performed on the GPU, and this requires extra space. Therefore, the largest complex array that can be transferred from the CPU to the GPU is roughly half the total GPU memory size.
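As a rough check of the second point, the peak footprint of a transfer can be estimated by hand. The sketch below assumes that the staged split-complex copy and the interleaved result coexist on the GPU and that each takes 16 bytes per element. That is an assumption about the mechanism, and the variable names are illustrative.

```matlab
% Back-of-envelope estimate of peak GPU memory while transferring a
% complex double array from the CPU. Assumption: the split-complex staging
% copy and the interleaved result coexist on the GPU during the conversion.
numElements     = 1000 * 1000;   % a 1000x1000 array, as in the question
bytesPerElement = 16;            % 8 bytes real + 8 bytes imaginary

resultBytes  = numElements * bytesPerElement;  % interleaved gpuArray
stagingBytes = resultBytes;                    % split real/imag copy (assumed same size)
peakBytes    = resultBytes + stagingBytes;

fprintf('result: %d bytes\n', resultBytes);
fprintf('peak:   %d bytes\n', peakBytes);
```

Under that assumption the peak is 32,000,000 bytes, which is close to the 32,374,784-byte drop in FreeMemory reported in the question.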
Why do complex arrays take twice as much memory on the GPU as on the CPU?
>> ver
---------------------------------------------------------------------------------------------
MATLAB Version: 8.3.0.532 (R2014a)
MATLAB License Number: ••••••
Operating System: Linux 2.6.32-431.23.3.el6.x86_64 #1 SMP Thu Jul 31 17:20:51 UTC 2014 x86_64
Java Version: Java is not enabled
---------------------------------------------------------------------------------------------
MATLAB Version 8.3 (R2014a)
Simulink Version 8.3 (R2014a)
Control System Toolbox Version 9.7 (R2014a)
Curve Fitting Toolbox Version 3.4.1 (R2014a)
Image Processing Toolbox Version 9.0 (R2014a)
MATLAB Compiler Version 5.1 (R2014a)
Mapping Toolbox Version 4.0.1 (R2014a)
Optimization Toolbox Version 7.0 (R2014a)
Parallel Computing Toolbox Version 6.4 (R2014a)
Signal Processing Toolbox Version 6.21 (R2014a)
Statistics Toolbox Version 9.0 (R2014a)
System Identification Toolbox Version 9.0 (R2014a)
>>
>> gpu=gpuDevice(1)
gpu =
CUDADevice with properties:
Name: 'Tesla K40m'
Index: 1
ComputeCapability: '3.5'
SupportsDouble: 1
DriverVersion: 6
ToolkitVersion: 5.5000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 1.2079e+10
FreeMemory: 1.1914e+10
MultiprocessorCount: 15
ClockRateKHz: 875500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 0
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
>> A=rand(1000,1000); whos('A')
Name Size Bytes Class Attributes
A 1000x1000 8000000 double
>> m=gpu.FreeMemory; B=gpuArray(A); fprintf('%d bytes\n',m-gpu.FreeMemory);
8126464 bytes
>> clear A B
>>
>> A=complex(rand(1000,1000),rand(1000,1000)); whos('A')
Name Size Bytes Class Attributes
A 1000x1000 16000000 double complex
>> m=gpu.FreeMemory; B=gpuArray(A); fprintf('%d bytes\n',m-gpu.FreeMemory);
32374784 bytes
>>
Answers (3)
Edric Ellis
23 October 2015
Based on the prior comments, I think I understand the problem now. Complex arrays on the GPU take up the same amount of memory as on the CPU, but (especially in R2014a) it can be difficult to see that for various reasons. On my machine using R2014a, the following steps:
d = gpuDevice(1);
f1 = d.FreeMemory;
gcx = repmat(gpuArray(1i), 1000, 1000);
f2 = d.FreeMemory;
bytesPerElement = (f1 - f2) / (1000*1000)
demonstrate that a complex gpuArray uses 16 bytes per element, just like on the CPU.
Now, there are two subtleties here that I think are getting in the way of what you're actually trying to achieve:
You can avoid the second problem, the extra space needed for the split-to-interleaved conversion during a CPU-to-GPU transfer, if it is possible to construct the array directly on the GPU (as I did in my example). However, I appreciate that's not always possible.
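A minimal sketch of building complex data on the GPU rather than transferring it, assuming Parallel Computing Toolbox and a supported GPU. The variable names are illustrative, and in older releases `gpuArray.rand(n, n)` is the equivalent of the `'gpuArray'` option used here.

```matlab
% Sketch: create a complex gpuArray on the device instead of copying a
% complex array over from the CPU.
d = gpuDevice;                 % currently selected GPU
n = 1000;

re  = rand(n, n, 'gpuArray');  % real part, generated on the GPU
im  = rand(n, n, 'gpuArray');  % imaginary part, generated on the GPU
gcx = complex(re, im);         % complex gpuArray built from the two parts
clear re im                    % release the real-valued temporaries
wait(d);                       % let pending GPU work finish

disp(classUnderlying(gcx));    % underlying class of the gpuArray
disp(isreal(gcx));             % false for a complex array
```

Note that `re` and `im` are ordinary real gpuArrays that exist alongside `gcx` until they are cleared, so this sidesteps the CPU-side split-complex transfer but not temporaries in general.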
Matt J
22 October 2015
Edited: Matt J
22 October 2015
"FreeMemory" appears to be an undocumented method or property. When I use "AvailableMemory" instead, I get the correct result.
>> A=complex(rand(1000,1000),rand(1000,1000));
>> clear B; m=gpu.AvailableMemory; B=gpuArray(A);
>> fprintf('%d bytes\n',m-gpu.AvailableMemory);
16374784 bytes
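To see the two counters side by side, something like the following could be used. It is a sketch that assumes a supported GPU, and it checks with `isprop` which of the two properties the installed release exposes, since that differs between releases.

```matlab
% Sketch: compare the memory counters around one CPU-to-GPU transfer.
d = gpuDevice;                          % currently selected GPU
A = complex(rand(1000), rand(1000));    % 16,000,000 bytes on the CPU

counters = {'FreeMemory', 'AvailableMemory'};
present  = counters(cellfun(@(p) isprop(d, p), counters));
before   = cellfun(@(p) double(d.(p)), present);

B = gpuArray(A);                        % the transfer being measured
wait(d);                                % let pending GPU work finish

after = cellfun(@(p) double(d.(p)), present);
for k = 1:numel(present)
    fprintf('%-16s dropped by %d bytes\n', present{k}, before(k) - after(k));
end
```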
6 Comments
Matt J
22 October 2015
Edited: Matt J
22 October 2015
I have tested on both the GTX 580 and the Titan X. Here's my version info,
Parallel Computing Toolbox Version 6.6 (R2015a)
I suppose this could account for the difference in the output of gpuDevice, though strangely a Google search for "FreeMemory" turns up nothing for me (which led me to think it was undocumented).
Have you independently verified that the GPU is consuming 32 MB? Perhaps it is just being reported incorrectly by gpu.FreeMemory. Edric has said that it is the wrong thing to use.
Lessmann
22 October 2015
Hi,
this behaviour is not a difference between CPU and GPU. In general, a complex number uses twice the memory of a real one.
Name Size Bytes Class Attributes
A 5x5 200 double
B 5x5 400 double complex
MATLAB uses two doubles to store the real and imaginary parts, so twice the memory is needed.