ifft2 on GPU array
I am trying to compute the ifft2 of multiple matrices. The simplest code snippet is:
gAs = gpuArray.rand(999, 519, 20);
gBs = gpuArray.rand(999, 519);
ifft2(gAs .* gBs, "symmetric");
Error using gpuArray/ifft2
An invalid array was used on the GPU.
At first I thought that I was using all the GPU memory, so I tried using single GPU arrays. However, I then tried the following code (bigger matrix) and it worked just fine.
gAs = gpuArray.rand(1000, 519, 2);
gBs = gpuArray.rand(1000, 519);
ifft2(gAs .* gBs, "symmetric");
I know that I can also loop over the slices of gAs, and that works, but I want to get some speedup by doing it in one call to ifft2.
I wanted to understand why this is happening and if there is a way in which I can pad the matrices so that I can still get the ifft2 of the original matrices.
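The per-slice loop mentioned above can be sketched roughly as follows. This is only an illustration, not the original code: small CPU arrays stand in for the `gpuArray.rand` inputs so it runs without a GPU, the sizes are placeholders, and the inputs are constructed (an assumption of this sketch) so that each page really is conjugate symmetric, which is what the 'symmetric' flag expects.

```matlab
% Sketch: apply ifft2 one page at a time instead of in a single batched call.
% Assumption: CPU stand-ins; on a GPU the inputs would be gpuArray objects.
rng(0);
A = fft2(rand(9, 7, 4));        % pages with conjugate-symmetric spectra
B = abs(fft2(rand(9, 7)));      % real, even-symmetric filter keeps that symmetry
P = A .* B;                     % implicit expansion across the third dimension

out = zeros(size(P), 'like', real(P));   % real output, same class as the input
for k = 1:size(P, 3)
    out(:, :, k) = ifft2(P(:, :, k), 'symmetric');
end
```

Using `'like'` means the same loop should preallocate on the GPU when `P` is a gpuArray, without further changes.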
For reference:
>> gpuDevice()
ans =
CUDADevice with properties:
Name: 'Tesla V100-SXM2-32GB'
Index: 1
ComputeCapability: '7.0'
SupportsDouble: 1
DriverVersion: 11.2000
ToolkitVersion: 11
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 3.4090e+10
AvailableMemory: 3.3167e+10
MultiprocessorCount: 80
ClockRateKHz: 1530000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 0
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
3 Comments
Walter Roberson
3 Jan 2022
Sorry, I would have to boot into a different operating system to test (GPU is not supported on my MacOS.)
Accepted Answer
Matt J
4 Jan 2022
Edited: Matt J
4 Jan 2022
I think you should probably just omit the 'symmetric' flag. On the GPU (mine at least), it doesn't seem to make a big difference in performance:
A = gpuArray.rand(512,512,512);
gputimeit(@() ifft2(A,'symmetric') ) % 0.0706 seconds
gputimeit(@() ifft2(A) ) % 0.0753 seconds
Whether this is an indication of sub-optimal software design on MathWorks' part, I'm not sure. On the CPU, the 'symmetric' flag means the software does fewer flops, but on a parallel system like the GPU, it's not the number of flops that matters.
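If the flag is omitted, the output is complex, so the real part has to be taken explicitly. A minimal check of that equivalence is sketched below. It assumes the input really is conjugate symmetric (here built as the `fft2` of real data) and uses small CPU arrays so it runs without a GPU. For data that is not conjugate symmetric, the two calls are not expected to agree.

```matlab
% Sketch: real(ifft2(X)) versus ifft2(X,'symmetric') for a conjugate-symmetric X.
rng(0);
x = rand(8, 6, 3);               % real-valued pages (placeholder sizes)
X = fft2(x);                     % fft2 transforms each 2-D page separately

ySym  = ifft2(X, 'symmetric');   % flag: treat X as conjugate symmetric
yReal = real(ifft2(X));          % no flag: drop the round-off imaginary part

maxDiff = max(abs(ySym(:) - yReal(:)));   % difference between the two routes
maxErr  = max(abs(yReal(:) - x(:)));      % round-trip error against the original data
```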
0 Comments
More Answers (1)
Matt J
3 Jan 2022
Edited: Matt J
3 Jan 2022
I think it's a bug, but one solution might be,
fn=@(z,d) ifft(z,[],d,'symmetric');
out = fn( fn(gAs .* gBs,1) ,2);
2 Comments
Matt J
4 Jan 2022
Edited: Matt J
4 Jan 2022
It seems I had a conceptual error. ifft(ifft(X,[],1,'symmetric'),[],2,'symmetric') is not a valid replacement for ifft2(X,'symmetric') unless X is symmetric about both the x and y axes.
However, it does seem like a bug that only certain array sizes work for gpuArray.ifft2(). The CPU version of ifft2() doesn't have that problem.
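The point about the two-pass version can be checked numerically with a small sketch like the one below. It uses CPU arrays with placeholder sizes and a spectrum built from real data, so `X` is conjugate symmetric only in the 2-D sense. The variant that applies the flag only on the last pass is added here for comparison and is not from the thread: after an unflagged `ifft` along dimension 1, each row is conjugate symmetric along dimension 2, so the flag is valid there. Whether that variant avoids the gpuArray error was not tested.

```matlab
% Sketch: separable inverse transforms with the 'symmetric' flag.
rng(1);
x = rand(9, 7);                  % real data, odd sizes (placeholders)
X = fft2(x);                     % conjugate symmetric in the 2-D sense only

ref   = ifft2(X, 'symmetric');
both  = ifft(ifft(X, [], 1, 'symmetric'), [], 2, 'symmetric');  % flag on both passes
mixed = ifft(ifft(X, [], 1), [], 2, 'symmetric');               % flag on last pass only

errBoth  = max(abs(both(:)  - ref(:)));   % generally not small for such an X
errMixed = max(abs(mixed(:) - ref(:)));   % expected to be at round-off level
```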