ifft2 on GPU array

Bruno Alvisio on 3 Jan 2022
Edited: Matt J on 4 Jan 2022
I am trying to compute the ifft2 of multiple matrices. The simplest code snippet is:
gAs = gpuArray.rand(999, 519, 20);
gBs = gpuArray.rand(999, 519);
ifft2(gAs .* gBs, "symmetric");
Error using gpuArray/ifft2
An invalid array was used on the GPU.
At first I thought that I was using up all the GPU memory. I tried using single-precision GPU arrays, but that did not help. However, I then tried the following code (bigger matrices) and it worked just fine.
gAs = gpuArray.rand(1000, 519, 2);
gBs = gpuArray.rand(1000, 519);
ifft2(gAs .* gBs, "symmetric");
I know that I can also loop over the slices of gAs, and that works, but I want to get some speedup by doing everything in one call to ifft2.
I would like to understand why this is happening, and whether there is a way to pad the matrices so that I can still get the ifft2 of the original matrices.
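The slice-by-slice fallback mentioned above can be sketched as follows. This is a minimal sketch, not the poster's actual code: the array sizes are small illustrative assumptions and it runs on the CPU so that it works without a GPU. Replacing rand with gpuArray.rand should give the GPU variant.

```matlab
% Illustrative sizes (assumption); the question uses 999 x 519 x 20 on the GPU.
As = rand(9, 5, 3);
Bs = rand(9, 5);
P  = As .* Bs;                      % implicit expansion applies Bs to every slice

% Inverse-transform one 2-D slice at a time instead of one batched call.
out = zeros(size(P), "like", P);    % preallocate with the same class as P
for k = 1:size(P, 3)
    out(:, :, k) = ifft2(P(:, :, k), "symmetric");
end

% Compare against the single batched call, which transforms each slice.
batched = ifft2(P, "symmetric");
maxDiff = max(abs(out(:) - batched(:)));
```

On the CPU the loop and the batched call would be expected to agree to round-off, so maxDiff gives a quick check that the loop is a drop-in replacement.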
For reference:
>> gpuDevice()
ans =
CUDADevice with properties:
Name: 'Tesla V100-SXM2-32GB'
Index: 1
ComputeCapability: '7.0'
SupportsDouble: 1
DriverVersion: 11.2000
ToolkitVersion: 11
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 3.4090e+10
AvailableMemory: 3.3167e+10
MultiprocessorCount: 80
ClockRateKHz: 1530000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 0
CanMapHostMemory: 1
DeviceSupported: 1
DeviceAvailable: 1
DeviceSelected: 1
3 Comments
Bruno Alvisio on 3 Jan 2022
Right. Thank you. Shouldn't this code throw the same error, since the matrices are not symmetric? (In my case it runs fine.)
gAs = gpuArray.rand(999, 519);
gBs = gpuArray.rand(999, 519);
ifft2(gAs .* gBs, "symmetric");
Thanks again
Walter Roberson on 3 Jan 2022
Sorry, I would have to boot into a different operating system to test (GPU is not supported on my MacOS.)


Accepted Answer

Matt J on 4 Jan 2022
Edited: Matt J on 4 Jan 2022
I think you should probably just omit the 'symmetric' flag. On the GPU (mine at least), it doesn't seem to make a big difference in performance:
A = gpuArray.rand(512,512,512);
gputimeit(@() ifft2(A,'symmetric') ) % 0.0706 seconds
gputimeit(@() ifft2(A) ) % 0.0753 seconds
Whether this is an indication of sub-optimal software design on MathWorks' part, I'm not sure. On the CPU, the 'symmetric' flag means the software does fewer flops, but on a parallel system like the GPU, it's not the number of flops that matters.
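As a minimal sketch of what dropping the flag looks like in practice: when the input really is conjugate symmetric (for example, the fft2 of a real array), taking real() of the plain ifft2 should match the 'symmetric' result to round-off. The sizes below are illustrative assumptions and the code runs on the CPU; gpuArray.rand could be substituted on a GPU.

```matlab
% CPU sketch with illustrative sizes (assumption).
x = rand(9, 5);                    % real-valued data
X = fft2(x);                       % spectrum is conjugate symmetric by construction

ySym  = ifft2(X, "symmetric");     % ifft2 is told to assume conjugate symmetry
yFull = real(ifft2(X));            % full complex inverse, imaginary round-off dropped

% Both should recover x to within floating-point round-off.
errSym  = max(abs(ySym(:)  - x(:)));
errFull = max(abs(yFull(:) - x(:)));
fprintf("symmetric: %.2e, real(ifft2): %.2e\n", errSym, errFull);
```

For input that is not conjugate symmetric, such as the product of two random arrays in the question, the two calls need not agree, because 'symmetric' treats the input as if it were conjugate symmetric.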

More Answers (1)

Matt J on 3 Jan 2022
Edited: Matt J on 3 Jan 2022
I think it's a bug, but one solution might be,
fn=@(z,d) ifft(z,[],d,'symmetric');
out = fn( fn(gAs .* gBs,1) ,2);
2 Comments
Bruno Alvisio on 4 Jan 2022
Thanks for the answer. The code you provided runs without error.
I have noticed, though, that there is very often a discrepancy between the results of the function handle fn and ifft2, even for 2-dimensional matrices, when their sizes are greater than ~4. I created the following code snippet. If run multiple times, it sometimes displays "not equal".
clear all;
close all;
fn=@(z,d) ifft(z, [], d, "symmetric");
m = 5;
n = 4;
a = gpuArray.rand(m, n);
b = gpuArray.rand(m, n);
c = ifft2(a .* b, "symmetric");
d = fn(fn(a .* b, 1), 2);
if ~all(abs(c - d) <= eps(max(abs(c), abs(d))), "all") % negate the whole comparison; a bare ~ would apply to abs(c - d) only
disp("not equal")
end
IIUC, you are suggesting that there is a bug in ifft2 when the symmetric flag is provided?
Matt J on 4 Jan 2022
Edited: Matt J on 4 Jan 2022
It seems I had a conceptual error. ifft(ifft(X,[],1,'symmetric'),[],2,'symmetric') is not a valid replacement for ifft2(X,'symmetric') unless X is symmetric about both the x and y axes.
However, it does seem like a bug that only certain array sizes work for gpuArray.ifft2(). The CPU version of ifft2() doesn't have that problem.
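The conceptual point about the nested 1-D transforms can be checked with a small CPU experiment. This is only a sketch under stated assumptions: the 5-by-4 size is borrowed from the snippet above, and X is built with fft2 from a real array so that it has 2-D conjugate symmetry, which does not make each individual column or row conjugate symmetric.

```matlab
% CPU sketch; X has 2-D conjugate symmetry because it is the fft2 of real data.
x = rand(5, 4);
X = fft2(x);

fn = @(z, d) ifft(z, [], d, "symmetric");   % helper from the answer above

y2   = ifft2(X, "symmetric");               % 2-D inverse with the symmetric flag
ySep = fn(fn(X, 1), 2);                     % nested 1-D inverses, each with the flag

% Distance of each result from the original real data.
err2   = max(abs(y2(:)   - x(:)));
errSep = max(abs(ySep(:) - x(:)));
fprintf("ifft2: %.2e, nested ifft: %.2e\n", err2, errSep);
```

If the explanation above holds, err2 should be at round-off level while errSep should be visibly larger, since each 1-D call assumes a symmetry that the columns and rows of X do not have on their own.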


Release

R2021b
