Suggestions to speed up FFT of convolution

15 ビュー (過去 30 日間)

Tim 2022 年 6 月 8 日

1
リンク

この質問への直接リンク

https://jp.mathworks.com/matlabcentral/answers/1736640-suggestions-to-speed-up-fft-of-convolution

コメント済み: Matt J 2022 年 6 月 10 日

The following code is at the center of a physics model. As it scales approximately as O(N log(N)) it is the computationally most expensive part of the simulation. Therefore, it makes sense to try to optimize this part of the code as much as possible and I am looking for ideas what else to try. Conceptually the calculation involves a FFT, multiplication, and an inverse FFT to obtain the result. The simulations are done for 3D volumes [nx,ny,nz] of vectors (so the total number of cells is N=3*nx*ny*nz) and the size of the volumes varies significantly for different problems. The code now mostly runs on GPUs, but it can also be run on CPUs (this is mostly done for initial testing). Thus optimization for performance on GPUs is the main goal. One observation I made is that it is faster to use component wise multiplication vs. multiprod.

Any suggestions what else to try to reduce the time spent on calculating this?

Below is a trimmed down version of the main part of the code (updated to include examples for variables):

rng(1,'twister');
M=gpuArray(rand([100,100,100,3],"double")); %create a random vectorfield 
Periodicity=[0 0 0];
%M: [n1,n2,n3,3] 3D vectorfield 
[nx,ny,nz,n]=size(M);
nxP=2*nx;
nyP=2*ny;
nzP=2*nz;
ftN=gpuArray(rand([nxP,nyP,nzP,n,n],"double")); %for simplicty create ftN tensors with random entries
%Treat periodicity: Periodicity: [true/false, true/false, true/false]
rep=[1 1 1]+Periodicity;
Mhelp=repmat(M,[rep(1),rep(2),rep(3),1]);
ftM=zeros([nxP nyP nzP n],'like',M);
%Fourier transform
ftM(:,:,:,1)=fftn(Mhelp(:,:,:,1),[nxP nyP nzP]);
ftM(:,:,:,2)=fftn(Mhelp(:,:,:,2),[nxP nyP nzP]);
ftM(:,:,:,3)=fftn(Mhelp(:,:,:,3),[nxP nyP nzP]);
%ftN: [nxP,nyP,nzP,3,3] 3D volume of 3x3 tensors 
%ftH=multiprod(ftN,ftM,[4 5],[4]);
%06/08/2022 version without multiprod - on GPUs achieved of the order 25% reduction in calculation time compared to multiprod.
ftH(:,:,:,1)=ftN(:,:,:,1,1).*ftM(:,:,:,1)+ftN(:,:,:,1,2).*ftM(:,:,:,2)+ftN(:,:,:,1,3).*ftM(:,:,:,3);
ftH(:,:,:,2)=ftN(:,:,:,2,1).*ftM(:,:,:,1)+ftN(:,:,:,2,2).*ftM(:,:,:,2)+ftN(:,:,:,2,3).*ftM(:,:,:,3);
ftH(:,:,:,3)=ftN(:,:,:,3,1).*ftM(:,:,:,1)+ftN(:,:,:,3,2).*ftM(:,:,:,2)+ftN(:,:,:,3,3).*ftM(:,:,:,3);
%Inverse FFT
Hh=zeros([nxP nyP nzP n],'like',M);
Hh(:,:,:,3)=-real((ifftn(ftH(:,:,:,3))));
Hh(:,:,:,1)=-real((ifftn(ftH(:,:,:,1))));
Hh(:,:,:,2)=-real((ifftn(ftH(:,:,:,2))));
H=zeros([nxP nyP nzP n],'like',M);
%Final result
H=Hh(nx:nxP-1,ny:nyP-1,nz:nzP-1,:);

5 件のコメント
3 件の古いコメントを表示3 件の古いコメントを非表示

Paul 2022 年 6 月 8 日

On my system running on the CPU and using the Profiler, I'm seeing that the computations for ftH take the most time, even after pre-allocation of ftH.

For reasons I don't understand, the fftn() computation for the first slice of ftM takes about 3x longer than for the second and third slices.

Tim 2022 年 6 月 8 日

I'm not too concerned about the performance on a CPU as we only use this for testing. But you may want to try the multiprod version of the multiplication, I seem to recall that it was comparable with the component wise version on CPUs. I'm not sure about why the timing would be that much different either.

サインインしてコメントする。

サインインしてこの質問に回答する。

回答 (2 件)

Matt J 2022 年 6 月 8 日

1
リンク

この回答への直接リンク

https://jp.mathworks.com/matlabcentral/answers/1736640-suggestions-to-speed-up-fft-of-convolution#answer_981930

編集済み: Matt J 2022 年 6 月 8 日

MATLAB Online で開く

I see a doubling of the speed if you re-organize your4D and 5D arrays as cell arrays:

Mhelp=num2cell(Mhelp,[1,2,3]);
ftN=squeeze(num2cell(ftN,[1,2,3]));
H=cell(1,3);
gputimeit(@()executeIt)
    function executeIt
        %Fourier transform
        ftM{1}=fftn(Mhelp{1},[nxP nyP nzP]);
        ftM{2}=fftn(Mhelp{2},[nxP nyP nzP]);
        ftM{3}=fftn(Mhelp{3},[nxP nyP nzP]);
        ftH=cell(1,3);
        ftH{1}=ftN{1,1}.*ftM{1}+ftN{1,2}.*ftM{2}+ftN{1,3}.*ftM{3};
        ftH{2}=ftN{2,1}.*ftM{1}+ftN{2,2}.*ftM{2}+ftN{2,3}.*ftM{3};
        ftH{3}=ftN{3,1}.*ftM{1}+ftN{3,2}.*ftM{2}+ftN{3,3}.*ftM{3};
        for i=1:3
            tmp=-real(  ifftn(ftH{i}  ) );
            H{i}=tmp(nx:nxP-1,ny:nyP-1,nz:nzP-1);
        end
    end

8 件のコメント
6 件の古いコメントを表示6 件の古いコメントを非表示

Matt J 2022 年 6 月 9 日

編集済み: Matt J 2022 年 6 月 9 日

MATLAB Online で開く

If you were to sum across the 4th dimension for example,

sum(ftN,4)

But the different ftN(:,:,:,i,j) and Mhelp(:,:,;k) seem to behave as separate variables throughout your code.

Tim 2022 年 6 月 9 日

This is only a small (but computationally expensive) part of a much larger code. I think I understand what you mean now - there are plenty of cross- and dot-products (along the 4th dimension) involved in the calculation of other physical quantities.

サインインしてコメントする。

Matt J 2022 年 6 月 9 日

1
リンク

この回答への直接リンク

https://jp.mathworks.com/matlabcentral/answers/1736640-suggestions-to-speed-up-fft-of-convolution#answer_982240

編集済み: Matt J 2022 年 6 月 9 日

MATLAB Online で開く

If you have R2022a,

Mhelp=repmat(M,[rep(1),rep(2),rep(3),1]);
        %Fourier transform
        ftM=fftn(-Mhelp,[nxP nyP nzP,1]);
        %Tensorprod requires R2022a
        ftH=tensorprod(ftN,ftM,[4,5],4);
        %Inverse FFT
        Hh=real(ifftn(ftH,[nxP nyP nzP,1] ));
        H=Hh(nx:nxP-1,ny:nyP-1,nz:nzP-1,:);

27 件のコメント
25 件の古いコメントを表示25 件の古いコメントを非表示

Matt J 2022 年 6 月 9 日

編集済み: Matt J 2022 年 6 月 9 日

Anyway, it seems that the impact on the GPU is far less predictable than on the CPU. On the GTX 1080Ti, for N=200, I get negligible difference in speed, although for N=300 and single precision, it seems quite significant:

function test

N=300;

c=@(z) complex(z,z);

ftN=c(gpuArray.rand(N,N,N,3,3,'single'));

ftM=c(gpuArray.rand(N,N,N,3,'single'));

[ftH,ftH2]=deal(gpuArray(1));

gputimeit(@()version1) %0.1480 sec

gputimeit(@()version2) %0.0532 sec.

Discrepancy=norm(ftH2(:)-ftH(:),inf)

function version1

ftH=gpuArray.ones(size(ftM));

ftH(:,:,:,1)=ftN(:,:,:,1,1).*ftM(:,:,:,1)+ftN(:,:,:,1,2).*ftM(:,:,:,2)+ftN(:,:,:,1,3).*ftM(:,:,:,3);

ftH(:,:,:,2)=ftN(:,:,:,2,1).*ftM(:,:,:,1)+ftN(:,:,:,2,2).*ftM(:,:,:,2)+ftN(:,:,:,2,3).*ftM(:,:,:,3);

ftH(:,:,:,3)=ftN(:,:,:,3,1).*ftM(:,:,:,1)+ftN(:,:,:,3,2).*ftM(:,:,:,2)+ftN(:,:,:,3,3).*ftM(:,:,:,3);

end

function version2

ftH2=sum( reshape(ftN,[],3,3).*reshape(ftM,[],1,3),3);

ftH2=reshape(ftH2,size(ftM));

end

Tim 2022 年 6 月 10 日

One more observation: the Matlab function for the actual problem the result (ftH) before the inverse FFT should be conjugate symmetric.

For conjugate symmetric arrays ifftn provides an option 'symmetric' to "compute the inverse Fourier transform faster" and ensure that the result is real. However, even for very large arrays the gains I saw on GPUs are minimal. Maybe the @MathWorks Support Team has some insights on this? Unfortunatly, the documentation does not have a lot of information on the underlying algorithm that could provide clues for what to expect.

Matt J 2022 年 6 月 10 日

Not too surprising, IMO. On the GPU, it's not the number of operations that matter so much as the parallelism of the tasks. I don't see how conjugate symmetry would increase the opportunity for parallelization.

サインインしてコメントする。

サインインしてこの質問に回答する。

カテゴリ

MATLAB Mathematics Fourier Analysis and Filtering

Help Center および File Exchange で Fourier Analysis and Filtering についてさらに検索

製品

リリース

R2022a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by

Suggestions to speed up FFT of convolution

5 件のコメント
3 件の古いコメントを表示3 件の古いコメントを非表示

回答 (2 件)

8 件のコメント
6 件の古いコメントを表示6 件の古いコメントを非表示

27 件のコメント
25 件の古いコメントを表示25 件の古いコメントを非表示

参考

カテゴリ

タグ

製品

リリース

Community Treasure Hunt

Suggestions to speed up FFT of convolution

5 件のコメント 3 件の古いコメントを表示3 件の古いコメントを非表示

回答 (2 件)

8 件のコメント 6 件の古いコメントを表示6 件の古いコメントを非表示

27 件のコメント 25 件の古いコメントを表示25 件の古いコメントを非表示

参考

カテゴリ

タグ

製品

リリース

Community Treasure Hunt

5 件のコメント
3 件の古いコメントを表示3 件の古いコメントを非表示

8 件のコメント
6 件の古いコメントを表示6 件の古いコメントを非表示

27 件のコメント
25 件の古いコメントを表示25 件の古いコメントを非表示