Suggestions to speed up FFT of convolution
15 ビュー (過去 30 日間)
古いコメントを表示
The following code is at the center of a physics model. As it scales approximately as O(N log(N)) it is the computationally most expensive part of the simulation. Therefore, it makes sense to try to optimize this part of the code as much as possible and I am looking for ideas what else to try. Conceptually the calculation involves a FFT, multiplication, and an inverse FFT to obtain the result. The simulations are done for 3D volumes [nx,ny,nz] of vectors (so the total number of cells is N=3*nx*ny*nz) and the size of the volumes varies significantly for different problems. The code now mostly runs on GPUs, but it can also be run on CPUs (this is mostly done for initial testing). Thus optimization for performance on GPUs is the main goal. One observation I made is that it is faster to use component wise multiplication vs. multiprod.
Any suggestions what else to try to reduce the time spent on calculating this?
Below is a trimmed down version of the main part of the code (updated to include examples for variables):
rng(1,'twister');
M=gpuArray(rand([100,100,100,3],"double")); %create a random vectorfield
Periodicity=[0 0 0];
%M: [n1,n2,n3,3] 3D vectorfield
[nx,ny,nz,n]=size(M);
nxP=2*nx;
nyP=2*ny;
nzP=2*nz;
ftN=gpuArray(rand([nxP,nyP,nzP,n,n],"double")); %for simplicty create ftN tensors with random entries
%Treat periodicity: Periodicity: [true/false, true/false, true/false]
rep=[1 1 1]+Periodicity;
Mhelp=repmat(M,[rep(1),rep(2),rep(3),1]);
ftM=zeros([nxP nyP nzP n],'like',M);
%Fourier transform
ftM(:,:,:,1)=fftn(Mhelp(:,:,:,1),[nxP nyP nzP]);
ftM(:,:,:,2)=fftn(Mhelp(:,:,:,2),[nxP nyP nzP]);
ftM(:,:,:,3)=fftn(Mhelp(:,:,:,3),[nxP nyP nzP]);
%ftN: [nxP,nyP,nzP,3,3] 3D volume of 3x3 tensors
%ftH=multiprod(ftN,ftM,[4 5],[4]);
%06/08/2022 version without multiprod - on GPUs achieved of the order 25% reduction in calculation time compared to multiprod.
ftH(:,:,:,1)=ftN(:,:,:,1,1).*ftM(:,:,:,1)+ftN(:,:,:,1,2).*ftM(:,:,:,2)+ftN(:,:,:,1,3).*ftM(:,:,:,3);
ftH(:,:,:,2)=ftN(:,:,:,2,1).*ftM(:,:,:,1)+ftN(:,:,:,2,2).*ftM(:,:,:,2)+ftN(:,:,:,2,3).*ftM(:,:,:,3);
ftH(:,:,:,3)=ftN(:,:,:,3,1).*ftM(:,:,:,1)+ftN(:,:,:,3,2).*ftM(:,:,:,2)+ftN(:,:,:,3,3).*ftM(:,:,:,3);
%Inverse FFT
Hh=zeros([nxP nyP nzP n],'like',M);
Hh(:,:,:,3)=-real((ifftn(ftH(:,:,:,3))));
Hh(:,:,:,1)=-real((ifftn(ftH(:,:,:,1))));
Hh(:,:,:,2)=-real((ifftn(ftH(:,:,:,2))));
H=zeros([nxP nyP nzP n],'like',M);
%Final result
H=Hh(nx:nxP-1,ny:nyP-1,nz:nzP-1,:);
5 件のコメント
Paul
2022 年 6 月 8 日
On my system running on the CPU and using the Profiler, I'm seeing that the computations for ftH take the most time, even after pre-allocation of ftH.
For reasons I don't understand, the fftn() computation for the first slice of ftM takes about 3x longer than for the second and third slices.
回答 (2 件)
Matt J
2022 年 6 月 8 日
編集済み: Matt J
2022 年 6 月 8 日
I see a doubling of the speed if you re-organize your4D and 5D arrays as cell arrays:
Mhelp=num2cell(Mhelp,[1,2,3]);
ftN=squeeze(num2cell(ftN,[1,2,3]));
H=cell(1,3);
gputimeit(@()executeIt)
function executeIt
%Fourier transform
ftM{1}=fftn(Mhelp{1},[nxP nyP nzP]);
ftM{2}=fftn(Mhelp{2},[nxP nyP nzP]);
ftM{3}=fftn(Mhelp{3},[nxP nyP nzP]);
ftH=cell(1,3);
ftH{1}=ftN{1,1}.*ftM{1}+ftN{1,2}.*ftM{2}+ftN{1,3}.*ftM{3};
ftH{2}=ftN{2,1}.*ftM{1}+ftN{2,2}.*ftM{2}+ftN{2,3}.*ftM{3};
ftH{3}=ftN{3,1}.*ftM{1}+ftN{3,2}.*ftM{2}+ftN{3,3}.*ftM{3};
for i=1:3
tmp=-real( ifftn(ftH{i} ) );
H{i}=tmp(nx:nxP-1,ny:nyP-1,nz:nzP-1);
end
end
Matt J
2022 年 6 月 9 日
編集済み: Matt J
2022 年 6 月 9 日
If you have R2022a,
Mhelp=repmat(M,[rep(1),rep(2),rep(3),1]);
%Fourier transform
ftM=fftn(-Mhelp,[nxP nyP nzP,1]);
%Tensorprod requires R2022a
ftH=tensorprod(ftN,ftM,[4,5],4);
%Inverse FFT
Hh=real(ifftn(ftH,[nxP nyP nzP,1] ));
H=Hh(nx:nxP-1,ny:nyP-1,nz:nzP-1,:);
27 件のコメント
Matt J
2022 年 6 月 10 日
Not too surprising, IMO. On the GPU, it's not the number of operations that matter so much as the parallelism of the tasks. I don't see how conjugate symmetry would increase the opportunity for parallelization.
参考
カテゴリ
Help Center および File Exchange で Fourier Analysis and Filtering についてさらに検索
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!