Improve computation speed of function
Hi,
I am running the functions shown below on 3-D arrays between 300*300*300 and 600*600*600 in size, on my Mac with 4 cores and on a Unix computer with 12 cores. The function is called up to 1000 times per run. I have access to the Parallel Computing Toolbox but lack experience in using it. I've tried to improve the efficiency of the first function shown by reducing the number of operations.
The result, however, was that less of the CPU power was used and the computation time was unchanged. I was therefore wondering whether anyone has suggestions for how I can improve the computation speed. The code looks messy, but I hope someone can help.
I started with the following function:
function [output]=del2_8(input,dx);
[nrows,ncolumns ndepth]=size(input);
input2=zeros(nrows+8,ncolumns+8,ndepth+8);
input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)=input;
% create output matrix
output=zeros(nrows,ncolumns,ndepth);
output2=zeros(nrows,ncolumns,ndepth);
output3=zeros(nrows,ncolumns,ndepth);
output = (-1/560*input2(1:nrows,5:ncolumns+4,5:ndepth+4)+8/315*input2(2:nrows+1,5:ncolumns+4,5:ndepth+4)-1/5*input2(3:nrows+2,5:ncolumns+4,5:ndepth+4)...
+8/5*input2(4:nrows+3,5:ncolumns+4,5:ndepth+4)-205/72*input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)...
+8/5*input2(6:nrows+5,5:ncolumns+4,5:ndepth+4) -1/5*input2(7:nrows+6,5:ncolumns+4,5:ndepth+4)...
+8/315*input2(8:nrows+7,5:ncolumns+4,5:ndepth+4)-1/560*input2(9:nrows+8,5:ncolumns+4,5:ndepth+4))/dx^2;
output2 = (-1/560*input2(5:nrows+4,1:ncolumns,5:ndepth+4)+8/315*input2(5:nrows+4,2:ncolumns+1,5:ndepth+4)-1/5*input2(5:nrows+4,3:ncolumns+2,5:ndepth+4)...
+8/5*input2(5:nrows+4,4:ncolumns+3,5:ndepth+4)-205/72*input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)...
+8/5*input2(5:nrows+4,6:ncolumns+5,5:ndepth+4) -1/5*input2(5:nrows+4,7:ncolumns+6,5:ndepth+4)...
+8/315*input2(5:nrows+4,8:ncolumns+7,5:ndepth+4)-1/560*input2(5:nrows+4,9:ncolumns+8,5:ndepth+4))/delx^2; % delx is undefined in this function; dx appears to be intended
output3 = (-1/560*input2(5:nrows+4,5:ncolumns+4,1:ndepth)+8/315*input2(5:nrows+4,5:ncolumns+4,2:ndepth+1)-1/5*input2(5:nrows+4,5:ncolumns+4,3:ndepth+2)...
+8/5*input2(5:nrows+4,5:ncolumns+4,4:ndepth+3)-205/72*input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)...
+8/5*input2(5:nrows+4,5:ncolumns+4,6:ndepth+5) -1/5*input2(5:nrows+4,5:ncolumns+4,7:ndepth+6)...
+8/315*input2(5:nrows+4,5:ncolumns+4,8:ndepth+7)-1/560*input2(5:nrows+4,5:ncolumns+4,9:ndepth+8))/delx^2; % delx is undefined in this function; dx appears to be intended
output = output + output2 + output3;
I reduced the number of computations and compressed everything to one operation:
function output=del2_83(input,dx);
[nrows,ncolumns ndepth]=size(input);
input2=zeros(nrows+8,ncolumns+8,ndepth+8);
input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)=input;
%create output matrix
output=zeros(nrows,ncolumns,ndepth);
output=(-1/560*(input2(1:nrows,5:ncolumns+4,5:ndepth+4)+input2(9:nrows+8,5:ncolumns+4,5:ndepth+4)+input2(5:nrows+4,1:ncolumns,5:ndepth+4) ...
+input2(5:nrows+4,9:ncolumns+8,5:ndepth+4)+input2(5:nrows+4,5:ncolumns+4,1:ndepth)+input2(5:nrows+4,5:ncolumns+4,9:ndepth+8)) ...
+8/315*(input2(2:nrows+1,5:ncolumns+4,5:ndepth+4)+input2(8:nrows+7,5:ncolumns+4,5:ndepth+4)+input2(5:nrows+4,2:ncolumns+1,5:ndepth+4) ...
+input2(5:nrows+4,8:ncolumns+7,5:ndepth+4)+input2(5:nrows+4,5:ncolumns+4,2:ndepth+1)+input2(5:nrows+4,5:ncolumns+4,8:ndepth+7)) ...
-1/5*(input2(3:nrows+2,5:ncolumns+4,5:ndepth+4)+input2(7:nrows+6,5:ncolumns+4,5:ndepth+4)+input2(5:nrows+4,3:ncolumns+2,5:ndepth+4) ...
+input2(5:nrows+4,7:ncolumns+6,5:ndepth+4)+input2(5:nrows+4,5:ncolumns+4,3:ndepth+2)+input2(5:nrows+4,5:ncolumns+4,7:ndepth+6)) ...
+8/5*(input2(4:nrows+3,5:ncolumns+4,5:ndepth+4)+input2(6:nrows+5,5:ncolumns+4,5:ndepth+4)+input2(5:nrows+4,4:ncolumns+3,5:ndepth+4) ...
+input2(5:nrows+4,6:ncolumns+5,5:ndepth+4)+input2(5:nrows+4,5:ncolumns+4,4:ndepth+3)+input2(5:nrows+4,5:ncolumns+4,6:ndepth+5)) ...
-205/72*(input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)+input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)+input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)))/dx^2;
Thanks in advance!
1 Comment
Sven
16 Aug 2012
Can you describe (conceptually) what the function is meant to do?
I believe it's easier to view/understand when formatted like this:
function output=del2_83(input,dx)
[nrows,ncolumns ndepth]=size(input);
input2=zeros(nrows+8,ncolumns+8,ndepth+8);
input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)=input;
output=zeros(nrows,ncolumns,ndepth);
output=(-1/560*(...
input2(1:nrows,5:ncolumns+4,5:ndepth+4)+...
input2(9:nrows+8,5:ncolumns+4,5:ndepth+4)+...
input2(5:nrows+4,1:ncolumns,5:ndepth+4)+...
input2(5:nrows+4,9:ncolumns+8,5:ndepth+4)+...
input2(5:nrows+4,5:ncolumns+4,1:ndepth)+...
input2(5:nrows+4,5:ncolumns+4,9:ndepth+8)) ...
+8/315*(...
input2(2:nrows+1,5:ncolumns+4,5:ndepth+4)+...
input2(8:nrows+7,5:ncolumns+4,5:ndepth+4)+...
input2(5:nrows+4,2:ncolumns+1,5:ndepth+4)+...
input2(5:nrows+4,8:ncolumns+7,5:ndepth+4)+...
input2(5:nrows+4,5:ncolumns+4,2:ndepth+1)+...
input2(5:nrows+4,5:ncolumns+4,8:ndepth+7)) ...
-1/5*(...
input2(3:nrows+2,5:ncolumns+4,5:ndepth+4)+...
input2(7:nrows+6,5:ncolumns+4,5:ndepth+4)+...
input2(5:nrows+4,3:ncolumns+2,5:ndepth+4)+...
input2(5:nrows+4,7:ncolumns+6,5:ndepth+4)+...
input2(5:nrows+4,5:ncolumns+4,3:ndepth+2)+...
input2(5:nrows+4,5:ncolumns+4,7:ndepth+6)) ...
+8/5*(...
input2(4:nrows+3,5:ncolumns+4,5:ndepth+4)+...
input2(6:nrows+5,5:ncolumns+4,5:ndepth+4)+...
input2(5:nrows+4,4:ncolumns+3,5:ndepth+4)+...
input2(5:nrows+4,6:ncolumns+5,5:ndepth+4)+...
input2(5:nrows+4,5:ncolumns+4,4:ndepth+3)+...
input2(5:nrows+4,5:ncolumns+4,6:ndepth+5)) ...
-205/72*(...
input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)+...
input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)+...
input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)))/dx^2;
However, I can't quite see what patterns exist that would simplify the calculation.
I can suggest one thing: input is a large matrix. You're creating input2 as a copy of that matrix, simply padded by zeros. This copying step will take up memory (and time to allocate that memory 1000 times) and it seems to me that it may be avoidable once we can work out what you're actually trying to achieve.
Why, for example, is the last block of numbers only adding up 3 copies of input whereas the rest all add up 6?
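On the question of the three copies: the weights in the posted code match the standard 8th-order central-difference stencil for a second derivative, and the centre sample is shared by the x-, y- and z-stencils, so it is counted once per dimension. The following is a minimal sketch that checks this numerically; the variable names (`w`, `h`, `d2`, `centre3d`) are illustrative and not from the original code.

```matlab
% 1-D stencil weights copied from the question (offsets -4..4).
w = [-1/560, 8/315, -1/5, 8/5, -205/72, 8/5, -1/5, 8/315, -1/560];

% A derivative stencil must give zero for a constant, so the weights sum to zero.
sumW = sum(w);

% Apply the stencil to f(x) = x.^2 sampled with spacing h.
% The exact second derivative is 2 everywhere.
h  = 0.1;
x  = (-4:4) * h;
d2 = sum(w .* x.^2) / h^2;

% In 3-D the centre sample belongs to all three 1-D stencils,
% so its total weight is three times the 1-D centre weight.
centre3d = 3 * w(5);    % equals -205/24

fprintf('sum of weights = %g, d2 = %g, 3-D centre weight = %g\n', ...
        sumW, d2, centre3d);
```

This is why the last block adds the unshifted array three times while every other block adds six shifted arrays (two per dimension).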
Answers (2)
Teja Muppirala
16 Aug 2012
Your function is a convolution. I think it could be written more simply like this:
function output=del2_83(input,dx)
K1 = [-1/560
8/315
-1/5
8/5
0
8/5
-1/5
8/315
-1/560];
K2 = K1';
K3 = permute(K1,[2 3 1]);
K1(5) = -205/24;
[K1,K2,K3] = deal(K1/dx^2,K2/dx^2,K3/dx^2);
output = convn(input,K1,'same') + convn(input,K2,'same') + convn(input,K3,'same');
3 Comments
Teja Muppirala
17 Aug 2012
Hello Sven, it was clear from looking at the function that it was linear, and since the same weights are applied at every position it is also shift-invariant. Any linear, shift-invariant operation on an array can be written as a convolution. So I called the function on a 9x9x9 array of zeros with a single 1 right in the middle, at (5,5,5), and looked at what came out. The result is the equivalent convolution kernel, which happened to be three 1-D vectors joined together. I didn't even need to go through and analyze all that messy code; it just came out very simply.
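The impulse-probe idea described above can be sketched as follows. This is a self-contained illustration, not the original del2_83: the inner loop is a stand-in that applies the same weights through zero-padded shifted slices with dx = 1, and names such as `probe`, `inputs` and `outputs` are assumptions made for the example.

```matlab
% 1-D weights from the question (offsets -4..4); dx = 1 for simplicity.
w = [-1/560, 8/315, -1/5, 8/5, -205/72, 8/5, -1/5, 8/315, -1/560];
r = 4;                                  % stencil half-width

% Two inputs go through the same padded-slice stencil:
% a unit impulse (the probe) and a random test volume.
probe = zeros(9,9,9);
probe(5,5,5) = 1;
inputs  = {probe, randn(12,13,14)};
outputs = cell(size(inputs));

for m = 1:numel(inputs)
    A = inputs{m};
    [n1,n2,n3] = size(A);
    P = zeros(n1+2*r, n2+2*r, n3+2*r);  % zero-padded copy
    i1 = r+(1:n1);  i2 = r+(1:n2);  i3 = r+(1:n3);
    P(i1,i2,i3) = A;
    B = zeros(n1,n2,n3);
    for k = -r:r                        % accumulate shifted slices
        B = B + w(k+r+1) * (P(i1+k,i2,i3) + P(i1,i2+k,i3) + P(i1,i2,i3+k));
    end
    outputs{m} = B;
end

% The response to the impulse is the convolution kernel.
K = outputs{1};

% Convolving the random volume with K should reproduce the stencil result.
err = max(abs(reshape(outputs{2} - convn(inputs{2}, K, 'same'), [], 1)));
fprintf('centre weight %.6f, max abs difference %.3g\n', K(5,5,5), err);
```

The kernel recovered this way is 9x9x9 but nonzero only along the three axes through its centre, which is why it can be split into three 1-D convolutions rather than applied as one dense 3-D kernel.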
per isakson
16 Aug 2012
Edited: per isakson
17 Aug 2012
I ran one of your functions (R2012a 64-bit, Windows 7, 4 cores, 8 GB, on a three-year-old vanilla Dell)
>> Z = randn(300,300,300);
>> tic,z = del2_8( Z, 1e-3 ); toc
Elapsed time is 11.796133 seconds.
where I made the following changes
delx = dx;
% output=zeros(nrows,ncolumns,ndepth);
% output2=zeros(nrows,ncolumns,ndepth);
% output3=zeros(nrows,ncolumns,ndepth);
I noticed that
- Task Manager showed cpu usage 60-75%
- Commenting out the pre-allocation makes a discernible speed increase :)
- "1:ncolumns" may be replaced by ":", etc.
Next
tic, output = del2_83( Z, 1e-3); toc
Elapsed time is 9.213884 seconds.
I noticed a somewhat lower cpu-usage - it peaked at 70%.
What cpu usage do you see?
--- parfor ---
Next I tried the following (copied from an online help example):
matlabpool( 3 )
M = { randn(300,300,300), randn(300,300,300), randn(300,300,300) };
output = cell( 1, 3 );
dx = 1e-3;
tic
parfor ii = 1 : 3
output{ii} = del2_83( M{ii}, dx );
end
toc
matlabpool close
and got
Starting matlabpool using the 'local' profile ... connected to 3 labs.
Elapsed time is 34.575283 seconds.
Sending a stop signal to all the labs ... stopped.
I noticed that:
- the CPU usage started at rather low values before topping out at 100% for roughly a third of the time. Three cores showed similar patterns.
- the memory usage topped out close to 8 GB. The peak occurred while CPU usage was at 100%, so I believe the process was not limited by memory. However, twelve cores and 600x600x600 will require a lot of RAM.
- "parallel" did not speed up the execution in this case.
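To put a rough number on the RAM remark, here is a back-of-the-envelope sketch. It assumes double precision and that each worker holds its own input, zero-padded copy and output; temporaries created while evaluating the long expression are ignored, so this is a lower bound. All variable names are illustrative.

```matlab
n = 600;                 % edge length of the largest array in the question
bytesPerDouble = 8;

% Size of one n-by-n-by-n double array and of its copy padded by 4 per side.
inputGiB  = n^3     * bytesPerDouble / 2^30;
paddedGiB = (n+8)^3 * bytesPerDouble / 2^30;

% Assumed per-worker footprint: input + padded copy + output.
perWorkerGiB = inputGiB + paddedGiB + inputGiB;

nWorkers = 12;           % one worker per core on the 12-core machine
totalGiB = nWorkers * perWorkerGiB;

fprintf('one array: %.2f GiB, per worker: %.2f GiB, %d workers: %.1f GiB\n', ...
        inputGiB, perWorkerGiB, nWorkers, totalGiB);
```

Under these assumptions a single 600x600x600 array is already about 1.6 GiB, so running one such problem per worker on twelve cores needs tens of GiB before any temporaries are counted.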