Improve computation speed of function

Question

0 votes

Hi,

I am running the functions shown below on arrays larger than 300*300*300 and smaller than 600*600*600, on my Mac with 4 cores and on a Unix computer with 12 cores. The function is called up to 1000 times per run. I have access to the Parallel Computing Toolbox but lack experience in using it. I've tried to improve the efficiency of the first function shown by reducing the number of operations.

The result, however, was that less of the CPU power was used and the computation time was unchanged. I was therefore wondering whether anyone has suggestions for how I can improve the computation speed. The code looks messy, but I hope someone can help.

I started with the following function:

 function [output]=del2_8(input,dx); 
 [nrows,ncolumns ndepth]=size(input); 
 input2=zeros(nrows+8,ncolumns+8,ndepth+8);
 input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)=input;

% create output matrix

 output=zeros(nrows,ncolumns,ndepth); 
 output2=zeros(nrows,ncolumns,ndepth); 
 output3=zeros(nrows,ncolumns,ndepth); 
 output = (-1/560*input2(1:nrows,5:ncolumns+4,5:ndepth+4)+8/315*input2(2:nrows+1,5:ncolumns+4,5:ndepth+4)-1/5*input2(3:nrows+2,5:ncolumns+4,5:ndepth+4)...
          +8/5*input2(4:nrows+3,5:ncolumns+4,5:ndepth+4)-205/72*input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)...
          +8/5*input2(6:nrows+5,5:ncolumns+4,5:ndepth+4) -1/5*input2(7:nrows+6,5:ncolumns+4,5:ndepth+4)...
          +8/315*input2(8:nrows+7,5:ncolumns+4,5:ndepth+4)-1/560*input2(9:nrows+8,5:ncolumns+4,5:ndepth+4))/dx^2;
 output2 = (-1/560*input2(5:nrows+4,1:ncolumns,5:ndepth+4)+8/315*input2(5:nrows+4,2:ncolumns+1,5:ndepth+4)-1/5*input2(5:nrows+4,3:ncolumns+2,5:ndepth+4)...
          +8/5*input2(5:nrows+4,4:ncolumns+3,5:ndepth+4)-205/72*input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)...
          +8/5*input2(5:nrows+4,6:ncolumns+5,5:ndepth+4) -1/5*input2(5:nrows+4,7:ncolumns+6,5:ndepth+4)...
          +8/315*input2(5:nrows+4,8:ncolumns+7,5:ndepth+4)-1/560*input2(5:nrows+4,9:ncolumns+8,5:ndepth+4))/delx^2;
 output3 = (-1/560*input2(5:nrows+4,5:ncolumns+4,1:ndepth)+8/315*input2(5:nrows+4,5:ncolumns+4,2:ndepth+1)-1/5*input2(5:nrows+4,5:ncolumns+4,3:ndepth+2)...
          +8/5*input2(5:nrows+4,5:ncolumns+4,4:ndepth+3)-205/72*input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)...
          +8/5*input2(5:nrows+4,5:ncolumns+4,6:ndepth+5) -1/5*input2(5:nrows+4,5:ncolumns+4,7:ndepth+6)...
          +8/315*input2(5:nrows+4,5:ncolumns+4,8:ndepth+7)-1/560*input2(5:nrows+4,5:ncolumns+4,9:ndepth+8))/delx^2;
output = output + output2 + output3;

I reduced the number of computations and compressed everything to one operation:

 function output=del2_83(input,dx); 
 [nrows,ncolumns ndepth]=size(input); 
 input2=zeros(nrows+8,ncolumns+8,ndepth+8);
 input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)=input;

%create output matrix

 output=zeros(nrows,ncolumns,ndepth); 
 output=(-1/560*(input2(1:nrows,5:ncolumns+4,5:ndepth+4)+input2(9:nrows+8,5:ncolumns+4,5:ndepth+4)+input2(5:nrows+4,1:ncolumns,5:ndepth+4) ...
      +input2(5:nrows+4,9:ncolumns+8,5:ndepth+4)+input2(5:nrows+4,5:ncolumns+4,1:ndepth)+input2(5:nrows+4,5:ncolumns+4,9:ndepth+8)) ...
      +8/315*(input2(2:nrows+1,5:ncolumns+4,5:ndepth+4)+input2(8:nrows+7,5:ncolumns+4,5:ndepth+4)+input2(5:nrows+4,2:ncolumns+1,5:ndepth+4) ...
      +input2(5:nrows+4,8:ncolumns+7,5:ndepth+4)+input2(5:nrows+4,5:ncolumns+4,2:ndepth+1)+input2(5:nrows+4,5:ncolumns+4,8:ndepth+7)) ...
      -1/5*(input2(3:nrows+2,5:ncolumns+4,5:ndepth+4)+input2(7:nrows+6,5:ncolumns+4,5:ndepth+4)+input2(5:nrows+4,3:ncolumns+2,5:ndepth+4) ...
      +input2(5:nrows+4,7:ncolumns+6,5:ndepth+4)+input2(5:nrows+4,5:ncolumns+4,3:ndepth+2)+input2(5:nrows+4,5:ncolumns+4,7:ndepth+6)) ...
      +8/5*(input2(4:nrows+3,5:ncolumns+4,5:ndepth+4)+input2(6:nrows+5,5:ncolumns+4,5:ndepth+4)+input2(5:nrows+4,4:ncolumns+3,5:ndepth+4) ...
      +input2(5:nrows+4,6:ncolumns+5,5:ndepth+4)+input2(5:nrows+4,5:ncolumns+4,4:ndepth+3)+input2(5:nrows+4,5:ncolumns+4,6:ndepth+5)) ...
      -205/72*(input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)+input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)+input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)))/dx^2;

Thanks in advance!

1 comment

Sven on 16 Aug 2012

Can you describe (conceptually) what the function is meant to do?

I believe it's easier to view/understand when formatted like this:

function output=del2_83(input,dx)
[nrows,ncolumns ndepth]=size(input);
input2=zeros(nrows+8,ncolumns+8,ndepth+8);
input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)=input;
output=zeros(nrows,ncolumns,ndepth);
output=(-1/560*(...
    input2(1:nrows,5:ncolumns+4,5:ndepth+4)+...
    input2(9:nrows+8,5:ncolumns+4,5:ndepth+4)+...
    input2(5:nrows+4,1:ncolumns,5:ndepth+4)+...
    input2(5:nrows+4,9:ncolumns+8,5:ndepth+4)+...
    input2(5:nrows+4,5:ncolumns+4,1:ndepth)+...
    input2(5:nrows+4,5:ncolumns+4,9:ndepth+8)) ...
    +8/315*(...
    input2(2:nrows+1,5:ncolumns+4,5:ndepth+4)+...
    input2(8:nrows+7,5:ncolumns+4,5:ndepth+4)+...
    input2(5:nrows+4,2:ncolumns+1,5:ndepth+4)+...
    input2(5:nrows+4,8:ncolumns+7,5:ndepth+4)+...
    input2(5:nrows+4,5:ncolumns+4,2:ndepth+1)+...
    input2(5:nrows+4,5:ncolumns+4,8:ndepth+7)) ...
    -1/5*(...
    input2(3:nrows+2,5:ncolumns+4,5:ndepth+4)+...
    input2(7:nrows+6,5:ncolumns+4,5:ndepth+4)+...
    input2(5:nrows+4,3:ncolumns+2,5:ndepth+4)+...
    input2(5:nrows+4,7:ncolumns+6,5:ndepth+4)+...
    input2(5:nrows+4,5:ncolumns+4,3:ndepth+2)+...
    input2(5:nrows+4,5:ncolumns+4,7:ndepth+6)) ...
    +8/5*(...
    input2(4:nrows+3,5:ncolumns+4,5:ndepth+4)+...
    input2(6:nrows+5,5:ncolumns+4,5:ndepth+4)+...
    input2(5:nrows+4,4:ncolumns+3,5:ndepth+4)+...
    input2(5:nrows+4,6:ncolumns+5,5:ndepth+4)+...
    input2(5:nrows+4,5:ncolumns+4,4:ndepth+3)+...
    input2(5:nrows+4,5:ncolumns+4,6:ndepth+5)) ...
    -205/72*(...
    input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)+...
    input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)+...
    input2(5:nrows+4,5:ncolumns+4,5:ndepth+4)))/dx^2;

However, I can't quite see what patterns exist that would simplify the calculation.

I can suggest one thing: input is a large matrix. You're creating input2 as a copy of that matrix, simply padded by zeros. This copying step will take up memory (and time to allocate that memory 1000 times) and it seems to me that it may be avoidable once we can work out what you're actually trying to achieve.

Why, for example, is the last block of numbers only adding up 3 copies of input whereas the rest all add up 6?


Answer 1

Teja Muppirala on 16 Aug 2012

3 votes

Your function is a convolution. I think it could be written more simply like this:

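function output=del2_83(input,dx)
% 9-point second-derivative coefficients; the center tap is left at 0 for now
K1 = [-1/560
       8/315
      -1/5
       8/5
       0
       8/5
      -1/5
       8/315
      -1/560];
K2 = K1';                   % same taps along the second dimension
K3 = permute(K1,[2 3 1]);   % same taps along the third dimension
K1(5) = -205/24;            % center tap counted once: 3 * (-205/72)
[K1,K2,K3] = deal(K1/dx^2,K2/dx^2,K3/dx^2);
output = convn(input,K1,'same') + convn(input,K2,'same') + convn(input,K3,'same');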
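Because convolution is linear, the three 1-D passes should be equivalent to a single pass with one cross-shaped 9x9x9 kernel. The following is a minimal sketch of that equivalence and is not part of the original answer; the test array `A`, its size, and the names `out_sep`, `out_full` and `max_diff` are assumptions chosen for illustration.

```matlab
% Deterministic test data (size chosen arbitrarily for illustration)
A  = reshape(sin(1:12*11*10), 12, 11, 10);
dx = 0.5;

% The three 1-D kernels from the answer above
K1 = [-1/560; 8/315; -1/5; 8/5; 0; 8/5; -1/5; 8/315; -1/560];
K2 = K1.';                      % 1x9, acts along dimension 2
K3 = permute(K1, [2 3 1]);      % 1x1x9, acts along dimension 3
K1(5) = -205/24;                % center tap counted once

% Separated form: three 1-D convolutions
out_sep = (convn(A, K1, 'same') + convn(A, K2, 'same') ...
         + convn(A, K3, 'same')) / dx^2;

% Combined form: embed the three vectors in one 9x9x9 cross-shaped kernel
K = zeros(9, 9, 9);
K(:, 5, 5) = K1;
K(5, :, 5) = K(5, :, 5) + K2;   % K2(5) is 0, so the center is not doubled
K(5, 5, :) = K(5, 5, :) + K3;   % K3(5) is 0 as well
out_full = convn(A, K, 'same') / dx^2;

% Largest absolute difference; expected to be at round-off level
max_diff = max(abs(out_sep(:) - out_full(:)));
```

The separated form needs 27 multiply-adds per output element, whereas a dense 9x9x9 kernel has 729 entries, so the three 1-D calls used in the answer are likely the cheaper of the two.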

3 comments (1 older comment hidden)

Teja Muppirala on 17 Aug 2012

Hello Sven, it was clear from looking at the function that it was linear (and that the same stencil is applied at every element), and any such function on a matrix can be written as a convolution. So I called the function on a 9x9x9 matrix full of zeros with a 1 right in the middle (5,5,5) and looked at what came out. The result is the equivalent convolution kernel, which just so happened to be three 1-D vectors joined together. So I didn't even need to go through and analyze all that messy code; it just came out very simply.
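The impulse probe described in this comment can be sketched as follows. This is an illustration under stated assumptions, not the commenter's actual code: the loop below is a hypothetical stand-in for del2_83 from the question (same coefficients, same zero padding of 4 on each side), and the names `probe`, `response`, `n_nonzero`, `center_tap` and `axis_taps` are made up for the example.

```matlab
% Coefficients of the 9-point stencil used in the question
c  = [-1/560, 8/315, -1/5, 8/5, -205/72, 8/5, -1/5, 8/315, -1/560];
dx = 1;

% Unit impulse in the middle of a 9x9x9 array of zeros
probe = zeros(9, 9, 9);
probe(5, 5, 5) = 1;

% Stand-in for del2_83: zero-pad by 4 on every side, then sum shifted slices
[nr, nc, nd] = size(probe);
padded = zeros(nr + 8, nc + 8, nd + 8);
padded(5:nr+4, 5:nc+4, 5:nd+4) = probe;

response = zeros(nr, nc, nd);
for s = 1:9
    response = response + c(s) * ( ...
        padded(s:nr+s-1, 5:nc+4,   5:nd+4) + ...   % shift along dimension 1
        padded(5:nr+4,   s:nc+s-1, 5:nd+4) + ...   % shift along dimension 2
        padded(5:nr+4,   5:nc+4,   s:nd+s-1));     % shift along dimension 3
end
response = response / dx^2;

% The response to the impulse is the convolution kernel
n_nonzero  = nnz(response);               % number of taps in the kernel
center_tap = response(5, 5, 5);           % the three axes share this tap
axis_taps  = response(:, 5, 5).';         % taps along dimension 1
```

Only the three axis lines through the center are nonzero, and the center tap is three times -205/72 because each of the three directions contributes it once. That also accounts for the last block in the question summing only 3 copies of the input.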

Matt Fig on 17 Aug 2012

Well done, sir!

Answer 2

per isakson on 16 Aug 2012

Edited: per isakson on 17 Aug 2012

0 votes

I ran one of your functions (R2012a 64-bit, Windows 7, 4 cores, 8 GB, on a three-year-old vanilla Dell)

    >> Z = randn(300,300,300);
    >> tic,z = del2_8( Z, 1e-3 ); toc
    Elapsed time is 11.796133 seconds.

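where I made the following changes (the posted del2_8 refers to an undefined variable delx, and the pre-allocations are commented out):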

    delx = dx;
    % output=zeros(nrows,ncolumns,ndepth);
    % output2=zeros(nrows,ncolumns,ndepth);
    % output3=zeros(nrows,ncolumns,ndepth);

I noticed that:

- Task Manager showed CPU usage of 60-75%.
- Commenting out the pre-allocation gives a discernible speed increase :)
- "1:ncolumns" may be replaced by ":", etc.

    tic, output = del2_83( Z, 1e-3); toc
    Elapsed time is 9.213884 seconds.

I noticed a somewhat lower CPU usage; it peaked at 70%.

What CPU usage do you see?

--- parfor ---

Next I tried: (copied from an on-line help example)

    matlabpool( 3 )
    M = { randn(300,300,300), randn(300,300,300), randn(300,300,300) };
    output = cell( 1, 3 );
    dx = 1e-3;
    tic
    parfor ii = 1 : 3
        output{ii} = del2_83( M{ii}, dx ); 
    end
    toc
    matlabpool close

and got

    Starting matlabpool using the 'local' profile ... connected to 3 labs.
    Elapsed time is 34.575283 seconds.
    Sending a stop signal to all the labs ... stopped.
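For comparison, one serial call to del2_83 took about 9.2 s above, so three serial calls would take roughly 27.6 s; the 34.6 s measured here means the pool gave no gain in this test. Readers on newer releases may want the same experiment in current syntax. The following is a sketch only, assuming R2013b or later, where parpool replaced matlabpool. The array size and the single 1-D kernel `taps` are hypothetical stand-ins for the real arrays and for del2_83, and the pool lines are left as comments so the snippet also runs serially without the Parallel Computing Toolbox.

```matlab
% Stand-in workload: 9-point second derivative along dimension 1 only
taps = [-1/560; 8/315; -1/5; 8/5; -205/72; 8/5; -1/5; 8/315; -1/560];
dx   = 1e-3;

% Three independent arrays, much smaller than in the post, for a quick run
M      = { randn(40,40,40), randn(40,40,40), randn(40,40,40) };
output = cell(1, 3);

% pool = parpool(3);     % replaces "matlabpool( 3 )"; needs the toolbox
tic
parfor ii = 1:3
    % each iteration is independent, so the iterations can run on separate workers
    output{ii} = convn(M{ii}, taps, 'same') / dx^2;
end
elapsed = toc;
% delete(pool);          % replaces "matlabpool close"

fprintf('parfor over %d arrays took %.3f s\n', numel(M), elapsed);
```

Each worker receives its own copy of the array it processes, so for arrays of this kind the transfer and the extra memory can plausibly outweigh the gain from parallel execution.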

I noticed that:

- The CPU usage started at rather low values before topping out at 100% for about a third of the time. Three cores showed similar patterns.
- The memory usage peaked close to 8 GB. The peak occurred when CPU usage was 100%. Thus, I believe that the process was not limited by memory. However, twelve cores and 600x600x600 will require a lot of RAM.
- "parallel" did not speed up the execution in this case.
