How can I multiply square submatrices more efficiently?

Question

Adam Peterson 2019 年 11 月 4 日

1
リンク

この質問への直接リンク

https://jp.mathworks.com/matlabcentral/answers/489082-how-can-i-multiply-square-submatrices-more-efficiently

編集済み: James Tursa 2019 年 11 月 5 日

I have two n-by-n matrices, X and Y. I want to perform the following procedure, where all multiplication steps are elementwise multiplication.

The first result is obtained by multiplying both of the n-by-n matrices and then summing the result to yield a row vector. This row vector is saved into the first row of my results matrix.
The second result is obtained by multiplying X(2:end, 2:end) by Y(1:end-1, 2:end) and summing the result to yield a row vector. This row vector is saved into the second row of my results matrix.
The third result is obtained by multiplying X(3:end, 3:end) by Y(1:end-2, 3:end) and summing the result to yield a row vector. This row vector is saved into the third row of my results matrix.
This pattern repeats until the nth result is obtained by multiplying the bottom right element from X by the upper right element from Y. This result is saved into the last row of my results matrix.

To remove any doubt about how my matrices are setup, here is a graphic showing my two original n-by-n matrices, and each of the submatrices:

It's important for later computations in my workflow that the result of each step is left-aligned in its respective row of the results matrix:

I currently calculate my results using a for loop. On each iteration, I index the respective submatrices from X and Y, multiply them, sum the result, and save the row vector to the respective row of the results matrix. I cannot imagine a more efficient approach, but the ingenuity of other MATLAB users has certainly impressed me before. Is there a faster way to do this? In my case, the matrices I use are pretty large, something like 10,000-by-10,000 elements.

4 件のコメント
2 件の古いコメントを表示2 件の古いコメントを非表示

James Tursa 2019 年 11 月 4 日

I agree with Stephen. Do you have a supported C/C++ compiler installed?

Adam Peterson 2019 年 11 月 5 日

MATLAB Online で開く

Thanks for the comments everyone. Here's a simple example showing the results for two 3-by-3 matrices.

X = [1 2 3; 4 5 6; 7 8 9];
Y = [10 11 12; 13 14 15; 16 17 18];    
result = zeros(size(X));
n = size(X,1);
for i=1:n
    result(i,1:end-i+1) = sum(X(i:end,i:end).*Y(1:end-i+1,i:end));
end

The result matrix has these values:

[174 228 288; 167 207 0; 108 0 0]

サインインしてコメントする。

サインインしてこの質問に回答する。

Answer 1

James Tursa 2019 年 11 月 4 日

1
リンク

この回答への直接リンク

https://jp.mathworks.com/matlabcentral/answers/489082-how-can-i-multiply-square-submatrices-more-efficiently#answer_399842

編集済み: James Tursa 2019 年 11 月 5 日

MATLAB Online で開く

Here is the naive mex code. Could probably be made faster by doing the for-loop in parallel or trying to optimize cache hits, but I didn't spend the time to code that. Avoids all temporary data copies and runs about 10x faster than m-code on my machine:

>> mex xymult.c -lmwblas -largeArrayDims
Building with 'Microsoft Visual C++ 2013 Professional (C)'.
MEX completed successfully.
>> n = 5000;
>> x = randi(10,n,n); y = randi(10,n,n);
>> tic;c=xymult(x,y);toc
Elapsed time is 49.459822 seconds.
>> tic;m=xymult_m(x,y);toc
Elapsed time is 480.140740 seconds.
>> isequal(c,m)
ans =
     1

The mex code:

/* File: xymult.c                                            */
/* Does a special multiply & summing from Answers question   */
/* Programmer:  James Tursa                                  */
/* Date: 4-Nov-2019                                          */
/* Complile command:  mex xymult.c -lmwblas -largeArrayDims  */
/* Includes ----------------------------------------------------------- */
#include "mex.h"
#include "blas.h"
#if false  /* true = use dotproduct code , false = use BLAS ddot */
    #define  xDOT      dotproduct  /* Hard-coded loop */
#elif defined(__OS2__) || defined(__WINDOWS__) || defined(WIN32) || defined(_WIN32) || defined(WIN64) || defined(_WIN64) || defined(_MSC_VER)
    #define  xDOT      ddot        /* Win dot product name */
#else
    #define  xDOT      ddot_       /* linux dot product name */
#endif
double dotproduct( mwSignedIndex *n, double *x, mwSignedIndex *incx,
                                     double *y, mwSignedIndex *incy)
{
    double result = 0.0;
    mwSignedIndex i;
    
    for( i=0; i<*n; i++ ) {
        result += *x * *y;
        x += *incx;
        y += *incy;
    }
    return result;
}
/* Gateway ------------------------------------------------------------ */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    double *A, *B, *C, *a, *b, *c;
    mwSignedIndex i, j, n, N, INCX=1, INCY=1;
    if( nrhs != 2 || 
        !mxIsDouble(prhs[0]) || !mxIsDouble(prhs[1]) ||
        mxIsSparse(prhs[0])  || mxIsSparse(prhs[1]) ||
        mxIsComplex(prhs[0]) || mxIsComplex(prhs[1]) ||
        mxGetNumberOfDimensions(prhs[0]) != 2 ||
        mxGetNumberOfDimensions(prhs[1]) != 2 ||
        mxGetM(prhs[0]) != mxGetN(prhs[0]) ||
        mxGetM(prhs[1]) != mxGetN(prhs[1]) ||
        mxGetM(prhs[0]) != mxGetM(prhs[1]) ) {
        mexErrMsgTxt("Need exactly two full double square inputs");
    }
    if( nlhs > 1 ) {
        mexErrMsgTxt("Too many outputs");
    }
    N = n = mxGetM(prhs[0]);
    plhs[0] = mxCreateDoubleMatrix(N,N,mxREAL);
    A = (double *) mxGetData(prhs[0]);
    B = (double *) mxGetData(prhs[1]);
    C = (double *) mxGetData(plhs[0]);
    for( i=0; i<N; i++ ) {
        a = A + i * (N+1); /* step A to next diagonal element */
        b = B + i * N;     /* step B to next column */
        c = C + i;         /* step C to next row */
        for( j=0; j<n; j++ ) { /* sum(a.*b) is just dot(a_column,b_column) in loop */
            *c = xDOT( &n, a, &INCX, b, &INCY );
            a += N; b += N; c += N; /* move all pointers to next columns */
        }
        n--;
    }
}

The m-code:

function c = xymult_m(a,b)
c = zeros(size(a));
n = size(a,1);
for k=1:n
    c(k,1:n-k+1) = sum(a(k:end,k:end).*b(1:end-k+1,k:end));
end
end

1 件のコメント
-1 件の古いコメントを表示-1 件の古いコメントを非表示

Adam Peterson 2019 年 11 月 5 日

Thanks for taking the time to do this. I've written some MEX functions in the past, but certainly nothing so complicated. My C/C++ skills aren't very strong.

I've compiled it and tested it and it works perfectly! It gives a full order of magnitude improvement in execution time, which is a huge help to me!

サインインしてコメントする。

How can I multiply square submatrices more efficiently?

4 件のコメント
2 件の古いコメントを表示2 件の古いコメントを非表示

採用された回答

1 件のコメント
-1 件の古いコメントを表示-1 件の古いコメントを非表示

その他の回答 (0 件)

参考

カテゴリ

タグ

製品

リリース

Community Treasure Hunt

How can I multiply square submatrices more efficiently?

4 件のコメント 2 件の古いコメントを表示2 件の古いコメントを非表示

採用された回答

1 件のコメント -1 件の古いコメントを表示-1 件の古いコメントを非表示

その他の回答 (0 件)

参考

カテゴリ

タグ

製品

リリース

Community Treasure Hunt

4 件のコメント
2 件の古いコメントを表示2 件の古いコメントを非表示

1 件のコメント
-1 件の古いコメントを表示-1 件の古いコメントを非表示