Most Efficient Way to Construct the Matrices to Extract the Lower and Upper Triangle from a Vectorized Matrix

Question

Royi Avital, 20 April 2020

Commented: Royi Avital, 24 April 2020

Accepted Answer: James Tursa

Given a square matrix X and its vectorized (column-stacked) form x = vec(X), i.e. X(:) in MATLAB,

I am after the most efficient way to build the matrices L and U that extract the lower and upper triangles of X from x.

So in MATLAB code it would be something like this:

clear();
numRows = 3;
numCols = numRows;
mX = randn(numRows, numCols);
vX = mX(:);
% Lower Triangle are indices 2, 3, 6
mL = [  0, 1, 0, 0, 0, 0, 0, 0, 0   ; ...
        0, 0, 1, 0, 0, 0, 0, 0, 0   ; ...
        0, 0, 0, 0, 0, 1, 0, 0, 0   ];
% Upper Triangle are indices 4, 7, 8
mU = [  0, 0, 0, 1, 0, 0, 0, 0, 0   ; ...
        0, 0, 0, 0, 0, 0, 1, 0, 0   ; ...
        0, 0, 0, 0, 0, 0, 0, 1, 0   ];
% Use a logical mask built from the size, so a zero entry in mX cannot break the check
assert(isequal(mL * vX, mX(tril(true(size(mX)), -1))));
assert(isequal(mU * vX, mX(triu(true(size(mX)), 1))));

I am after a sparse representation of mU and mL, built in the most efficient way.

My current implementation is given by:

function [ mLU ] = GenerateTriangleExtractorMatrix( numRows, triangleFlag, diagFlag )
EXTRACT_LOWER_TRIANGLE = 1;
EXTRACT_UPPER_TRIANGLE = 2;
INCLUDE_DIAGONAL = 1;
EXCLUDE_DIAGONAL = 2;
switch(diagFlag)
    case(INCLUDE_DIAGONAL)
        numElements = 0.5 * numRows * (numRows + 1);
        diagIdx = 0;
    case(EXCLUDE_DIAGONAL)
        numElements = 0.5 * (numRows - 1) * numRows;
        diagIdx = 1;
end
vJ = zeros(numElements, 1);
if(triangleFlag == EXTRACT_LOWER_TRIANGLE)
    elmntIdx = 0;
    for jj = 1:numRows
        for ii = (jj + diagIdx):numRows
            elmntIdx = elmntIdx + 1;
            vJ(elmntIdx) = ((jj - 1) * numRows) + ii;
        end
    end
elseif(triangleFlag == EXTRACT_UPPER_TRIANGLE)
    elmntIdx = numElements + 1;
    for jj = numRows:-1:1
        for ii = (jj - diagIdx):-1:1
            elmntIdx = elmntIdx - 1;
            vJ(elmntIdx) = ((jj - 1) * numRows) + ii;
        end
    end
end
mLU = sparse(1:numElements, vJ, 1, numElements, numRows * numRows, numElements);
end

Is there a more efficient way to generate vJ without extensive allocation of memory (In order to allow generating really large matrices)?
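One possible loop-free construction, offered as a sketch and not benchmarked here: the k-th extracted element sits at position k plus the number of entries skipped so far, and that skip count is constant within each column of X. The names `cols`, `counts` and `skipped` are illustrative, only the lower triangle is handled, and `repelem` needs a reasonably recent MATLAB release. The `find` call at the end is only a small-size cross-check; it allocates a full mask and is not part of the construction.

```matlab
numRows = 4;
diagIdx = 1;                               % 0: include diagonal, 1: exclude it

cols    = 1:(numRows - diagIdx);           % columns of X that contribute entries
counts  = numRows - diagIdx - cols + 1;    % entries kept in each of those columns
skipped = cumsum(cols - 1 + diagIdx);      % entries skipped up to each column's first kept row

numElements = sum(counts);
vJ = (1:numElements).' + repelem(skipped, counts).';

% Cross-check against the linear indices of a logical mask (small sizes only)
vJRef = find(tril(true(numRows), -diagIdx));
assert(isequal(vJ, vJRef));
```

The result can be passed to sparse() exactly as in the function above. Whether it beats the double loop for large numRows would have to be measured.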

Thank You.

Matt J, 23 April 2020

Edited: Matt J, 23 April 2020

@Matt, there are many cases for using those matrices.

I can't think of any. You can pursue this for hypothetical interest if you want, of course.

Regarding fmincon(), solving a problem that has a dedicated solver with a general-purpose solver is usually a really bad choice.

The situations when that is true are those where calculating the objective and derivatives are faster in matrix form than in operator form. For large problems that won't always be the case, because the computational cost of implementing things in matrix form can start to outweigh the benefits of using a specialized algorithm.

Not to mention defining a linear constraint in the form of a nonlinear one (think of the time spent calculating the derivative when it is so well defined).

I don't think mU and mL are helpful for defining either linear or nonlinear constraints. A linear constraint on the lower triangle of your unknown matrix X will always be of the form sum(T.*X, 'all')<=b, where T is some lower triangular matrix that you know in advance. The matrix form of the constraint gradient is simply T(:), which doesn't require mL at all to set up.
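A small numerical check of that statement, as a sketch: the weights T and the test matrix X below are made up for illustration. It shows that the operator form of the constraint and the row vector T(:).' give the same left-hand side.

```matlab
numRows = 3;
X = magic(numRows);                                   % arbitrary test matrix
T = tril(reshape(1:numRows^2, numRows, numRows));     % hypothetical known lower triangular weights

lhsOperator = sum(sum(T .* X));     % same value as sum(T.*X, 'all') in newer releases
lhsMatrix   = T(:).' * X(:);        % one row of a linear constraint matrix is T(:).'

assert(lhsOperator == lhsMatrix);
```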

For nonlinear constraints c(mL*X(:))<=0 on the lower triangular part of X, the gradient can be expressed as mL.'*gradc(mL*X(:)), but this could be implemented efficiently and without mL as follows:

B=tril(true(numRows),-1);
Bd=double(B);
g=Bd;
g(B)=gradc(X(B));

So, for this, you really only need to pre-compute B and Bd, which can be done with much less time and memory allocation than mL:

N=3000;
tic;
B=tril(true(N),-1);
Bd=double(B);
toc
%Elapsed time is 0.056492 seconds.
tic;
mL=GenerateTriangleExtractorMatrix( N, 1, 2);
toc
%Elapsed time is 0.264385 seconds.

>> whos B Bd mL
  Name   Size                Kilobytes     Class     Attributes
                                                               
  B      3000x3000                8790     logical             
  Bd     3000x3000               70313     double              
  mL     4498500x9000000        140602     double    sparse    
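The equivalence between the matrix form and the mask form of the gradient can be checked on a small case. This is only a sketch: `gradc` is a hypothetical gradient (that of c(v) = sum(v.^2)), and the extractor is built with find() for brevity instead of the function from the question.

```matlab
numRows = 4;
X = reshape(1:numRows^2, numRows, numRows);   % small deterministic test matrix
gradc = @(v) 2 * v;                           % hypothetical gradient of c(v) = sum(v.^2)

% Explicit extractor for the strictly lower triangle
B  = tril(true(numRows), -1);
vJ = find(B);
numElements = numel(vJ);
mL = sparse((1:numElements).', vJ, 1, numElements, numRows^2);

gMatrix = reshape(mL.' * gradc(mL * X(:)), numRows, numRows);

% Mask-based version, as in the comment above
g = double(B);
g(B) = gradc(X(B));

assert(isequal(full(gMatrix), g));
```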

Matt J, 23 April 2020

But that would mean your constraints are of the form mL*X(:)<=b. But since each row of mL contains only a single non-zero element, this means the constraint is equivalent to a simple bound X(j)<=b. In Matlab, you would never have to construct a matrix to represent such a constraint. You would use the vector input arguments lb and ub to specify those. I assume Gurobi has something similar.
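To make the bound argument concrete, here is a minimal sketch with a made-up right-hand side `b`. It fills an upper-bound vector `ub` (the name follows the fmincon argument) from the triangle mask and checks that it accepts and rejects the same points as mL*X(:)<=b.

```matlab
numRows = 3;
B  = tril(true(numRows), -1);       % strictly lower triangle mask
b  = [0.5; 1.5; 2.5];               % hypothetical right-hand side, one value per extracted entry

% The constraint mL * X(:) <= b written as an upper bound on X(:)
ub = inf(numRows^2, 1);
ub(B(:)) = b;

% Same feasibility test through the explicit matrix
vJ = find(B);
mL = sparse((1:numel(vJ)).', vJ, 1, numel(vJ), numRows^2);

x = zeros(numRows^2, 1);
x(vJ) = b - 0.1;                    % feasible point
xBad = x;
xBad(2) = 1;                        % violates the first bound

assert(all(mL * x <= b) && all(x <= ub));
assert(~all(mL * xBad <= b) && ~all(xBad <= ub));
```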

Royi Avital, 23 April 2020

@Matt, I know that. Whenever I can use other features of the solver, I use them. I have cases where I need those extractors in matrix form. I appreciate the dialogue. I think others who read it will gain something. I still hope someone will bring a different point of view on the pattern of vJ. Though I guess @James' solution is practically as good as it gets (I would also appreciate anything that makes it even faster).

Answer 1

James Tursa, 22 April 2020

Edited: James Tursa, 22 April 2020

Here is a mex routine that generates the sparse double matrices mL and mU directly, so no wasted memory in creating them. Seems to run about 3x-5x faster than m-code for somewhat large sizes.

/* S = GenerateTriangleExtractorMatrixMex(numRows,triangleFlag,diagFlag)
 *
 * S = double sparse matrix
 * numRows = integer > 0
 * triangleFlag = 1 , extract lower triangle
 *                2 , extract upper triangle
 * diagFlag = 1 , include diagonal
 *            2 , exclude diagonal
 * where
 *
 * M = a numRows X numRows matrix of non-zero terms
 * assert(isequal(S * M(:), M(logical(tril(M, -1))))); % for lower, diagFlag = 2
 * assert(isequal(S * M(:), M(logical(triu(M,  1))))); % for upper, diagFlag = 2
 *
 * Programmer: James Tursa
 * Date: 2020-April-22
*/
        
#include "mex.h"
void mexFunction( int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    mwSize numRows, triangleFlag, diagFlag, numElements;
    mwIndex *Ir, *Jc;
    mwIndex i, j, k, m;
    double *pr;
    
    if( nrhs != 3 || !mxIsNumeric(prhs[0]) || !mxIsNumeric(prhs[1]) || !mxIsNumeric(prhs[2]) ||
        mxGetNumberOfElements(prhs[0]) != 1 || mxGetNumberOfElements(prhs[1]) != 1 ||
        mxGetNumberOfElements(prhs[2]) != 1 ) {
        mexErrMsgTxt("Need three numeric scalar inputs");
    }
    if( nlhs > 1 ) {
        mexErrMsgTxt("Too many outputs");
    }
    numRows = mxGetScalar(prhs[0]);
    triangleFlag = mxGetScalar(prhs[1]);
    diagFlag = mxGetScalar(prhs[2]);
    if( numRows < 1 ) {
        mexErrMsgTxt("Invalid numRows, should be > 0");
    }
    if( triangleFlag != 1 && triangleFlag != 2 ) {
        mexErrMsgTxt("Invalid triangleFlag, should be 1 or 2");
    }
    if( diagFlag != 1 && diagFlag != 2 ) {
        mexErrMsgTxt("Invalid diagFlag, should be 1 or 2");
    }
    if( diagFlag == 1 ) {
        numElements = numRows * (numRows + 1) / 2; /* include diagonal */
    } else {
        numElements = (numRows - 1) * numRows / 2; /* exclude diagonal */
    }
    plhs[0] = mxCreateSparse(numElements, numRows*numRows, numElements, mxREAL);
    pr = (double *) mxGetData(plhs[0]);
    Ir = mxGetIr(plhs[0]);
    Jc = mxGetJc(plhs[0]);
    Jc[0] = 0;
    diagFlag--;
    k = 0;
    m = 1;
    if( triangleFlag == 1 ) { /* Lower */
        for( j=0; j<numRows; j++ ) {
            for( i=0; i<numRows; i++ ) {
                if( i >= j+diagFlag ) {
                    *pr++ = 1.0;
                    *Ir++ = k++;
                    Jc[m] = Jc[m-1] + 1;
                } else {
                    Jc[m] = Jc[m-1];
                }
                m++;
            }
        }
    } else { /* Upper */
        for( j=0; j<numRows; j++ ) {
            for( i=0; i<numRows; i++ ) {
                if( i+diagFlag <= j ) {
                    *pr++ = 1.0;
                    *Ir++ = k++;
                    Jc[m] = Jc[m-1] + 1;
                } else {
                    Jc[m] = Jc[m-1];
                }
                m++;
            }
        }
    }
}

You mex the routine as follows (you need a supported C compiler installed):

mex GenerateTriangleExtractorMatrixMex.c

And some test code:

% GenerateTriangleExtractorMatrix_test.m
n = 300;
disp('m-code timing')
tic
GenerateTriangleExtractorMatrix(10000,1,1);
toc
disp('mex code timing')
tic
GenerateTriangleExtractorMatrixMex(10000,1,1);
toc
for k=1:n
    numRows = ceil(rand*5000+100);
    numCols = numRows;
    triangleFlag = (rand<0.5) + 1;
    diagFlag = (rand<0.5) + 1;
    Mm = GenerateTriangleExtractorMatrix(numRows,triangleFlag,diagFlag);
    Mx = GenerateTriangleExtractorMatrixMex(numRows,triangleFlag,diagFlag);
    if( ~isequal(Mm,Mx) )
        error('Not equal');
    end
end
disp('Random tests passed')

With a sample run:

>> GenerateTriangleExtractorMatrix_test
m-code timing
Elapsed time is 9.964882 seconds.
mex code timing
Elapsed time is 1.901741 seconds.
Random tests passed
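For readers who do not want to step through the C loops, the compressed-column arrays the mex routine fills can be mimicked in plain MATLAB. This is an illustrative sketch, not part of the answer: `Jc` and `Ir` are named after the C variables and hold zero-based values, and the small matrix is rebuilt from them only to confirm the layout.

```matlab
numRows = 3;
mask = tril(true(numRows), -1);            % strictly lower triangle

% Zero-based compressed-column arrays, named after the variables in the C code
Jc = [0; cumsum(double(mask(:)))];         % column pointers, length numRows^2 + 1
numElements = Jc(end);
Ir = (0:numElements - 1).';                % row index of each stored one

% Rebuild the matrix from (Ir, Jc) and compare with the direct construction.
% Each column holds at most one entry, so diff(Jc) is 0 or 1.
colOfEntry = find(diff(Jc));
S    = sparse(Ir + 1, colOfEntry, 1, numElements, numRows^2);
SRef = sparse((1:numElements).', find(mask), 1, numElements, numRows^2);

assert(isequal(S, SRef));
```

Because every stored value is 1 and the row indices are simply 0, 1, 2, ..., the only array that depends on the triangle is Jc, which is why the routine can write the matrix in a single pass.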

Royi Avital, 22 April 2020

@James, I meant I want to use all the abstractions of the MATLAB C API. I just want to use it in my own C files. Not for MEX but for general computing. Yet I guess MATLAB blocks that kind of use.

Royi Avital, 23 April 2020

Edited: Royi Avital, 23 April 2020

By the way, I tried optimizing the code:

	if( triangleFlag == 1 ) { // Lower Triangle
		for( jj = 1; jj < numRows + 1; jj++ ) {
			for( ii = 1; ii < jj + diagFlag; ii++ ) {
				ll++;
				Jc[ll] = Jc[ll - 1];
			}
			for( ii = jj + diagFlag; ii < numRows + 1; ii++ ) {
				ll++;
				Jc[ll] = Jc[ll - 1] + 1;
				vV[kk] = 1.0;
				Ir[kk] = kk;
				kk++;
			}
		}
	} else { // Upper Triangle
		for( jj = 1; jj < numRows + 1; jj++ ) {
			for( ii = 1; ii < jj + 1 - diagFlag; ii++ ) {
				ll++;
				Jc[ll] = Jc[ll - 1] + 1;
				vV[kk] = 1.0;
				Ir[kk] = kk;
				kk++;
			}
			for( ii = jj + 1 - diagFlag; ii < numRows + 1; ii++ ) {
				ll++;
				Jc[ll] = Jc[ll - 1];
			}
		}
	}

But for some reason even removing the branching inside the loop didn't improve results.

Really Nice! If nothing comes up I will mark this as an answer. Thank You!


Answer 2

Matt J, 23 April 2020

Edited: Matt J, 23 April 2020

Another approach to consider is to use my MatrixObj class

https://www.mathworks.com/matlabcentral/fileexchange/26611-on-the-fly-definition-of-custom-matrix-objects

to construct an object that has the same effect as the operations mL*X and mL.'*Y, but doesn't require you to actually build the matrix,

    N=5000;
    
    tic;
     mL0=GenerateTriangleExtractorMatrix( N, 1, 2);
    toc
    %Elapsed time is 0.678702 seconds.
         
    tic;
    
        B=tril(true(N),-1);
        Bd=double(B(:));
        
        mL=MatrixObj;
        mL.Params.B=B;
        mL.Params.Bd=Bd;
        mL.Ops.mtimes=@(obj,z) z(obj.Params.B);
        mL.Trans.mtimes=@mtimesT;
    
    toc;
    %Elapsed time is 0.086228 seconds.
    
    function out=mtimesT(obj,z) 
        
        out=obj.Params.Bd;
        out(obj.Params.B)=z;
        
    end

In addition to requiring less time to construct, you can verify that it gives the same results as multiplications with mL and mL.',

        >> X=rand(N^2,1);   isequal(mL0.'*(mL0*X),mL.'*(mL*X))
        
        ans =
    
          logical
        
           1

but with considerably less memory consumption:

  >> whos mL mL0
  
  Name   Size                  Kilobytes     Class       Attributes
                                                                   
  mL     1x1                      219739     MatrixObj             
  mL0    12497500x25000000        390586     double      sparse    
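The same operator idea can be tried without the File Exchange class, using two plain function handles. This is a sketch under that substitution: `applyL` and `applyLt` are hypothetical names, and `accumarray` is used here for the transpose in place of the preallocated Bd buffer from the answer.

```matlab
N = 4;
B = tril(true(N), -1);                                 % strictly lower triangle mask

applyL  = @(x) x(B(:));                                % same result as mL * x
applyLt = @(y) accumarray(find(B), y, [N^2, 1]);       % same result as mL.' * y

% Compare with the explicit sparse extractor on a small case
vJ  = find(B);
mL0 = sparse((1:numel(vJ)).', vJ, 1, numel(vJ), N^2);
x = (1:N^2).';
y = (1:numel(vJ)).';

assert(isequal(applyL(x), mL0 * x));
assert(isequal(applyLt(y), mL0.' * y));
```

Handles like these only help when the consumer accepts operators; a solver that requires an explicit constraint matrix still needs mL0.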

Royi Avital, 24 April 2020

This is really nice. Thank you for the effort!


Answer 3

Royi Avital, 21 April 2020

My current solution:

function [ mLU ] = GenerateTriangleExtractorMatrix( numRows, triangleFlag, diagFlag )
EXTRACT_LOWER_TRIANGLE = 1;
EXTRACT_UPPER_TRIANGLE = 2;
INCLUDE_DIAGONAL = 1;
EXCLUDE_DIAGONAL = 2;
switch(diagFlag)
    case(INCLUDE_DIAGONAL)
        numElements = 0.5 * numRows * (numRows + 1);
        diagIdx = 0;
    case(EXCLUDE_DIAGONAL)
        numElements = 0.5 * (numRows - 1) * numRows;
        diagIdx = 1;
end
vJ = zeros(numElements, 1);
if(triangleFlag == EXTRACT_LOWER_TRIANGLE)
    elmntIdx = 0;
    for jj = 1:numRows
        for ii = (jj + diagIdx):numRows
            elmntIdx = elmntIdx + 1;
            vJ(elmntIdx) = ((jj - 1) * numRows) + ii;
        end
    end
elseif(triangleFlag == EXTRACT_UPPER_TRIANGLE)
    elmntIdx = numElements + 1;
    for jj = numRows:-1:1
        for ii = (jj - diagIdx):-1:1
            elmntIdx = elmntIdx - 1;
            vJ(elmntIdx) = ((jj - 1) * numRows) + ii;
        end
    end
end
mLU = sparse(1:numElements, vJ, 1, numElements, numRows * numRows, numElements);
end

I like that memory allocation is kept to a minimum.

I wonder if there is a more efficient way to generate vJ. It is trivial to remove the inner loop and just count the number of elements per column, yet in MATLAB this means each iteration will allocate memory (as we don't have iterators).
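For reference, the single-loop variant described above might look like the following sketch for the lower triangle. `numInCol` is an illustrative name, and as noted, the colon expression on the right-hand side creates a temporary vector on every iteration.

```matlab
numRows = 4;
diagIdx = 1;                                    % 0: include diagonal, 1: exclude it
numElements = numRows * (numRows + 1) / 2 - diagIdx * numRows;

vJ = zeros(numElements, 1);
elmntIdx = 0;
for jj = 1:numRows
    numInCol = numRows - (jj + diagIdx) + 1;    % entries of column jj in the lower triangle
    % Fill the whole block for column jj at once (empty when numInCol is 0)
    vJ(elmntIdx + (1:numInCol)) = ((jj - 1) * numRows) + ((jj + diagIdx):numRows);
    elmntIdx = elmntIdx + numInCol;
end
```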

Tommy, 21 April 2020

The two methods are fairly similar - I also like that yours minimizes memory allocation. I ran a few simple fun tests:

I didn't dare try higher than 20,000 for numRows. It seems that your code may possibly perform better at higher values of numRows. In the second case (calculating both the upper and lower triangles) I had your code running both sets of for loops, one after the other (shown in red). In green is the result from your code if only the first set of loops runs, and you recognize that vJ for one triangle is easy to determine if you have vJ for the other triangle (N^2+1-flip(vJ)). So the only thing I'll conclude from this is, if you will eventually calculate both the lower and upper triangle matrices for a given size, it might be better to calculate them together and only find vJ once. I suppose it depends on how expensive N^2+1-flip(vJ) is.
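The relation N^2+1-flip(vJ) can be verified on a small case. It works because mapping entry (i,j) to (N+1-i, N+1-j) swaps the two triangles and sends linear index k to N^2+1-k. The sketch below uses find() on a mask only to obtain reference indices; it assumes both triangles use the same diagonal setting.

```matlab
N = 4;
vJLower = find(tril(true(N), -1));        % lower triangle, diagonal excluded
vJUpper = N^2 + 1 - flip(vJLower);        % relation from the comment above

assert(isequal(vJUpper, find(triu(true(N), 1))));
```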

Royi Avital, 21 April 2020

@Tommy, I really liked your analysis. Yes, when dealing with sparse matrices the whole point is making sure allocation is kept to a minimum. I agree that if one wants both, it is better to use the trick you mentioned.

Let's see if someone can think of a different pattern to populate vJ which is more efficient.
