Efficiently Deleting Matrix Columns/Rows

Question

0 投票

Any tips on efficiently and quickly deleting a given row/column from a Matrix?

I had initially believed that deleting the last column of a given matrix would be more efficient than the first column, and all column operations would be more efficient than row operations (given MATLAB's column based memory), which I was able to confirm through testing. However, the performance I did get was rather unfortunate.

someB = rand(4,50000);
someC = someB.';
 tic
 while size(someB,2) > 2
    someB(:,size(someB,2)) = [];
 end
 toc
 tic
 while size(someC,1) > 2
    someC(size(someC,1),:) = [];
 end
 toc
 %Elapsed time is 13.869280 seconds.
 %Elapsed time is 10.198270 seconds.

I did a quick search and in this thread I found hope that through external C MEX functions there may indeed be a way to efficiently delete the last column of a matrix quickly. The code is attached below.

#include "mex.h"
// You may need to uncomment the next line
//#define mwSize int
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{ 
  mwSize n;
  if( n = mxGetN(prhs[0]) )
      mxSetN(prhs[0], n - 1);
}

However, I was not able to get said code running myself. If you take a quick look at the results that the author was finding, you'll find rather remarkable performance. I'm not that good at MEX myself; would anyone know how to fix above code so that it runs, or alternatively, have an equally/near equally good MEX code/MATLAB code performance-wise?

Thanks!

1 件のコメント
-1 件の古いコメントを表示 -1 件の古いコメントを非表示

James Tursa 2016 年 6 月 15 日

Any operation at the m-file level that causes the number of elements in a variable to change will cause a data copy to take place, which can chew up your performance if the variable is large. The time taken to do this can dominate your run times.

サインインしてコメントする。

サインインしてこの質問に回答する。

Follow Question

Answer 1

James Tursa 2016 年 6 月 15 日

MATLAB Online で開く

2 投票

The above code still works fine for me in R2015a Win32. Maybe you might change the mwSize to size_t, but unless the dimension is really large I don't see how this could make a difference. E.g. on my machine:

>> format debug
>> M = reshape(1:24,4,6)
M =
Structure address = 8a4fa88
m = 4
n = 6
pr = 33126c60
pi = 0
     1     5     9    13    17    21
     2     6    10    14    18    22
     3     7    11    15    19    23
     4     8    12    16    20    24
>> M_copy = M
M_copy =
Structure address = 8a4fd60
m = 4
n = 6
pr = 33126c60
pi = 0
     1     5     9    13    17    21
     2     6    10    14    18    22
     3     7    11    15    19    23
     4     8    12    16    20    24
>> deletelastcolumn(M)
>> M
M =
Structure address = 8a4fa88
m = 4
n = 5
pr = 33126c60
pi = 0
     1     5     9    13    17
     2     6    10    14    18
     3     7    11    15    19
     4     8    12    16    20
>> deletelastcolumn(M)
>> M
M =
Structure address = 8a4fa88
m = 4
n = 4
pr = 33126c60
pi = 0
     1     5     9    13
     2     6    10    14
     3     7    11    15
     4     8    12    16
>> M_copy
M_copy =
Structure address = 8a4fd60
m = 4
n = 6
pr = 33126c60
pi = 0
     1     5     9    13    17    21
     2     6    10    14    18    22
     3     7    11    15    19    23
     4     8    12    16    20    24

So things still seem to be working OK. There are no hacks in this simple code ... just an in-place dimension change. The data pointer stays the same throughout as expected. And the in-place dimension change on M does not affect the dimension of the shared data copy M_copy which has the same data pointer. Again, all is as expected. So I don't see why it would not work on any MATLAB version. What exactly are you seeing?

6 件のコメント
4 件の古いコメントを表示 4 件の古いコメントを非表示

Kyle Marocchini 2016 年 6 月 15 日

MATLAB Online で開く

Glad to have the original author respond!

Unfortunately, I tried replicating your exact code from above; what I found was that after invoking the MEX function, nothing changed on the matrix.

I double checked the steps by deleting the .c file, rebuilding it and recompiling using the MinGW64 C Compiler. It throws a few warnings when compiling, seen below:

deletelastcolumn.c:9:16: warning: passing argument 1 of 'mxSetN_700' discards 'const' qualifier from pointer target type
       mxSetN(prhs[0], n - 1);
   In file included from C:\Program Files\MATLAB\R2016a/extern/include/mex.h:51:0,
                 from C:\Users\Kyle\Documents\MATLAB\deletelastcolumn.c:1:
C:\Program Files\MATLAB\R2016a/extern/include/matrix.h:797:42: note: expected 'struct mxArray *' but argument is of type 'const struct mxArray *'
 LIBMMWMATRIX_PUBLISHED_API_EXTERN_C void mxSetN_700(mxArray *pa, int n);

It could be that the disparity in our results may be as a result of one of these warnings. Alternatively, as you've probably noticed, I'm running a different version of MATLAB than you are (R2016a, 64bit), which might also have a cause.

Would greatly appreciate any insight you might have on the matter.

James Tursa 2016 年 6 月 15 日

編集済み: James Tursa 2016 年 6 月 15 日

MATLAB Online で開く

Here is a semi-hack using the undocumented API function mxCreateSharedDataCopy:

#include "mex.h"
// You may need to uncomment the next line
//#define mwSize int
mxArray *mxCreateSharedDataCopy(const mxArray *); // Undocumented API function
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{ 
  mwSize n;
  plhs[0] = mxCreateSharedDataCopy(prhs[0]);
  if( n = mxGetN(plhs[0]) ) {
      mxSetN(plhs[0], n - 1);
  }
}

This version actually creates and returns an output, so you need to call it differently from the version above. It works with a shared data copy, so no data copying involved. It does have to do some extra work in creating the output mxArray struct and then assigning it at the caller level, so it will run slightly slower than the version above. E.g.,

>> format debug
>> M = reshape(1:24,4,6)
M =
Structure address = 7d9e6a0
m = 4
n = 6
pr = 2df6c600
pi = 0
     1     5     9    13    17    21
     2     6    10    14    18    22
     3     7    11    15    19    23
     4     8    12    16    20    24
>> M = deletelastcolumn_sdc(M)
M =
Structure address = 7d9e6a0
m = 4
n = 5
pr = 2df6c600
pi = 0
     1     5     9    13    17
     2     6    10    14    18
     3     7    11    15    19
     4     8    12    16    20
>> M = deletelastcolumn_sdc(M)
M =
Structure address = 7d9e6a0
m = 4
n = 4
pr = 2df6c600
pi = 0
     1     5     9    13
     2     6    10    14
     3     7    11    15
     4     8    12    16
>> M = deletelastcolumn_sdc(M)
M =
Structure address = 7d9e6a0
m = 4
n = 3
pr = 2df6c600
pi = 0
     1     5     9
     2     6    10
     3     7    11
     4     8    12

So you need to actually assign the result back into the variable.

If this doesn't work, then I am baffled, and may have to resort to a full hack (hacking into the mxArray struct itself to set the N value directly).

Kyle Marocchini 2016 年 6 月 16 日

MATLAB Online で開く

I've got generally good news on my front.

While changing to:

mxSetN((mxArray *)prhs[0], n - 1);

also failed to solve the problem (although it did remove all warnings), your second function (the semi-hack) did indeed do the trick!

I took your comment to heart and went through my old MATLAB versions; I can confirm that the original MEX function works perfectly on R2014b and R2015a, but not on R2015b or R2016a - it seems Mathworks must have put some protection on mxSetN from R2015b onwards. I also went ahead and did a few tests on timing:

 M = rand(3,100000);
 nrows = 3;
 ncols = 30000;
 g = rand(nrows,ncols);
 tic
 g2 = g;
 for j = 1:ncols-1
 g2=deletelastcolumn_sdc(g2);
 end
 toc
 Mslow = g.';
 tic
 for j=1:ncols-1
    Mslow(1,:) = [];
 end
 toc
 ncols = size(M,2);
 tic
 for j=1:ncols-1
    M = deletelastcolumn_sdc(M);
 end
 toc
Elapsed time is 0.019236 seconds.
Elapsed time is 2.089576 seconds.
Elapsed time is 0.061350 seconds.

Compared to your original 2008 post on the subject, we see that the newer, more expensive MEX function for R2016a actually works at the same speed as the original, which I would like to believe is due to significant internal improvements in MATLAB - also explaining the massive increase from 14s to 2s. Of course, given that the original post was in 2008, this increase seems less significant (it's been what, 8 years?)...nevertheless, something to note. The function also scales well, as demonstrated in the last post.

Cutting to the short; while MATLAB has no doubt improved, it seems you can't still beat MEX! Moreover, the fact that Mathworks has set internal protections on mxSetN may indicate that the current deletelastcolumn_sdc function will eventually become outdated as well. Nevertheless, it works on R2016a, which is good enough for me.

Thank you so much for the help James!

James Tursa 2016 年 6 月 16 日

Glad to help. FYI, the reason the sdc version gets around the apparent internal protections on mxSetN is because the shared data copy we created is of type TEMPORARY (created inside the mex routine) instead of type NORMAL (incoming from the argument list). This variable type is a field of the mxArray itself, so maybe mxSetN (and friends like mxSetM) simply check that field and return without doing anything if the variable type is NORMAL. This is just a guess on my part. The undocumented API function mxCreateSharedDataCopy has been around for a long time. TMW tried to get rid of it a couple of years ago but got inundated with complaints from mex programmers who used it, so they put it back in the library and hopefully will keep it there in the future. It is an extremely useful function for advanced mex programmers who know what they are doing.

Jan 2019 年 4 月 2 日

MATLAB Online で開く

Some years later: It is confusing, that the MEX function to remove the last column is compared with some code, which deletes the first row:

 for j=1:ncols-1
    Mslow(1,:) = [];
 end

Processing rows is much more expensive.

サインインしてコメントする。

Answer 2

Gianluca Tabella 2019 年 3 月 26 日

編集済み: Gianluca Tabella 2019 年 3 月 26 日

MATLAB Online で開く

0 投票

I experienced the same disappointment meanwhile running a while-loop on a large matrix (10000x8000) which, in the end, should have been around 8x8000 after removing some specific rows.

What I did was this:

Q=[1 1 1 2 3 4 3 3 2 1 1 2 3 4 4 5 6 6 6 6 3 3 2 3 4];
no_thr=length(Q);
big_matrix=rand(no_thr);
i=2; 
j=2;
big_matrix_new(1,:)=big_matrix(1,:); % this row is necessary since the counters start from "2" becuase the condition compare an i_th element with the i-1_the element so the first row of the matrix would not be considered
while i<=no_thr
    if Q(i)==Q(i-1) % a certain condition that, if true, removes an element of an array
        Q(i)=[];
        no_thr=length(Q);
    else    % the previous condition, if true, should remove also the i_th row of a matrix.
            % what I did instead is, if the condition is false, to copy that row in a new matrix.
        big_matrix_new(j,:)=big_matrix(i,:);
        j=j+1;
        i=i+1;
    end
end
big_matrix=big_matrix_new; % the just created matrix containing only the wished rows can be copied so that the old name can be used 
clearvars final_result_new % we can clear the matrix created in the loop so that the only matrix remaining is the short matrix having the original name 

Briefly, what I did was, instead of removing the unwanted rows, copying the wanted rows in a just created matrix that substitutes the original matrix.

This method made my script usable.

2 件のコメント
なしを表示なしを非表示

Steven Lord 2019 年 3 月 26 日

MATLAB Online で開く

If possible, try identifying and recording the rows to be included or deleted inside the loop then actually change what's in the loop afterwards.

M = magic(20)
rowsToKeep = false(size(M, 1), 1);
for therow = 1:size(M, 1)
    if M(therow, 1) > 200
        rowsToKeep(therow) = true;
    end
end
M2 = M(rowsToKeep, :) % If you don't need M after this line,
% M = M(rowsToKeep, :) % you can use this instead

M2 contains only those rows from M whose first element is greater than 200. Alternately:

M = magic(20);
rowsToDiscard = true(size(M, 1), 1);
for therow = 1:size(M, 1)
    if M(therow, 1) > 200
        rowsToDiscard(therow) = false;
    end
end
M3 = M; % Keep M around in case you want to use it later to check
M3(rowsToDiscard, :) = [] % If you don't need to keep M around,
% M(rowsToDiscard, :) = [] % you can use this instead

Check:

isequal(M2, M3) % true

Jan 2019 年 4 月 2 日

編集済み: Jan 2019 年 4 月 2 日

MATLAB Online で開く

@Gianluca Tabella: It looks, like your code is overly complex. Wouldn't this do the same:

index  = 1 + sum(diff(Q)~=0);
result = big_matrix(1:index, :);

This would avoid the iterative growing or shrinking of the arrays, which is a standard problem for the performance.

By the way, are you sure that this code does what you want? It counts, how many time neighboring elements of Q have different values and copies one more columns from the input to the output. My first guess was, that you want:

index  = [true, diff(Q)~=0];
result = big_matrix(index, :);

to copy the rows from the input, at which the value of Q changes.

サインインしてコメントする。

Efficiently Deleting Matrix Columns/Rows

1 件のコメント
-1 件の古いコメントを表示 -1 件の古いコメントを非表示

採用された回答

6 件のコメント
4 件の古いコメントを表示 4 件の古いコメントを非表示

その他の回答 (1 件)

2 件のコメント
なしを表示なしを非表示

カテゴリ

製品

タグ

Community Treasure Hunt

Efficiently Deleting Matrix Columns/Rows

1 件のコメント -1 件の古いコメントを表示 -1 件の古いコメントを非表示

採用された回答

6 件のコメント 4 件の古いコメントを表示 4 件の古いコメントを非表示

その他の回答 (1 件)

2 件のコメント なしを表示 なしを非表示

カテゴリ

製品

タグ

参考

Community Treasure Hunt

1 件のコメント
-1 件の古いコメントを表示 -1 件の古いコメントを非表示

6 件のコメント
4 件の古いコメントを表示 4 件の古いコメントを非表示

2 件のコメント
なしを表示なしを非表示