Why is the sparse function slow?

I recently generated a sparse matrix using the sparse function. When I profiled my code, I found that the vast majority of the runtime was spent in the call to sparse, which is pretty shocking to me.
To find out whether generating a sparse matrix is slow in other programming languages too, I used scipy.sparse.coo_matrix in Python to perform the same task. What surprised me is that scipy.sparse.coo_matrix is about 10X faster than MATLAB's sparse function.
MATLAB demo code:
RowInd = repmat(randperm(262144),81,1);
RowInd = RowInd(1:260100*81) ;
ColInd = repmat(randperm(262144),81,1);
ColInd = ColInd(1:260100*81);
Val = randn(260100*81,1);
tStart = tic;
L=sparse(RowInd,ColInd,Val, 262144, 262144 ,260100*81);
tEnd = toc(tStart);
disp(['Runtime of generating a sparse matrix in Matlab:', num2str(tEnd), ' second.']);
Python demo code:
import numpy as np
import scipy.sparse
from time import time

if __name__ == "__main__":
    nz_indsRow = np.tile(np.random.permutation(262144), 81)
    nz_indsRow = nz_indsRow[:260100*81]
    nz_indsCol = np.tile(np.random.permutation(262144), 81)
    nz_indsCol = nz_indsCol[:260100*81]
    nz_indsVal = np.random.rand(260100*81)
    print(nz_indsRow.shape, nz_indsCol.shape, nz_indsVal.shape)
    # Time only the COO construction.
    t0 = time()
    L = scipy.sparse.coo_matrix(
        (nz_indsVal, (nz_indsRow, nz_indsCol)), shape=(262144, 262144))
    t1 = time()
    print('Runtime of generating a sparse matrix via SciPy:', t1 - t0, 'second.')
On my desktop, the runtimes are 1.2399 s (MATLAB) vs 0.12721 s (SciPy).
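Note that the two demos do not generate identical inputs. In MATLAB, repmat(randperm(262144),81,1) builds an 81-row matrix, and RowInd(1:260100*81) reads it column by column, so each index appears 81 times in a row; np.tile instead concatenates whole copies of the permutation. Both patterns contain many repeated (row, column) pairs, but they are not the same data. A minimal sketch of the MATLAB ordering in NumPy, on a tiny permutation chosen for illustration (the helper name matlab_style_indices is hypothetical):

```python
import numpy as np

def matlab_style_indices(perm, reps, n):
    """Mimic MATLAB's  X = repmat(perm, reps, 1); X = X(1:n);
    repmat stacks `reps` copies of the row vector, and MATLAB's linear
    indexing walks that matrix column by column (column-major order)."""
    stacked = np.tile(perm, (reps, 1))       # reps x len(perm), like repmat
    return stacked.ravel(order="F")[:n]      # column-major, like X(1:n)

perm = np.array([3, 1, 2])
print(matlab_style_indices(perm, 2, 6))     # [3 3 1 1 2 2]
print(np.repeat(perm, 2)[:6])               # [3 3 1 1 2 2], same pattern
print(np.tile(perm, 2)[:6])                 # [3 1 2 3 1 2], what the Python demo uses
```

So np.repeat, not np.tile, reproduces the MATLAB inputs.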
Can someone explain why the sparse function in MATLAB is so slow? Is there a more efficient way to generate a sparse matrix in MATLAB?

15 Comments

Bruno Luong on 21 Sep 2020
Edited: Bruno Luong on 21 Sep 2020
I didn't read your code (I don't open unknown links), but do you happen to call sparse inside a loop?
If so, that is a bad way to build the matrix. You should build the I, J, S arrays in the loop (with preallocation) and then call SPARSE once, when I, J, S are ready.
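The same collect-then-build workflow can be sketched in Python with SciPy, which the question already uses: fill preallocated index and value arrays inside the loop, then make a single constructor call. The tridiagonal test matrix and the function name assemble_tridiagonal are illustrative assumptions; in MATLAB the final step would be a single sparse(I, J, S, n, n) call.

```python
import numpy as np
import scipy.sparse

def assemble_tridiagonal(n):
    """Collect (i, j, s) triplets in preallocated arrays inside the loop,
    then build the sparse matrix with one constructor call at the end."""
    nnz_max = 3 * n - 2                      # exact count for a tridiagonal matrix
    I = np.empty(nnz_max, dtype=np.int64)    # row indices
    J = np.empty(nnz_max, dtype=np.int64)    # column indices
    S = np.empty(nnz_max, dtype=np.float64)  # values
    k = 0
    for r in range(n):
        for c in (r - 1, r, r + 1):
            if 0 <= c < n:
                I[k], J[k] = r, c
                S[k] = 2.0 if c == r else -1.0
                k += 1
    # Single call instead of growing a sparse matrix inside the loop.
    return scipy.sparse.csc_matrix((S[:k], (I[:k], J[:k])), shape=(n, n))

A = assemble_tridiagonal(4)
print(A.toarray())
```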
Lantao Yu on 21 Sep 2020
No, I called the sparse function only once, after precomputing the row indices, column indices, and all the nonzero values.
Bruno Luong on 21 Sep 2020
Edited: Bruno Luong on 21 Sep 2020
Please post your data (a MAT-file) containing the input arguments of the sparse() call.
Lantao Yu on 21 Sep 2020
The files for reproducing the results are attached.
Bruno Luong on 21 Sep 2020
Edited: Bruno Luong on 21 Sep 2020
I don't have the Image Processing Toolbox, so I can't run your code.
I need you to save
RowInd, ColInd, Vals, NumPixels, wins_number, WinCardinality in a MAT-file and attach it here.
Lantao Yu on 21 Sep 2020
Hi Bruno. Please check the image.mat file, which contains the image variable. I cannot upload Vals.mat because it is far larger than 5 MB.
Bruno Luong on 21 Sep 2020
Forget it, useless for me.
Lantao Yu on 21 Sep 2020
Are you able to reproduce the results? If not, how can I help? The website only allows me to upload 10 files per day, each no larger than 5 MB.
Bruno Luong on 21 Sep 2020
Edited: Bruno Luong on 21 Sep 2020
No, I don't have the Image Processing Toolbox (I can't run the im2col command).
But I guess you generate a sparse matrix of size 262144 x 262144 with 21068100 nonzero elements.
When I generate a random sparse matrix with similar input sizes, it takes 0.72605 seconds on my PC. What do you get?
EDIT: I just saw that you posted the time in the question.
Bruno Luong on 21 Sep 2020
I don't see anything wrong with your MATLAB sparse command, so it seems that Python is much more efficient than MATLAB at building sparse matrices, though a factor of 10 is huge.
Lantao Yu on 21 Sep 2020
Edited: Lantao Yu on 21 Sep 2020
Let me cut it short with the same minimal MATLAB and Python demos shown in the question. On my desktop, the runtimes are 1.2399 s (MATLAB) vs 0.12721 s (SciPy).
Lantao Yu on 21 Sep 2020
Edited: Lantao Yu on 21 Sep 2020
It seems ironic that a paid programming language runs at 1/10 the speed of a free one.
the cyclist on 21 Sep 2020
I'm not familiar with the COO format, but I'm wary of the fact (stated in the SciPy documentation) that one cannot do arithmetic operations directly on it; one has to convert it to CSR or CSC format first.
It seems possible to me that this is not a completely fair comparison, as a result. But I really don't know.
What I do know is that cherry-picking one speed test, and then saying that a paid language is "running at 1/10 the speed" is definitely not a particularly useful exercise. Python has many strengths, but I wouldn't base the choice on this one excruciatingly small detail (unless of course that is the single dominant factor for you, for some reason).
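The cyclist's concern can be made concrete. scipy.sparse.coo_matrix only stores the (row, column, value) triplets as given, while converting to CSC sorts them and sums duplicate entries. MATLAB's sparse does that work up front, because it adds together values with identical (i, j) and returns a compressed column matrix. The question's inputs contain many repeated (row, column) pairs, so this step is far from free there. A small illustration with made-up triplets:

```python
import numpy as np
import scipy.sparse

# Four triplets, two of which land on the same position (0, 1).
rows = np.array([0, 0, 1, 2])
cols = np.array([1, 1, 0, 2])
vals = np.array([1.0, 2.0, 3.0, 4.0])

coo = scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(3, 3))
print(coo.nnz)      # 4: COO keeps every triplet, duplicates included

csc = coo.tocsc()   # sorting, compressing and summing happen here
print(csc.nnz)      # 3: the two entries at (0, 1) were summed
print(csc[0, 1])    # 3.0
```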
Bruno Luong on 21 Sep 2020
Good point, cyclist. For a fair comparison, one must build a CSC matrix, which is MATLAB's storage format.
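In SciPy, the CSC result can be obtained either by converting a COO matrix or by passing the same (data, (row, col)) triplets straight to scipy.sparse.csc_matrix. A minimal sketch, with small random sizes chosen only for illustration, showing that both routes give the same matrix, so either is a reasonable counterpart to MATLAB's sparse when timing:

```python
import numpy as np
import scipy.sparse

rng = np.random.default_rng(0)
n, nnz = 1000, 20000                  # small sizes chosen for illustration
rows = rng.integers(0, n, size=nnz)
cols = rng.integers(0, n, size=nnz)
vals = rng.standard_normal(nnz)

# Route 1: build COO, then convert to compressed sparse column.
via_coo = scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsc()
# Route 2: hand the triplets directly to the CSC constructor.
direct = scipy.sparse.csc_matrix((vals, (rows, cols)), shape=(n, n))

# Same matrix either way (duplicates summed in both).
print(np.allclose(via_coo.toarray(), direct.toarray()))   # True
```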
Lantao Yu on 21 Sep 2020
Thank you for the point, cyclist.
I ran the following code, which also converts the COO matrix to CSC/CSR format. The output is:
Runtime of generating a sparse CSC matrix via SciPy: 1.3742189407348633 second.
Runtime of generating a sparse CSR matrix via SciPy: 1.3034861087799072 second.
Now the runtime is close to MATLAB's. I apologize for not making a fair comparison.
import numpy as np
import scipy.sparse
from time import time

if __name__ == "__main__":
    nz_indsRow = np.tile(np.random.permutation(262144), 81)
    nz_indsRow = nz_indsRow[:260100*81]
    nz_indsCol = np.tile(np.random.permutation(262144), 81)
    nz_indsCol = nz_indsCol[:260100*81]
    nz_indsVal = np.random.rand(260100*81)
    print(nz_indsRow.shape, nz_indsCol.shape, nz_indsVal.shape)

    # Build the COO matrix and convert it to CSC (MATLAB's storage format).
    t0 = time()
    L = scipy.sparse.coo_matrix(
        (nz_indsVal, (nz_indsRow, nz_indsCol)), shape=(262144, 262144))
    LL = L.tocsc()
    t1 = time()
    print('Runtime of generating a sparse CSC matrix via SciPy:', t1 - t0, 'second.')

    # Same construction, converted to CSR instead.
    t0 = time()
    L = scipy.sparse.coo_matrix(
        (nz_indsVal, (nz_indsRow, nz_indsCol)), shape=(262144, 262144))
    LL = L.tocsr()
    t1 = time()
    print('Runtime of generating a sparse CSR matrix via SciPy:', t1 - t0, 'second.')
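A slightly tidier version of the timing above, as a sketch: it uses time.perf_counter, reports the COO-only time next to the COO-plus-conversion times, and generates indices with the MATLAB demo's repeated pattern (np.repeat). The function name time_builds and the scaled-down sizes in the usage line are illustrative assumptions; calling it with the defaults reproduces the question's problem size. Absolute timings will depend on the machine and SciPy version.

```python
import time
import numpy as np
import scipy.sparse

def time_builds(n=262144, reps=81, n_used=260100, seed=0):
    """Time COO construction alone versus COO + conversion to CSC/CSR.
    Each index is repeated `reps` times in a row, like the MATLAB demo."""
    rng = np.random.default_rng(seed)
    m = n_used * reps
    rows = np.repeat(rng.permutation(n), reps)[:m]
    cols = np.repeat(rng.permutation(n), reps)[:m]
    vals = rng.standard_normal(m)

    timings = {}
    t0 = time.perf_counter()
    coo = scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(n, n))
    timings["coo"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    csc = scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsc()
    timings["coo+tocsc"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    csr = scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()
    timings["coo+tocsr"] = time.perf_counter() - t0
    return timings, coo, csc, csr

# Scaled-down run; use time_builds() for the full-size problem.
timings, coo, csc, csr = time_builds(n=2048, reps=81, n_used=2000)
for name, seconds in timings.items():
    print(f"{name:>10}: {seconds:.4f} s")
```

With this index pattern, COO keeps all 2000 * 81 triplets, while the compressed formats end up with only 2000 stored entries after duplicates are summed.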


Answers (0)


Asked: 21 Sep 2020

Commented: 21 Sep 2020
