How can I perform large sparse matrix multiplication efficiently when one sparse matrix is block diagonal?

16 ビュー (過去 30 日間)
Background:
I have a 3d matrix V of size [J,I,K], where J and I are large, say J=1e5 and I=1e4. I would like to get two matrices:
V_ik = sum(V,1);
V_jk = sum(V,2);
The 3d matrix V is actually very sparse (around 90% sparsity), but the full [J,I,K] matrix takes up too much memory usage.
To deal with this problem, I code V into a 2d sparse matrix V_sp = [V_1;V_2;...;V_I], where each block matrix V_i is a matrix of size [J,K]. The problem arises when I calculate V_ik and V_jk. From the view of matrix multiplication, I write the following code:
V_ik = kron(speye(I),ones(1,J)) * V_sp;
V_jk = repmat(speye(J),1,I) * V_sp;
However, the sparse matrix kron(speye(I),ones(1,J)) is of size [I,J*I], and repmat(speye(J),1,I) is of size [J,J*I]. Either of them takes up 22.4G storage for J=1e5 and I=1e4.
My Question:
Is there an efficient way to perform this kind of matrix multiplication without too much memory usage? I tried using a for-loop to sum along the desired axis, the code with for-loop worked at the cost of around 5x more running time.
If there's no such efficient solution, can I code the 3d sparse matrix V in some other smart format such that I can reach a balance between the memory usage and running time? (I found ndSparse class that could store n-dimensional sparse arrays, but the class will store it internally as an ordinary 2D sparse array, and the sum function applied to it is not that fast.)
Thanks in advance for your help!
  7 件のコメント
Bruno Luong
Bruno Luong 2021 年 10 月 28 日
The blockdiag idea seems to work well
I=1e4;
J=1e4;
K=10;
n=I*J*K;
density=0.1;
m=round(n*density); % number of non-zero elements
% Storage [i,j,k,V] as sparse 3D array, (i,j,k) are subindex, V are corresponding values
i=randi(I,[m,1],'uint32');
j=randi(J,[m,1],'uint32');
k=randi(K,[m,1],'uint32');
V=rand([m,1]);
% Sparse storage
V_sp=sparse(j+(i-1)*J,k,V,I*J,K);
V_blkdiag=sparse(i+(k-1)*I,j+(k-1)*J,V,I*K,J*K);
% Your method using V_sp
tic
V_ik = kron(speye(I),ones(1,J)) * V_sp;
V_jk = repmat(speye(J),1,I) * V_sp;
toc
Elapsed time is 7.864562 seconds.
% James's method using V_blkdiag
tic
V_ik = reshape(sum(V_blkdiag,2),[I K]);
V_jk = reshape(sum(V_blkdiag,1),[J K]);
toc
Elapsed time is 0.292978 seconds.
% Method using [i,j,k,V]
tic
V_ik=accumarray([i k],V,[I K]);
V_jk=accumarray([j k],V,[J K]);
toc
Elapsed time is 2.402071 seconds.
Wenyu Zhang
Wenyu Zhang 2021 年 10 月 28 日
Thank you for comparison @Bruno Luong!
Although the blockdiag idea seems to work well, if what I have at hand is the non-zero values and the subindex [i,j,k], I don't need to go far to create a sparse diagonal matrix first, so the accumarray is faster. Perhaps the storage of my data is what I should reconsider as the highest priority.
I=1e4;
J=1e4;
K=10;
n=I*J*K;
density=0.1;
m=round(n*density); % number of non-zero elements
% Storage [i,j,k,V] as sparse 3D array, (i,j,k) are subindex, V are corresponding values
i=randi(I,[m,1],'uint32');
j=randi(J,[m,1],'uint32');
k=randi(K,[m,1],'uint32');
V=rand([m,1]);
% James's method using V_blkdiag
tic
V_blkdiag=sparse(i+(k-1)*I,j+(k-1)*J,V,I*K,J*K);
V_ik = reshape(sum(V_blkdiag,2),[I K]);
V_jk = reshape(sum(V_blkdiag,1),[J K]);
toc
Elapsed time is 11.238796 seconds.
% Method using [i,j,k,V]
tic
V_ik=accumarray([i k],V,[I K]);
V_jk=accumarray([j k],V,[J K]);
toc
Elapsed time is 2.329572 seconds.

サインインしてコメントする。

採用された回答

Bruno Luong
Bruno Luong 2021 年 10 月 27 日
編集済み: Bruno Luong 2021 年 10 月 28 日
You might rethink of the storage of your data, such as this (tic/toc result obtained from run on TMW online server):
I=1e4;
J=1e4;
K=10;
n=I*J*K;
density=0.1;
m=round(n*density); % number of non-zero elements
% Storage [i,j,k,V] as sparse 3D array, (i,j,k) are subindex, V are corresponding values
i=randi(I,[m,1],'uint32');
j=randi(J,[m,1],'uint32');
k=randi(K,[m,1],'uint32');
V=rand([m,1]);
% Sparse storage
V_sp=sparse(j+(i-1)*J,k,V,I*J,K);
% Your method using V_sp
tic
V_ik = kron(speye(I),ones(1,J)) * V_sp;
V_jk = repmat(speye(J),1,I) * V_sp;
toc
Elapsed time is 7.273694 seconds.
% Method using [i,j,k,V]
tic
V_ik=accumarray([i k],V,[I K]);
V_jk=accumarray([j k],V,[J K]);
toc
Elapsed time is 2.346202 seconds.
  1 件のコメント
Wenyu Zhang
Wenyu Zhang 2021 年 10 月 29 日
Thank you so much! After comparing the running time of several methods, this works the best.

サインインしてコメントする。

その他の回答 (3 件)

Matt J
Matt J 2021 年 10 月 27 日
編集済み: Matt J 2021 年 10 月 27 日
With ndSparse, did you try the summl() method? It's more memory intensive, but it can also be faster. Here's what I see in a simple test comparing ndSparse.sum() and ndSparse.summl().
>> A=ndSparse.sprand([1e4,1e4,10],0.1);
>> tic; sum(A,1);toc
Elapsed time is 10.697812 seconds.
>> tic; summl(A,1);toc
Elapsed time is 0.927524 seconds.
  2 件のコメント
Wenyu Zhang
Wenyu Zhang 2021 年 10 月 28 日
Thank you so much. I haven't tried this before. I will try it and see if it improves significantly in my own matrix.
Wenyu Zhang
Wenyu Zhang 2021 年 10 月 29 日
I have tried this and found that summl is significantly faster than sum. However, it takes some time to build the ndSparse class. Anyway, thanks for your help!

サインインしてコメントする。


Bjorn Gustavsson
Bjorn Gustavsson 2021 年 10 月 27 日
Shouldn't straight summation be preferable for the calculations of sums rather than doing it by way of matrix multiplication. Perhaps something like this might work:
v_ik = zeros(I,K);
v_jk = zeros(J,K);
for i1 = 1:I,
v_ik(i1,:) = sum(v_sp((1+J*(i1-1)):(J*i1),:));
end
for j1 = 1:J,
v_jk(j1,:) = sum(v_sp(j1:J:end,:));
end
HTH
  1 件のコメント
Wenyu Zhang
Wenyu Zhang 2021 年 10 月 27 日
The code with for-loop worked indeed, but at the cost of around 5x more running time.
I think a possible reason is that extracting submatrices from a sparse matrix takes time.

サインインしてコメントする。


Matt J
Matt J 2021 年 10 月 28 日
編集済み: Matt J 2021 年 10 月 28 日
The typical value of K is 50.
Since the third dimension is so small, it's probably worthwhile just storing the data as a length-K cell array of JxI matrices. You can then just do the summations with,
cellfun(@(v) sum(v,1), V,'uni',0);
cellfun(@(v) sum(v,2), V,'uni',0);
  1 件のコメント
Wenyu Zhang
Wenyu Zhang 2021 年 10 月 29 日
Actually my V consists of J*I length-K vectors, each generated from a multinomial distribution using mnrnd. It may bring overhead to reshape the data into a length-K cell array of J*I. Thanks for your suggestion!

サインインしてコメントする。

カテゴリ

Help Center および File ExchangeMatrix Indexing についてさらに検索

製品


リリース

R2019a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by