How to sum parts of a matrix of different sizes, without using for loops?
ER2018
27 Jun 2018
I have a relatively large N×N matrix (N ≈ 20,000) and an N×1 vector identifying the indices that must be grouped together. I want to sum together parts of the matrix, which in principle can contain different numbers of elements, including non-adjacent ones. I quickly wrote a double for-loop that works correctly but is of course inefficient; the profiler identified these loops as one of the bottlenecks in my code. I tried to find a smart vectorization method: I explored the arrayfun, cellfun, and bsxfun functions and looked for solutions to similar problems, but I haven't found one yet. I'll be grateful for any help! ER
This is the test code with the two for-loops:
M=rand(10); % test matrix
idxM=[1 2 2 3 4 4 4 1 4 2]; % each element indicates to which group each row/column of M belongs
nT=max(idxM);
sumM=zeros(nT,nT);
for t1=1:nT
for t2=1:nT
sumM(t1,t2)=sum(sum(M(idxM==t1,idxM==t2)));
end
end
PS: Long-time reader, first-time poster.
5 comments
Ameer Hamza
27 Jun 2018
Is this line correct:
sumM(t1,t2)=sum(sum(M(idxM==t1,idxM==2)));
or should it be idxM==t2? If it is correct as written, the inner for-loop is useless.
Accepted Answer
Matt J
27 Jun 2018
Edited: Matt J, 27 Jun 2018
S=sparse(1:N,idxM,1); % N-by-nT indicator matrix: S(k,j)=1 iff row/column k belongs to group j
sumM=S.'*(M*S);       % sum M's columns by group, then the rows
13 comments
ER2018
27 Jun 2018
Edited: ER2018, 27 Jun 2018
This works nicely! I tested it for values of N in the range 1,000 to 20,000, and for values of nT in the range 10 to 1,000.
With respect to the initial for-loops:
- for nT=10, it runs 3-6 times faster
- for nT=100, it runs ~10 times faster
- for nT=1,000, it runs ~300 times faster (!)
...while using only 5-10% more RAM.
ER2018
27 Jun 2018
Sure, I just wanted to test it a bit to provide more complete feedback. I can also confirm that the solution worked when integrated into the real cases. Thanks a lot!
Now, to complete the "lesson", would you mind explaining a bit what the second line of the code does? It remains a bit obscure to me.
sumM=S.'*(M*S);
Matt J
28 Jun 2018
Edited: Matt J, 28 Jun 2018
If you study the structure of S in small examples, you should be able to see that A=M*S has elements
A(k,j)= sum( M(k, idxM==j) ,2);
Similarly, B=S.'*A does the same kind of summation along the columns of A,
B(i,j)= sum( A(idxM==i,j) ,1);
Combining these operations leads to
B(t1,t2) = sum( A(idxM==t1,t2) ,1)
         = sum( sum( M(idxM==t1, idxM==t2) ,2) ,1)
which is the desired result, with sumM=B.
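To make the identity concrete outside MATLAB, here is a small NumPy sketch of the same two-sided grouping (the matrix, groups, and variable names are made up for illustration; a dense 0/1 indicator stands in for the sparse S):

```python
import numpy as np

# Made-up 5x5 example; groups are 0-based here, unlike MATLAB's 1-based idxM.
M = np.arange(25, dtype=float).reshape(5, 5)
idx = np.array([0, 1, 0, 1, 1])                    # group of each row/column
nT = idx.max() + 1

# Dense analogue of S = sparse(1:N, idxM, 1): S[k, j] = 1 iff element k is in group j
S = (idx[:, None] == np.arange(nT)).astype(float)  # shape (5, 2)

A = M @ S            # A[k, j] = sum of M[k, idx == j]  (grouped column sums)
B = S.T @ A          # B[i, j] = sum of A[idx == i, j]  (grouped row sums)

# Reference: the explicit double loop over group pairs
ref = np.array([[M[np.ix_(idx == t1, idx == t2)].sum() for t2 in range(nT)]
                for t1 in range(nT)])
assert np.allclose(B, ref)
```

Every element of M is counted exactly once across the blocks, so B.sum() equals M.sum().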
ER2018
28 Jun 2018
Clear now. So basically one product sums along the columns, the other sums along the rows.
I'm trying to apply this elegant sparse-matrix and product approach to other problems. For example, 1D sum would only need one product. 1D average would need one product and one scalar division (by a vector describing the number of elements in each summed group).
What about other generic functions, not requiring sums? Can we profit from this method to apply a function @fun to parts of a matrix or vector?
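As a 1D sanity check of that idea (a NumPy sketch with made-up numbers; the helper names are mine):

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
idx = np.array([0, 1, 0, 1, 1])                    # group of each element (0-based)
nT = idx.max() + 1
S = (idx[:, None] == np.arange(nT)).astype(float)  # indicator matrix, shape (5, 2)

group_sums = v @ S                     # one product: [1+3, 2+4+5] = [4, 11]
group_sizes = S.sum(axis=0)            # elements per summed group: [2, 3]
group_means = group_sums / group_sizes # elementwise division for the 1D average

assert np.allclose(group_sums, [4.0, 11.0])
assert np.allclose(group_means, [2.0, 11.0 / 3.0])
```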
Matt J
28 Jun 2018
Edited: Matt J, 28 Jun 2018
The expression S.'*M*S is linear in M, so you couldn't use it to do anything non-linear to M just by selecting a different S. However, you could do things like
exp(S.'*log(M)*S)
This would be the same as taking products of all the group combinations instead of sums.
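The log/exp trick can be checked the same way (a NumPy sketch; the example values are mine, and M must be strictly positive for log to be safe):

```python
import numpy as np

M = np.array([[2.0, 3.0, 1.0],
              [4.0, 5.0, 2.0],
              [1.0, 2.0, 3.0]])
idx = np.array([0, 0, 1])                          # groups of rows/columns (0-based)
S = (idx[:, None] == np.arange(2)).astype(float)   # 0/1 indicator matrix

prodM = np.exp(S.T @ np.log(M) @ S)                # blockwise products instead of sums

# Block (0, 0) covers rows {0, 1} x cols {0, 1}: 2 * 3 * 4 * 5 = 120
assert np.isclose(prodM[0, 0], 120.0)
```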
Also, in my comment here, I pointed out how you could make the for-loops more efficient. You could adapt this to other separable reduction operations, e.g., maximization instead of summation:
for t2=1:nT
partials=max( M(:,idxM==t2) ,[],2);
for t1=1:nT
maxM(t1,t2)=max(partials(idxM==t1));
end
end
Finally, in the special case where all the sub-matrices are equal-sized tiles, there are lots of separable reductions you can do in a fully vectorized way; see SEPBLOCKFUN.
ER2018
29 Jun 2018
I'll answer from the bottom.
The sub-matrices are not necessarily equal-sized, so I didn't look into sepblockfun.
I implemented your improved version of for-loops, and it does improve the speed, but only by 10-20%.
My preferred solution remains the first one you proposed. And the example you presented, with the exp and log functions acting in combination with the sparse matrix S, is really cool!
Matt J
29 Jun 2018
Edited: Matt J, 29 Jun 2018
"I implemented your improved version of for-loops, and it does improve the speed, but only by 10-20%."
You have a weird computer. I generally see at least a factor of 2 speed-up, as in the tests below. I admit I expected more, but the main point was that for operations besides summation it gives you a bit more flexibility.
N=8000;
nT=400;
M=rand(N); % test matrix
idxM=randi(nT,1,N);
sumM=zeros(nT,nT);
tic
for t1=1:nT
for t2=1:nT
sumM(t1,t2)=sum(sum(M(idxM==t1,idxM==t2)));
end
end
toc
%Elapsed time is 6.343936 seconds.
tic;
for t2=1:nT
partials=sum( M(:,idxM==t2) ,2);
for t1=1:nT
sumM(t1,t2)=sum(partials(idxM==t1)); %1D sums
end
end
toc;
%Elapsed time is 3.067782 seconds.
Matt J
29 Jun 2018
Edited: Matt J, 29 Jun 2018
Here's an application of the same test to maximization, as opposed to summation. I optimized the second approach a bit more, and see a factor of 10 speed-up:
N=8000;
nT=400;
M=rand(N);
idxM=randi(nT,N,1);
maxM=zeros(nT,nT);
tic
for t1=1:nT
for t2=1:nT
maxM(t1,t2)=max(max(M(idxM==t1,idxM==t2)));
end
end
toc
%Elapsed time is 6.505527 seconds.
tic;
S=sparse(1:N,idxM,true);
for t2=1:nT
partials=max(M(:,S(:,t2)),[],2);
for t1=1:nT
maxM(t1,t2)=max(partials(S(:,t1))); %1D maxs
end
end
toc;
%Elapsed time is 0.585687 seconds.
ER2018
29 Jun 2018
I'm not sure if it affects the performance, but you might want to reinitialize maxM in the second test.
maxM=zeros(nT,nT);
That said, I do see the value of this alternative approach. As you said, it makes the approach more flexible when you don't need to sum.
ER2018
29 Jun 2018
Edited: ER2018, 29 Jun 2018
I'm finally in front of a PC and I ran a few tests.
Long story short: calculating the indices of the elements in each group once, before entering the for-loops, also has a very significant impact on speed. With this improvement, even the solution with two simple for-loops performs surprisingly well. Pre-calculating the indices also improves the performance of the "improved for-loops". But for summing, your very first two-liner remains the best by far!
For the case of the sum:
%%Test - Sum
N=8000;
nT=400;
M=rand(N); % test matrix
idxM=randi(nT,1,N);
sumM0=zeros(nT,nT);
sumM1=zeros(nT,nT);
sumM2=zeros(nT,nT);
sumM3=zeros(nT,nT);
sumM4=zeros(nT,nT);
% Test 0 - Reference
tic
for t1=1:nT
for t2=1:nT
sumM0(t1,t2)=sum(sum(M(idxM==t1,idxM==t2)));
end
end
toc
% Elapsed time is 10.232443 seconds.
% Code 1 - Test
tic;
for t2=1:nT
partials=sum( M(:,idxM==t2) ,2);
for t1=1:nT
sumM1(t1,t2)=sum(partials(idxM==t1)); %1D sums
end
end
toc;
% Elapsed time is 5.098394 seconds.
isequal(sumM0,sumM1)
% 0
% => Solution matrix not identical to the reference
mean(mean(abs(sumM0-sumM1)))
% Average error: 2.8107e-14
% Code 2 - Test
tic
% Calculate once the indices
idxT=cell(nT,1);
for t=1:nT
idxT{t}=find(idxM==t);
end
for t1=1:nT
for t2=1:nT
sumM2(t1,t2)=sum(sum(M(idxT{t1},idxT{t2})));
end
end
toc
% Elapsed time is 1.512766 seconds.
isequal(sumM0,sumM2)
% 1
% => Solution matrix identical to the reference
% Code 3 - Test
tic;
% Calculate once the indices
idxT=cell(nT,1);
for t=1:nT
idxT{t}=find(idxM==t);
end
for t2=1:nT
partials=sum( M(:,idxT{t2}) ,2);
for t1=1:nT
sumM3(t1,t2)=sum(partials(idxT{t1})); %1D sums
end
end
toc;
% Elapsed time is 0.487134 seconds.
isequal(sumM0,sumM3)
% 0
% => Solution matrix not identical to the reference
mean(mean(abs(sumM0-sumM3)))
% Average error: 2.8107e-14
% Code 4 - Test
tic;
S=sparse(1:N,idxM,1);
sumM4=S.'*(M*S);
toc;
% Elapsed time is 0.070906 seconds.
isequal(sumM0,sumM4)
% 0
% => Solution matrix not identical to the reference
mean(mean(abs(sumM0-sumM4)))
% Average error: 2.8107e-14
Reference: unoptimized double for-loop.
Test 1 (improved for-loops): ~2x faster.
Test 2 (basic for-loops with pre-calculated indices): ~7x faster.
Test 3 (improved for-loops with pre-calculated indices): ~21x faster.
Test 4 (matrix products with the sparse index matrix): ~140x faster.
Tests 1, 3, 4: the resulting matrix is almost identical to the reference, with a small floating-point difference (on the order of 1e-14 on average). Test 2: the resulting matrix is identical to the reference.
For the case of the max:
%%Test - Max
N=8000;
nT=400;
M=rand(N); % test matrix
idxM=randi(nT,1,N);
maxM0=zeros(nT,nT);
maxM1=zeros(nT,nT);
maxM2=zeros(nT,nT);
maxM3=zeros(nT,nT);
maxM4=zeros(nT,nT);
% Test 0 - Reference
tic
for t1=1:nT
for t2=1:nT
maxM0(t1,t2)=max(max(M(idxM==t1,idxM==t2)));
end
end
toc
% Elapsed time is 10.302640 seconds.
% Code 1 - Test
tic;
for t2=1:nT
partials=max( M(:,idxM==t2) ,[],2);
for t1=1:nT
maxM1(t1,t2)=max(partials(idxM==t1));
end
end
toc;
% Elapsed time is 4.789534 seconds.
isequal(maxM0,maxM1)
% 1
% => Solution matrix identical to the reference
% Code 2 - Test
tic;
S=sparse(1:N,idxM,true);
for t2=1:nT
partials=max(M(:,S(:,t2)),[],2);
for t1=1:nT
maxM2(t1,t2)=max(partials(S(:,t1))); %1D maxs
end
end
toc;
% Elapsed time is 0.615749 seconds.
isequal(maxM0,maxM2)
% 1
% => Solution matrix identical to the reference
% Code 3 - Test
tic
% Calculate once the indices
idxT=cell(nT,1);
for t=1:nT
idxT{t}=find(idxM==t);
end
for t1=1:nT
for t2=1:nT
maxM3(t1,t2)=max(max(M(idxT{t1},idxT{t2})));
end
end
toc
% Elapsed time is 1.639348 seconds.
isequal(maxM0,maxM3)
% 1
% => Solution matrix identical to the reference
% Code 4 - Test
tic;
% Calculate once the indices
idxT=cell(nT,1);
for t=1:nT
idxT{t}=find(idxM==t);
end
for t2=1:nT
partials=max( M(:,idxT{t2}) ,[],2);
for t1=1:nT
maxM4(t1,t2)=max(partials(idxT{t1}));
end
end
toc;
% Elapsed time is 0.569280 seconds.
isequal(maxM0,maxM4)
% 1
% => Solution matrix identical to the reference
Reference: unoptimized double for-loop.
Test 1 (improved for-loops): ~2x faster.
Test 2 (improved for-loops, using the sparse index matrix): ~17x faster.
Test 3 (basic for-loops with pre-calculated indices): ~6x faster.
Test 4 (improved for-loops with pre-calculated indices): ~18x faster.
For Tests 1-4, the resulting matrix is always identical to the reference.
ER2018
29 Jun 2018
I agree... the first time I tried it, it seemed like magic. And that is why I was interested in trying the approach with other functions.
More Answers (1)
Matt J
29 Jun 2018
Edited: Matt J, 29 Jun 2018
Here's another method. It isn't as fast as the sparse-matrix approach, but it is loop-free and adaptable to other operations (min, max, etc.):
tmp=splitapply(@(z)sum(z,2),M,idxM.');
sumM=splitapply(@(z)sum(z,1),tmp,idxM);
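For comparison, the same two-pass grouped reduction can be sketched in NumPy with argsort plus reduceat (this is my own translation of the idea, not a direct splitapply equivalent; the example data is made up):

```python
import numpy as np

M = np.arange(16, dtype=float).reshape(4, 4)
idx = np.array([0, 1, 0, 1])                       # group label of each row/column
order = np.argsort(idx, kind="stable")             # bring each group's members together
starts = np.searchsorted(idx[order], np.arange(idx.max() + 1))

tmp = np.add.reduceat(M[:, order], starts, axis=1)     # pass 1: reduce columns by group
sumM = np.add.reduceat(tmp[order, :], starts, axis=0)  # pass 2: reduce rows by group

assert np.allclose(sumM, [[20.0, 24.0], [36.0, 40.0]])
```

Swapping np.add for np.maximum (or another ufunc) gives the other separable reductions discussed above.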
11 comments
ER2018
29 Jun 2018
Running this code with the max function gives me an error:
tmp=splitapply(@(z)max(z,2),M,idxM.');
Error using splitapply (line 130) The function '@(z)max(z,2)' returned a non-scalar value when applied to the 1st group of data.
To compute nonscalar values for each group, create an anonymous function to return each value in a scalar cell:
@(z){max(z,2)}
---
I then tried to define an anonymous function, but I must have done something wrong, since I get another error:
anonymFun=@(z){max(z,2)};
tmp=splitapply(@(z)anonymFun,M,idxM.');
Error using horzcat Nonscalar arrays of function handles are not allowed; use cell arrays instead.
Error in splitapply>localapply (line 228) finalOut{j} = horzcat(funOut{j,:});
Error in splitapply (line 130) varargout = localapply(fun,splitdata,gdim,nargout);
ER2018
29 Jun 2018
The speed changes considerably for different nT.
T1: Two basic for-loops
T2: Two optimized for-loops, with sparse matrix of indices
T3: splitapply-based
N=20,000, nT= 4, T1/T2/T3: 2.2/1.6/4.4 s
N=20,000, nT= 40, T1/T2/T3: 4.4/1.7/4.8 s
N=20,000, nT=400, T1/T2/T3: 27.0/2.1/5.8 s
T2 solution is always fastest though.
N=20000;
nT=400;
M=rand(N);
idxM=randi(nT,N,1);
maxM1=zeros(nT,nT);
maxM2=zeros(nT,nT);
tic
for t1=1:nT
for t2=1:nT
maxM1(t1,t2)=max(max(M(idxM==t1,idxM==t2)));
end
end
toc
tic;
S=sparse(1:N,idxM,true);
for t2=1:nT
partials=max(M(:,S(:,t2)),[],2);
for t1=1:nT
maxM2(t1,t2)=max(partials(S(:,t1))); %1D maxs
end
end
toc;
isequal(maxM1,maxM2)
tic;
tmp=splitapply(@(z)max(z,[],2),M,idxM.');
maxM3=splitapply(@(z)max(z,[],1),tmp,idxM);
toc;
isequal(maxM1,maxM3)
ER2018
29 Jun 2018
I ran another test for another application: summing the logarithm of the elements of each group.
The sparse-matrix double product remains the fastest, especially for large nT (try 400...).
N=20000;
nT=40;
M=rand(N);
idxM=randi(nT,N,1);
% Code 0 - Reference
tic
logM1=zeros(nT,nT);
for t1=1:nT
for t2=1:nT
logM1(t1,t2)=sum(sum(log(M(idxM==t1,idxM==t2))));
end
end
toc
% Elapsed time is 8.881048 seconds.
% Code 2 - Test
tic;
logM2=zeros(nT,nT);
% Calculate once the indices
idxT=cell(nT,1);
for t=1:nT
idxT{t}=find(idxM==t);
end
tempM=log(M);
for t2=1:nT
partials=sum( tempM(:,idxT{t2}) ,2);
for t1=1:nT
logM2(t1,t2)=sum(partials(idxT{t1})); %1D sums
end
end
toc;
% Elapsed time is 5.128909 seconds.
isequal(logM1,logM2)
% 0 --> not identical
mean(mean(abs((logM1-logM2)./logM1)))
% Code 4 - Test
tic;
S=sparse(1:N,idxM,true);
logM4=S.'*log(M)*S;
toc;
% Elapsed time is 4.526294 seconds.
isequal(logM1,logM4)
I wasn't able to make the splitapply version work. Honestly, I haven't studied the function yet.
% Code 3 - Test
% NOT WORKING
tic;
tmp=splitapply(@(z)log(z),M,idxM.');
logM3=splitapply(@(z)log(z),tmp,idxM);
toc;
isequal(logM1,logM3)
Matt J
29 Jun 2018
"T2 solution is always fastest though."
I don't think that's to be trusted. With N=20000, nT=1000, I get T2=3.666187 s, T3=2.886430 s.
ER2018
29 Jun 2018
It seems you have a faster PC. Does that mean you run T2 for N=20000, nT=400 in much less than 2.1 s?
ER2018
29 Jun 2018
We're talking about the case with log, right? It's strange that T2 runs slower for you than for me, while T3 runs faster.
By the way, I'm using MATLAB R2015b, and I have 32 GB of RAM.
Matt J
29 Jun 2018
"We're talking about the case with log, right?"
No. When you wrote
N=20,000, nT=400, T1/T2/T3: 27.0/2.1/5.8 s
I assumed you were talking about the max() case, since your code for that appears right afterward.
Matt J
29 Jun 2018
Edited: Matt J, 29 Jun 2018
My fastest version of sum(log), and also the most RAM-efficient, is as follows,
% Code 3 - Test
tic;
tmp=splitapply(@(z)log(prod(z,2)),M,idxM.');
logM3=splitapply(@(z)sum(z,1),tmp,idxM);
T3=toc
It's potentially dangerous because, depending on nT and the magnitudes of M(i,j), the prod could overflow or underflow. However, with N=2e4, nT=400, and M=rand(N) this does not happen, and I get T3=2.54 sec.