Group means - Large Data

I am trying to compute grouped means, as in the following example. I only know of the following two methods. I have a large data set and have to repeat this command a large number of times, so I am concerned about speed. Are there any quicker methods? Thanks for any help!
x=repmat(1:10,1,100)';
x(:,2:100)=rand(1000,99);
%Method 1: grpstats
tic
meantest=grpstats(x(:,2:100),x(:,1));
toc
%Method 2: Logical Indexing
meantest2=zeros(10,99);
tic
for i=1:10
g=x(:,1)==i;
meantest2(i,:)=mean(x(g,2:end));
end
toc

1 Comment

John D'Errico on 29 Sep 2015
This is not even remotely a "large" data set.

Answers (1)

John D'Errico on 29 Sep 2015
Edited: John D'Errico on 29 Sep 2015

On my cpu, here were the times reported for your two solutions.
Elapsed time is 0.005852 seconds.
Elapsed time is 0.004096 seconds.
So I tried consolidator (from the File Exchange).
tic
[~,meantest3] = consolidator(x(:,1),x(:,2:100),@mean);
toc
Elapsed time is 0.002943 seconds.
It has been around for a while, but it is still pretty fast.

3 Comments

Bus141 on 29 Sep 2015
Thanks for the help. I had never heard of consolidator before. I just ran it and it seems to be slower.
I just figured out another way that is slightly faster than the previous two: pre-multiplying each group's rows by a row vector of weights 1/n, where n is the group size (100 here). On this simple small data set the timings show a much greater speedup, but when I apply it to my larger problem it is only 7% faster than the grpstats method.
meantest3=zeros(10,99);
I=ones(1,100)/100;
tic
for i=1:10
g=x(:,1)==i;
meantest3(i,:)=I*x(g,2:end);
end
toc
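The weighting idea above can be pushed one step further, so that all groups are handled by a single matrix product instead of a loop. The following is only a sketch of that variant, not code from this thread: the names g, data, counts, W and meanSparse are made up for illustration, and it assumes the group ids in column 1 are positive integers.

```matlab
% Same test data as in the question: group ids 1..10 in column 1.
x = repmat(1:10, 1, 100)';
x(:, 2:100) = rand(1000, 99);

g       = x(:, 1);             % group id of each row (assumed positive integers)
data    = x(:, 2:end);         % values to average
n       = numel(g);
nGroups = max(g);

% Number of rows in each group; this need not be 100 for every group.
counts = accumarray(g, 1);

% W(k, r) = 1/counts(k) if row r belongs to group k, and 0 otherwise.
W = sparse(g, (1:n)', 1 ./ counts(g), nGroups, n);

% One sparse-times-dense product yields all group means at once.
meanSparse = W * data;         % nGroups-by-99
```

Unlike the fixed vector ones(1,100)/100, the weights here are taken from the actual group sizes, so unequal groups are handled as well. Whether this beats the loop on a given problem is something to measure rather than assume.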
John D'Errico on 29 Sep 2015
Edited: John D'Errico on 29 Sep 2015
Sigh. What do you mean, "it seems to be slower"?
What version of MATLAB are you running? Did you actually warm it up? Since you have not actually reported ANY times, nor told us any useful information, all I can say is you "seem" to be wrong.
tic,[~,M] = consolidator(x(:,1),x(:,2:100),@mean);toc
Elapsed time is 0.004237 seconds.
tic,[~,M] = consolidator(x(:,1),x(:,2:100),@mean);toc
Elapsed time is 0.001989 seconds.
See that the second time it is called, consolidator runs faster. The first time any function is called it runs slowly, because MATLAB must cache the function. This is called warming it up.
A better test is to use timeit, of course, but the difference is clear here. Note that the variation in the measured times is actually pretty large; timeit will reduce the variance of that estimate. tic and toc are actually terrible ways to time-test code.
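As a rough illustration of the timeit advice, the sketch below times a per-group mean on the question's test data. It is an assumption-laden example rather than anything posted in the thread: the handle name fMeans is made up, and the loop from the question is rewritten with arrayfun only so that it fits in a zero-argument function handle, which is what timeit expects.

```matlab
% Same test data as in the question: group ids 1..10 in column 1.
x = repmat(1:10, 1, 100)';
x(:, 2:100) = rand(1000, 99);
g    = x(:, 1);
data = x(:, 2:end);

% Zero-argument handle that computes the 10-by-99 matrix of group means.
fMeans = @() cell2mat(arrayfun(@(i) mean(data(g == i, :), 1), ...
                               (1:10)', 'UniformOutput', false));

% timeit calls the handle repeatedly (which also takes care of warm-up)
% and returns a typical run time in seconds.
tMeans = timeit(fMeans);
fprintf('per-group mean: %.6f s\n', tMeans);
```

To time the original for loop itself, one option is to put it in its own function file and pass a handle to that function instead.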
David J. Mack on 4 Dec 2015
Edited: David J. Mack on 4 Dec 2015
Hi John & Bus141!
Since you seem to be stuck in some argument, I recommend this article on Stack Overflow concerning a similar problem:
The accumarray solution is much faster than GRPSTATS, at least for remotely "large" arrays like mine (~1000000000 x 10). It is similar to John's CONSOLIDATOR solution but uses a built-in function.
Hope that helps. Greetings, David
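For readers without the linked article at hand, here is one way an accumarray-based group mean might look on the question's test data. This is a minimal sketch and not necessarily the code from that article; the variable names are made up, and the group ids are assumed to be positive integers.

```matlab
% Same test data as in the question: group ids 1..10 in column 1.
x = repmat(1:10, 1, 100)';
x(:, 2:100) = rand(1000, 99);
g     = x(:, 1);               % group id of each row (assumed positive integers)
data  = x(:, 2:end);           % values to average
nCols = size(data, 2);

% accumarray accumulates a vector of values, so pair every element of
% data with a (group, column) subscript.
[r, c] = ndgrid(g, 1:nCols);   % r(i,j) = g(i), c(i,j) = j

% Sum of each column within each group, then divide by the group sizes.
sums    = accumarray([r(:), c(:)], data(:));   % nGroups-by-nCols
counts  = accumarray(g, 1);                    % rows per group
meanAcc = bsxfun(@rdivide, sums, counts);
```

Summing and then dividing by the counts avoids passing @mean as the accumulation function, which tends to be slower than the default sum. Only the Statistics Toolbox-free built-ins accumarray, ndgrid and bsxfun are used.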

Asked: 29 Sep 2015

Edited: 4 Dec 2015
