I'm trying to compute both the mean and SD of a set of values by group, and I can either do it with for-loop:
M = zeros(1,ngroup);
SD = zeros(1,ngroup);
for i = 1:ngroup
M(i) = mean(data(ind==i));
SD(i) = std(data(ind==i));
end
Or, alternatively use `accumarray` twice.
M = accumarray(ind,data,[],@mean);
SD = accumarray(ind,data,[],@std);
But is there a way to just use accumarray once and compute both quantities? Since accumarray is faster than for-loop, but calling it twice will be slow. Is it possible to do something like:
[M, SD] = accumarray(ind,data,[],{@mean,@std})

 採用された回答

Matt J
Matt J 2022 年 10 月 19 日
編集済み: Matt J 2022 年 10 月 19 日

1 投票

Since accumarray is faster than for-loop, but calling it twice will be slow.
I would argue that 3 calls to accumarray would be the most optimal.
data = rand(25e5,1);
ind=randi(100,size(data));
tic;
M0 = accumarray(ind,data,[],@mean);
SD0 = accumarray(ind,data,[],@std);
toc
Elapsed time is 0.499977 seconds.
tic;
Out = accumarray(ind, data, [], @(x){{mean(x) std(x)}});
toc;
Elapsed time is 0.235066 seconds.
tic;
N=accumarray(ind,1);
S = accumarray(ind,data);
S2=accumarray(ind,data.^2);
M=S./N;
SD = sqrt((S2 - 2*S.*M + N.*M.^2)./(N-1));
toc
Elapsed time is 0.047581 seconds.

4 件のコメント

BRIAN XU
BRIAN XU 2022 年 10 月 19 日
Wow that's interesting, many thanks!
Bruno Luong
Bruno Luong 2022 年 10 月 19 日
Accumarray is optimized when doing sum.
Using function handle has great penalty.
Matt J
Matt J 2022 年 10 月 19 日
編集済み: Matt J 2022 年 10 月 19 日
I should mention though that there might be some sacrifice in numerical accuracy using the expansion,
SD = sqrt((S2 - 2*S.*M + N.*M.^2)./(N-1));
The subtraction of S2 and 2*S.*M can give large floating point residuals.
data = 10000+rand(25e5,1);
ind=randi(100,size(data));
tic;
M0 = accumarray(ind,data,[],@mean);
SD0 = accumarray(ind,data,[],@std);
toc
Elapsed time is 0.574901 seconds.
tic;
N=accumarray(ind,1);
S = accumarray(ind,data);
S2=accumarray(ind,data.^2);
M=S./N;
SD = sqrt((S2 - 2*S.*M + N.*M.^2)./(N-1));
toc
Elapsed time is 0.048677 seconds.
errorMean=norm(M-M0)/norm(M0)
errorMean = 4.4567e-15
errorSTD=norm(SD-SD0)/norm(SD0)
errorSTD = 5.3453e-06
Bruno Luong
Bruno Luong 2022 年 10 月 19 日
Small simplification
N=accumarray(ind,1);
S = accumarray(ind,data);
M = S./N;
S2=accumarray(ind,data.^2);
SD = sqrt((S2 - S.*M)./(N-1));

サインインしてコメントする。

その他の回答 (1 件)

Star Strider
Star Strider 2022 年 10 月 19 日

2 投票

The accumarray approach is certainly possible —
v = randn(250,1)
v = 250×1
0.0501 -0.2469 1.0253 1.1484 -0.9196 0.4947 -0.7221 1.2164 -0.3162 0.0461
Out = accumarray(ones(size(v)), v, [], @(x){{mean(x) std(x)}})
Out = 1×1 cell array
{1×2 cell}
meanv = Out{:}{1}
meanv = -0.0333
stdv = Out{:}{2}
stdv = 0.9419
Make appropriate changes to work with your data.
.

2 件のコメント

BRIAN XU
BRIAN XU 2022 年 10 月 19 日
that's a nice idea! thank you!
Star Strider
Star Strider 2022 年 10 月 19 日
My pleasure!

サインインしてコメントする。

カテゴリ

ヘルプ センター および File ExchangePerformance and Memory についてさらに検索

質問済み:

2022 年 10 月 19 日

コメント済み:

2022 年 10 月 19 日

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by