Is there a better way to compute metrics on labeled array elements?

Question

Burke Rosen, June 17, 2018

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/406050-is-there-a-better-way-to-compute-metrics-on-labeled-array-elements

Edited: Burke Rosen, June 18, 2018

For example, I have a 1-D double array 'data' and a 1-D cell array of strings called 'labels'. For each unique label, I want the mean of the corresponding data. The best I have come up with is below, but I don't believe it is fully vectorized. Is there a better way?

%% make sample dataset
n = 1000;
data = rand(n,1);
labels = char(randsample(97:122,n,true)'); % random letters a-z
%% get means for each label
[uniLab,~,labIdx] = unique(labels,'stable'); % stable for speed
mu = arrayfun(@(x) mean(data(labIdx==x)),1:numel(uniLab)); % one mean per unique label


Answer 1

Walter Roberson, June 17, 2018

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/406050-is-there-a-better-way-to-compute-metrics-on-labeled-array-elements#answer_325022

https://www.mathworks.com/help/stats/grpstats.html

2 Comments

Walter Roberson, June 17, 2018

The last step of your code can be replaced by

accumarray(labIdx, data, [], @mean)
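As a rough check that this replacement gives the same means as the arrayfun version, here is a minimal sketch. The variable names muLoop, muAcc, muSum and maxDiff are made up for illustration, and randi is swapped in for randsample (an assumption, so that no toolbox is needed to build the labels). The last variant, dividing per-label sums by per-label counts, is not from the thread; it is one common way to avoid passing a function handle to accumarray.

```matlab
% Sample data as in the question; randi replaces randsample here (an
% assumption) so the sketch runs without the Statistics toolbox.
rng(0);                                  % fixed seed, only for repeatability
n = 1000;
data = rand(n,1);
labels = char(randi([97 122], n, 1));    % n-by-1 char array of letters a-z

[uniLab,~,labIdx] = unique(labels,'stable');

% Original formulation: one logical mask per unique label (row vector)
muLoop = arrayfun(@(x) mean(data(labIdx==x)), 1:numel(uniLab));

% Suggested replacement: group data by labIdx and apply mean (column vector)
muAcc = accumarray(labIdx, data, [], @mean);

% Variant not from the thread: per-label sums divided by per-label counts
muSum = accumarray(labIdx, data) ./ accumarray(labIdx, 1);

% All three should agree up to floating-point rounding
maxDiff = max([abs(muLoop(:) - muAcc); abs(muAcc - muSum)]);
```

Note that accumarray returns a column vector while the arrayfun call returns a row vector, so code that depends on the orientation of mu may need a transpose.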

Burke Rosen, June 18, 2018

Edited: Burke Rosen, June 18, 2018

This yields a ~25% speed increase at n = 1e3 and ~5% at n = 1e5 (500 trials per algorithm, randomized order). Thank you.
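For anyone who wants to repeat a comparison like this, a minimal sketch using timeit follows. It is not the benchmark used for the numbers above (that one ran 500 trials per algorithm in randomized order); it times only the last step, and the names fLoop, fAcc, tLoop and tAcc are made up for illustration. The labels are built with randi instead of randsample, an assumption made so that no toolbox is required.

```matlab
% Build the sample data once, outside the timed functions
n = 1e3;
data = rand(n,1);
labels = char(randi([97 122], n, 1));    % n-by-1 char array of letters a-z
[uniLab,~,labIdx] = unique(labels,'stable');

% Wrap each formulation in a zero-argument handle, as timeit expects
fLoop = @() arrayfun(@(x) mean(data(labIdx==x)), 1:numel(uniLab));
fAcc  = @() accumarray(labIdx, data, [], @mean);

% timeit calls each handle several times and returns a typical run time in seconds
tLoop = timeit(fLoop);
tAcc  = timeit(fAcc);
fprintf('arrayfun: %.3g s, accumarray: %.3g s\n', tLoop, tAcc);
```

Changing n to 1e5 repeats the measurement at the larger size mentioned above. Timings will vary by machine and MATLAB release.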


Answer 2

Burke Rosen, June 17, 2018

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/406050-is-there-a-better-way-to-compute-metrics-on-labeled-array-elements#answer_325032

Thank you for that tip, @Walter.

After further review:

1. The way I wrote the sample dataset, labels is actually a character array, not a cell array; one has to call cellstr on it to get a cell array.

2. mu = grpstats(data,labels,'mean') is compact, easy to read, and maybe 1 or 2 percent faster than my formulation, if one adds the cellstr.

3. My solution is 5x faster than grpstats if labels is a character array rather than a cell array.

4. My guess is that unique operates much faster on character arrays than on cell arrays, and that the runtime of the loop (or arrayfun) over the unique labels is negligible compared to that of unique itself.
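The guess in point 4 can be probed with a short sketch like the one below, which converts the character labels with cellstr and times unique on both forms. The names labelsChar, labelsCell, tChar, tCell and names are made up for illustration, randi stands in for randsample (an assumption), and no particular timing result is implied. The grpstats call is guarded because it requires the Statistics and Machine Learning Toolbox.

```matlab
n = 1000;
data = rand(n,1);
labelsChar = char(randi([97 122], n, 1));   % n-by-1 char array, as in the question
labelsCell = cellstr(labelsChar);           % n-by-1 cell array of char vectors

% unique accepts both forms; time each one separately
tChar = timeit(@() unique(labelsChar,'stable'));
tCell = timeit(@() unique(labelsCell,'stable'));
fprintf('unique on char: %.3g s, on cellstr: %.3g s\n', tChar, tCell);

% grpstats is only called if it is available on the path
if exist('grpstats','file')
    % 'gname' returns the group names alongside the means, so the result
    % does not depend on assumptions about group ordering
    [mu, names] = grpstats(data, labelsCell, {'mean','gname'});
end
```

Requesting 'gname' together with 'mean' keeps each mean paired with its label, which is safer than assuming the output order matches that of unique with 'stable'.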
