How can I speed up this indexing code?

8 ビュー (過去 30 日間)
bhousden
bhousden 2021 年 10 月 28 日
コメント済み: bhousden 2021 年 10 月 29 日
I have a cell array in which each cell contains a string. E.g....
a={'AAAAAA';'BBBBBB';'CCCCCC';'AAAAAA';'DDDDDD'};
Each cell in array a is associated with a row in a numerical array that contains 10 columns. E.g....
b=[0,0,0,1,1,0,0,0,0,0;
0,1,0,0,0,0,0,0,0,0;
1,0,1,0,1,0,2,0,0,0;
3,0,0,0,0,0,0,0,0,1;
0,0,0,0,0,0,2,0,0,1];
Some strings in a are repeated such as 'AAAAAA' as shown above. What I need to do is find all repeated cells in a and sum the assocated columns from b into a single row. This should result in two new arrays (unia and bnew) which have equal numbers of rows but every string in unia is unique.
Easy enough to do with a loop such as:
unia=unique(a);
bnew=zeros(numel(unia),10);
for n=1:numel(unia)
pos=find(strcmp(a,unia{n}));
bnew(n,:)=sum(b(pos,:),1);
end
This works fine for small arrays but I have a case where a has 6 million cells and unia has 300,000 cells so I need something much faster. Any ideas?
Thanks!

採用された回答

Ive J
Ive J 2021 年 10 月 28 日
Avoid comparing strings within the loop and instead take advantage of the index vector from unique:
a = ["A", "B2", "A", "C", "AA", "B2", "B2"]; % use strings instead of cell array of characters, they're much more efficinet to work with
b = randi([0 2], numel(a), 3)
b = 7×3
1 0 1 0 1 2 0 1 2 0 2 1 2 0 2 2 2 0 0 2 1
[anew, ~, idx] = unique(a);
bnew = arrayfun(@(x) sum(b(x == idx, :), 1), 1:numel(anew), 'uni', false);
bnew = vertcat(bnew{:})
bnew = 4×3
1 1 3 2 0 2 2 5 3 0 2 1
anew
anew = 1×4 string array
"A" "AA" "B2" "C"
Also, you can use tall arrays when dealing with large arrays.
  1 件のコメント
bhousden
bhousden 2021 年 10 月 29 日
Perfect! Thanks for your help.

サインインしてコメントする。

その他の回答 (0 件)

カテゴリ

Help Center および File ExchangeMatrix Indexing についてさらに検索

製品


リリース

R2017a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by