Intersection of large number of arrays

Question

ALiveris on 3 Oct 2011

Votes: 0

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/17365-intersection-of-large-number-of-arrays

Hey there,

I want to solve the following problem. I have 2000 arrays of 500 strings each (500x1) and want to create a 3000x1 array of the strings that appear most often in those 2000 initial arrays. I know "intersection" is not the right term for that, but I don't know how to explain it better. Any suggestions about the most efficient way to do that?

P.S. The most obvious way to do that is to put all unique strings in an array together with the number of times they appear, then sort the array by count and keep the first 3000 rows. However, I am looking for a faster and more "sophisticated" way to do it.

Thanks!


Answer 1

Fangjun Jiang on 3 Oct 2011

Votes: 1

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/17365-intersection-of-large-number-of-arrays#answer_23420

I think you'll have to do some kind of sort() or unique() operation.

My thought is: combine all strings in one big cell array, run

[B,I,J]=unique(BigCellArray)

Then count how often each value of J occurs, for example with hist(J,1:numel(B)) (one bin per unique string). The largest counts identify the most frequent strings in B.
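To make the counting step concrete, here is a minimal sketch on toy data. The contents of BigCellArray are made up for illustration, and the explicit bin centers 1:numel(B) are an assumption added here so that hist uses one bin per unique string instead of its default 10 bins. Newer MATLAB releases recommend histcounts or accumarray over hist, but hist is kept to stay close to the answer.

```matlab
% Toy data standing in for the combined cell array (hypothetical values).
BigCellArray = {'b'; 'a'; 'b'; 'c'; 'b'; 'a'};

% B holds the sorted unique strings; J maps each input row to its entry in B.
[B, I, J] = unique(BigCellArray);

% One bin per unique string, so counts(k) is how many times B{k} occurs.
counts = hist(J, 1:numel(B));

% Locate the single most frequent string.
[maxCount, k] = max(counts);
mostFrequent = B{k};
```

With this input, B is {'a'; 'b'; 'c'} and 'b' is the most frequent string with 3 occurrences.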

Assuming an average of 10 characters per string and 2 bytes per character, that is 2000*500*10*2 = 20 MB of character data. No big deal!
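The estimate above can be reproduced in a few lines. This is only a sketch: the variable names are hypothetical, the 10-character average is the answer's assumption, and the figure covers character data only. Each cell also carries a per-element header, which the second half measures on a small sample with whos rather than guessing.

```matlab
nArrays      = 2000;  % number of input arrays (from the question)
nPerArray    = 500;   % strings per array (from the question)
avgChars     = 10;    % assumed average string length
bytesPerChar = 2;     % MATLAB stores char data as 16-bit values

% Raw character data only, ignoring cell array overhead.
totalBytes = nArrays * nPerArray * avgChars * bytesPerChar;
fprintf('Character data: about %.0f MB\n', totalBytes / 1e6);

% Measure the real footprint, including per-cell overhead, on a sample.
sample = repmat({'abcdefghij'}, 1000, 1);
info = whos('sample');
fprintf('Measured bytes per string: %.0f\n', info.bytes / numel(sample));
```

Scaling the measured bytes per string by 2000*500 gives a more realistic total than the character-only figure.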

Answer 2

Walter Roberson on 3 Oct 2011

Votes: 1

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/17365-intersection-of-large-number-of-arrays#answer_23425

That approach is not bad, actually.

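% Stack all arrays into one column; the "..." stands for the arrays not written out.
[ustrings, a, b] = unique(vertcat(A1,A2,A3,...,A2000));
% Count occurrences of each unique string; b(:) forces a column of subscripts.
counts = accumarray(b(:), 1);
% Order the unique strings from most to least frequent.
[scounts, sidx] = sort(counts, 'descending');
% Keep at most the 3000 most frequent strings.
commonstrings = ustrings(sidx(1:min(end,3000)));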

There are algorithms that would take less temporary memory, but the above will not copy the string contents themselves around, just references to the strings, so really it is fairly memory efficient and time efficient... and certainly a lot easier to code than the alternatives.
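As a quick check of the approach, here is a small runnable sketch on toy data. The three arrays and the cutoff topN are hypothetical stand-ins for the 2000 arrays and the 3000 limit in the question. It passes b(:) to accumarray so the subscripts are a column vector regardless of the orientation unique returns.

```matlab
% Three small arrays standing in for A1..A2000 (made-up values).
A1 = {'x'; 'y'; 'z'};
A2 = {'y'; 'z'; 'w'};
A3 = {'z'; 'y'; 'z'};
topN = 2;  % hypothetical cutoff; the question uses 3000

% Unique strings plus, for every input row, its index into ustrings.
[ustrings, ~, b] = unique(vertcat(A1, A2, A3));

% counts(k) is the number of occurrences of ustrings{k}.
counts = accumarray(b(:), 1);

% Sort by frequency and keep the topN most frequent strings.
[scounts, sidx] = sort(counts, 'descending');
commonstrings = ustrings(sidx(1:min(end, topN)));
```

Here 'z' occurs 4 times and 'y' 3 times, so commonstrings is {'z'; 'y'}. Strings tied at the cutoff are kept or dropped according to the order sort returns them.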

As to the effort to write out the vertcat() of the 2000 string arrays: if that proves to be a problem, then consider rewriting your program so that you Don't Do That.
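One way to avoid writing out 2000 variable names, sketched under the assumption that the program can be changed to store the arrays in a single cell container, is to keep one cell per source array and expand it as a comma-separated list. The name arrays and its contents are hypothetical.

```matlab
% Hypothetical container: one cell per source array instead of A1..A2000.
arrays = { {'red'; 'green'}; ...
           {'green'; 'blue'}; ...
           {'blue'; 'green'} };

% arrays{:} expands to a comma-separated list, so vertcat receives
% every stored array as a separate argument in a single call.
allStrings = vertcat(arrays{:});
```

The result allStrings is a 6x1 cell array of strings that can be passed straight to unique, and the same line works unchanged for 2000 stored arrays.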


Answer 3

ALiveris on 3 Oct 2011

Votes: 0

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/17365-intersection-of-large-number-of-arrays#answer_23431

Thanks, both. I'll try it tomorrow, when I have my arrays ready, and let you know if I need more help.

Cheers
