How do I compare a cell array containing string arrays against a string array without using loops?
I'm comparing two variables (data attached):
- tableOfTextByTime.("tweetUniqueMentions"), which is a 500x1 cell array. The content of each cell is a string array that may contain 0 or more words. Each cell can contain a different number of words. See screenshot:

- tableOfUsers{:,1}, which is a 334x1 string array

The code below works using a for loop, an anonymous function, and cellfun, but it's slow.
It's ok for a small test dataset, but when running on a real data set (20,000 x 1 cell array) and (5,000 x 1 string array) it takes way too long.
for i = height(tableOfUsers):-1:1
    % create a wrapped strcmp anon fcn that takes each cell element and
    % each string element
    wStrcmp = @(anonInp1) any(strcmp(anonInp1, tableOfUsers{i,1}));
    % create matrix of indices for the entries that match the criteria (500x334)
    idxMat(:,i) = cellfun(wStrcmp, tableOfTextByTime.("tweetUniqueMentions"), 'UniformOutput', false);
    % grab the relevant text that matches the criteria
    correspondingText{i,1} = tableOfTextByTime(cell2mat(idxMat(:,i)),:);
end
How can I get an equivalent result while drastically speeding up the code? Is there a way to do this in a vectorized or element-wise manner? bsxfun and arrayfun seem to have limitations when working with strings. The Parallel Computing Toolbox is not an option. :)
3 Comments
the cyclist
21 Dec 2022
Can you upload the data? You can use the paper clip icon in the INSERT section of the toolbar. We can't work with a picture of the data.
Walter Roberson
21 Dec 2022
Please describe in words what the desired outcome is.
- for each cell, you need to know whether at least one string in the cell appears anywhere in the string array?
- for each string in each cell, you need to know if the string appears anywhere in the string array?
- for each string in the string array, you need to know which cells it appears in?
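To make the three readings concrete, here is a minimal sketch on small made-up data. The variables C and T are hypothetical stand-ins for the cell array and the string array, not the poster's actual data, and the code only illustrates what each reading would return, not a fast way to compute it.

```matlab
% Hypothetical stand-in data (assumption, not the attached data set)
C = {["alice" "bob"]; strings(1,0); "carol"};   % 3x1 cell, each cell a string array
T = ["bob"; "erin"];                            % 2x1 string array

% Reading 1: for each cell, does at least one of its strings appear in T?
anyPerCell = cellfun(@(s) any(ismember(s, T)), C);              % 3x1 logical

% Reading 2: for each string in each cell, does that string appear in T?
perString = cellfun(@(s) ismember(s, T), C, 'UniformOutput', false);

% Reading 3: for each string in T, which cells does it appear in?
cellsPerUser = arrayfun(@(j) find(cellfun(@(s) any(s == T(j)), C)), ...
    (1:numel(T))', 'UniformOutput', false);                     % 2x1 cell of row indices
```

The loop in the question corresponds to the third reading: one set of matching rows per entry of the string array.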
Ed Marquez
21 Dec 2022
Accepted Answer
More Answers (1)
I think more can be done, but here are a couple of improvements that make the small test case faster. Hopefully it is an even larger speed-up on your real problem.
load("answersData.mat")
tic
for i = height(tableOfUsers):-1:1
    % create a wrapped strcmp anon fcn that takes each cell element and
    % each string element
    wStrcmp = @(anonInp1) any(strcmp(anonInp1, tableOfUsers{i,1}));
    % create matrix of indices for the entries that match the criteria (500x334)
    idxMat(:,i) = cellfun(wStrcmp, tableOfTextByTime.("tweetUniqueMentions"), 'UniformOutput', false);
    % grab the relevant text that matches the criteria
    correspondingText{i,1} = tableOfTextByTime(cell2mat(idxMat(:,i)),:);
end
toc
tic
% Preallocate, and pull out desired subset of data (so indexing doesn't need to be done repeatedly)
idxMat2 = false(height(tableOfTextByTime),height(tableOfUsers));
C = tableOfTextByTime.("tweetUniqueMentions");
T = tableOfUsers{:,1};
for i = height(tableOfUsers):-1:1
    % create a wrapped strcmp anon fcn that takes each cell element and
    % each string element
    wStrcmp2 = @(anonInp1) any(strcmp(anonInp1, T(i)));
    % create logical matrix of indices for the entries that match the criteria (500x334)
    idxMat2(:,i) = cellfun(wStrcmp2, C);
    % grab the relevant text that matches the criteria
    correspondingText2{i,1} = tableOfTextByTime(idxMat2(:,i),:);
end
toc
% Test that the two methods result in the same output
isequal(correspondingText,correspondingText2)
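One possible next step, sketched here rather than timed: flatten all mentions into a single string vector, call ismember once, and scatter the hits into the logical matrix. This replaces the cellfun call per user with one pass over the cells. C and T below are small hypothetical stand-ins (in the code above they would be tableOfTextByTime.("tweetUniqueMentions") and tableOfUsers{:,1}), and idxMat3 is an illustrative name. The sketch assumes each cell holds a string array of any orientation.

```matlab
% Hypothetical stand-in data (assumption, not answersData.mat)
C = {["alice" "bob"]; strings(1,0); "carol"; ["bob" "dave"]};   % N x 1 cell of string arrays
T = ["bob"; "carol"; "erin"];                                   % M x 1 string array

% Flatten every mention into one column, remembering which row it came from
counts   = cellfun(@numel, C);                          % mentions per cell
rowIdx   = repelem((1:numel(C))', counts);              % source row of each flattened mention
colVecs  = cellfun(@(s) s(:), C, 'UniformOutput', false);
allWords = vertcat(colVecs{:});                         % sum(counts) x 1 string

% One ismember call finds, for every mention, its position in T (0 if absent)
[tf, userIdx] = ismember(allWords, T);

% Scatter the hits into an N x M logical matrix (same layout as idxMat2)
idxMat3 = false(numel(C), numel(T));
idxMat3(sub2ind(size(idxMat3), rowIdx(tf), userIdx(tf))) = true;
```

With idxMat3 in hand, the per-user table rows can still be pulled out as in the loop above, e.g. tableOfTextByTime(idxMat3(:,i),:); only the string comparison has been moved out of the loop. Whether this is actually faster on the 20,000 x 5,000 case would need to be measured with tic/toc.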