How to search for a substring in a list of strings?
I have {'xx', 'abc1', 'abc2', 'yy', 'abc100'} and I would like to search for 'abc' and get back {'abc1', 'abc2', 'abc100'}. Is there a simple way to do this without a for loop?
Answers (4)
Jos (10584)
29 Jan 2018
In recent releases (R2016b and later) you can use startsWith:
A = {'xx', 'abc1', 'abc2', 'yy', 'abc100'}
tf = startsWith(A,'abc')
B = A(tf)
See the documentation on string functions for many other utilities that may be useful for you.
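As a rough illustration of those related utilities, here is a minimal sketch using endsWith and contains next to startsWith on the same data. The search patterns '2' and 'c1' are made up for the example and are not part of the original question.

```matlab
% Sketch: related pattern-matching functions (R2016b and later).
A = {'xx', 'abc1', 'abc2', 'yy', 'abc100'};

atStart  = A(startsWith(A, 'abc'));   % elements beginning with 'abc'
atEnd    = A(endsWith(A, '2'));       % elements ending with '2' (illustrative pattern)
anywhere = A(contains(A, 'c1'));      % elements containing 'c1' anywhere (illustrative pattern)
```

Each function returns a logical array of the same size as A, so the result can be used directly as an index.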
Stephen23
29 Jan 2018
Much faster than using cellfun or the newer string functions such as startsWith:
>> C = {'xx', 'abc1', 'abc2', 'yy', 'abc100'};
>> Z = C(strncmp(C,'abc',3));
>> Z{:}
ans = abc1
ans = abc2
ans = abc100
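A small variation on the same idea, as a sketch only: the prefix length can be taken from the prefix itself instead of being hard-coded, and strncmpi gives a case-insensitive match. The variable names and the mixed-case test data here are assumptions made for illustration.

```matlab
% Sketch: prefix search with strncmp / strncmpi (illustrative data).
C = {'xx', 'abc1', 'ABC2', 'yy', 'abc100'};
prefix = 'abc';
n = numel(prefix);                 % avoid hard-coding the prefix length

Z  = C(strncmp(C, prefix, n));     % case-sensitive match on the first n characters
Zi = C(strncmpi(C, prefix, n));    % case-insensitive match on the first n characters
```

With this data, Z keeps only the lower-case matches, while Zi also keeps 'ABC2'.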
Jan
29 Jan 2018
Edited: Jan
29 Jan 2018
If you want to search at the start of the strings only, this is efficient:
A = {'xx', 'abc1', 'abc2', 'yy', 'abc100'};
B = A(strncmp(A, 'abc', 3));
Some timings:
% Some larger test data:
A = repmat({'xx', 'abc1', 'abc2', 'yy', 'abc100'}, 1, 1000);
S = string(A);
% startsWith on a cell string
tic;
for k = 1:1000
    tf = startsWith(A, 'abc');
    B = A(tf);
end
toc
% startsWith on a string array
tic;
for k = 1:1000
    tf = startsWith(S, 'abc');
    B = A(tf);
end
toc
% strncmp on a cell string
tic;
for k = 1:1000
    tf = strncmp(A, 'abc', 3);
    B = A(tf);
end
toc
% cellfun with a function handle
tic;
for k = 1:1000
    tf = cellfun(@any, strfind(A, 'abc'));
    B = A(tf);
end
toc
% cellfun with the legacy 'isempty' char syntax
tic;
for k = 1:1000
    tf = ~cellfun('isempty', strfind(A, 'abc'));
    B = A(tf);
end
toc
Elapsed time is 1.492006 seconds. % startsWith(cell string)
Elapsed time is 0.308345 seconds. % startsWith(string)
Elapsed time is 0.018157 seconds. % strncmp
Elapsed time is 8.095714 seconds. % cellfun(@any, strfind)
Elapsed time is 1.706694 seconds. % cellfun('isempty', strfind)
Note that the cellfun methods search for the substring anywhere in the strings, while startsWith and strncmp search at the start only. With modern string functions, searching anywhere would be:
tf = contains(A, 'abc');
This runs at about the same speed as startsWith.
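To make the difference between the two kinds of search concrete, here is a minimal sketch. The element 'xabc' is added to the test data purely for illustration, since the original list has no string with 'abc' in the middle.

```matlab
% Sketch: "starts with" versus "contains anywhere" (illustrative data).
A = {'xx', 'abc1', 'xabc', 'yy', 'abc100'};

tfStart = startsWith(A, 'abc');   % true only where 'abc' is at the start
tfAny   = contains(A, 'abc');     % true where 'abc' occurs anywhere

onlyInside = A(tfAny & ~tfStart); % matches that do not begin with 'abc'
```

For the original data from the question, both masks are identical, which is why the two approaches return the same result there.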
@MathWorks: strncmp on cell strings is 17 times faster than startsWith on strings. The conversion from cell strings to strings inside startsWith makes it run about 82 times slower than strncmp. There is great potential for improvement.
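As an alternative to the tic/toc loops, the same comparison could be sketched with timeit, which calls each function several times and returns a typical run time. This is not how the timings above were produced, the variable names are illustrative, and the results will depend on the machine and the MATLAB release.

```matlab
% Sketch: timing the same candidates with timeit instead of tic/toc loops.
A = repmat({'xx', 'abc1', 'abc2', 'yy', 'abc100'}, 1, 1000);
S = string(A);

tStartsWithCell = timeit(@() startsWith(A, 'abc'));
tStartsWithStr  = timeit(@() startsWith(S, 'abc'));
tStrncmp        = timeit(@() strncmp(A, 'abc', 3));
tStrfind        = timeit(@() ~cellfun('isempty', strfind(A, 'abc')));

fprintf('startsWith (cell string): %.3g s\n', tStartsWithCell);
fprintf('startsWith (string):      %.3g s\n', tStartsWithStr);
fprintf('strncmp:                  %.3g s\n', tStrncmp);
fprintf('cellfun(''isempty''):       %.3g s\n', tStrfind);
```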
Fangjun Jiang
29 Jan 2018
s={'xx', 'abc1', 'abc2', 'yy', 'abc100'};
index=cellfun(@any,strfind(s,'abc'));
s(index)
2 comments
Jan
29 Jan 2018
Edited: Jan
29 Jan 2018
@Matt J: But cellfun is at least a fast C-mex function. Every function that processes a cell string must contain a loop somewhere. The problem is calling cellfun with an expensive function handle. This is about 4 times faster (but still slower than startsWith; see the timings in my answer):
index = ~cellfun('isempty', strfind(s, 'abc'));
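A quick way to check that the two cellfun variants select the same elements is sketched below. It assumes nothing beyond the data from the answer above; the variable names hits, idxHandle and idxLegacy are made up for the example.

```matlab
% Sketch: both cellfun variants produce the same logical mask.
s = {'xx', 'abc1', 'abc2', 'yy', 'abc100'};
hits = strfind(s, 'abc');                % cell array: start indices, or [] if no match

idxHandle = cellfun(@any, hits);         % function handle: calls any() for each cell
idxLegacy = ~cellfun('isempty', hits);   % legacy char syntax for isempty

assert(isequal(idxHandle, idxLegacy));   % same selection either way
result = s(idxLegacy);
```

The char syntax is only accepted for a small set of function names that cellfun supports for backward compatibility, so it is not a general replacement for function handles.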