finding string in between
Hi, I have a 100 x 1 cell array of strings like:
18WABO1-12345-0X
18WABO2-12345-0N
18WACE3-12345-00
18WACE4-12345-0R
18WAGUG-12345-0G
18WDUER-12345-0N
I would like to extract the character sequence that always sits between 18W and the first "-", so the result is:
ABO1
ABO2
ACE3
ACE4
AGUG
DUER etc
My code so far:
%
somestring(:)= eic_p;
underscore_indices= strfind(somestring,'18W');
underscore_indices=cell2mat(underscore_indices);
fs_indices = strfind(somestring,'-');
fs_indices=fs_indices';
your_number=cellfun(@(v)v(1),fs_indices);
somestring(:)= somestring';
for i=1:length(fs_indices)
yourNumber= somestring{i}(underscore_indices(i)+2:your_number(i)-1);
% How can I save every iteration? Thanks
end
In the last for loop I somehow get weird output, and I cannot save all the results so that all 205 abbreviations end up in one variable (yourNumber).
Thanks a lot,
Accepted Answer
per isakson
18 Oct 2017
Edited: per isakson
18 Oct 2017
yourNumber is overwritten in the loop and only the last value is saved. The first step to fix your code is
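yourNumber = cell( length(fs_indices), 1 );   % preallocate one cell per string
for i = 1 : length(fs_indices)
    % '18W' is three characters long, so the wanted text starts at index+3
    yourNumber{i} = somestring{i}(underscore_indices(i)+3:your_number(i)-1);
end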
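As a self-contained illustration of the loop approach, here is a minimal sketch that runs on a few of the sample strings from the question. The names prefix and abbrev are introduced only for this example. Because '18W' is three characters long, the wanted text starts three positions after the index that strfind returns.

```matlab
% Sample data copied from the question
somestring = {
    '18WABO1-12345-0X'
    '18WABO2-12345-0N'
    '18WACE3-12345-00' };

prefix = '18W';                            % illustrative name, not from the thread
n = numel(somestring);
abbrev = cell(n, 1);                       % preallocate one cell per input string
for i = 1:n
    s = somestring{i};
    first = strfind(s, prefix);            % start index(es) of the prefix
    dash  = strfind(s, '-');               % indices of all dashes
    % keep the text right after the prefix and before the first dash
    abbrev{i} = s(first(1)+numel(prefix) : dash(1)-1);
end
disp(abbrev)
```

Storing each result in its own cell is what keeps earlier iterations from being overwritten.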
There are other ways, e.g. with regular expressions
>> str = '18WABO1-12345-0X';
>> regexp( str, '(?<=18W)[^\-]+(?=\-)', 'match' )
ans =
'ABO1'
and
cac = {
'18WABO1-12345-0X'
'18WABO2-12345-0N'
'18WACE3-12345-00'
'18WACE4-12345-0R'
'18WAGUG-12345-0G'
'18WDUER-12345-0N' };
%
out = regexp( cac, '(?<=18W).+?(?=\-)', 'match' );
out = cat( 1, out{:} );
and
>> out
out =
'ABO1'
'ABO2'
'ACE3'
'ACE4'
'AGUG'
'DUER'
and with indexing
>> str = char( cac );
>> str = str( :, 4:7 )
str =
ABO1
ABO2
ACE3
ACE4
AGUG
DUER
>>
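The indexing version above works because every abbreviation in the sample is exactly four characters long. If that cannot be guaranteed, one possible alternative without regular expressions is strtok, sketched below. The data in cac2 (including the five-character entry) and the names head and out3 are made up for this illustration, and the sketch assumes every string starts with the three-character prefix 18W.

```matlab
% Hypothetical data: the third entry has a five-character abbreviation
cac2 = {
    '18WABO1-12345-0X'
    '18WDUER-12345-0N'
    '18WABCDE-12345-00' };

head = strtok(cac2, '-');                      % text before the first dash
out3 = cellfun(@(s) s(4:end), head, ...        % drop the leading '18W'
               'UniformOutput', false);
disp(out3)
```

Unlike char(cac), this keeps the result as a cell array, so entries of different lengths are not padded with blanks.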
10 Comments
No need for the look-ahead in your first pattern, and you can add the option 'once' to avoid the CAT. And if you need to debug it ... well, I'm not really sure ... yet ;)
>> out2 = regexp( cac, '(?<=18W)[^-]+', 'match', 'once' )
out2 =
6×1 cell array
{'ABO1'}
{'ABO2'}
{'ACE3'}
{'ACE4'}
{'AGUG'}
{'DUER'}
I tried this myself and came up with almost the same regular expression, just with ^ to anchor the match to the start of the string:
regexp(C,'(?<=^18W)[^-]+','once','match')
per isakson
18 Oct 2017
Edited: per isakson
18 Oct 2017
Yes, the look-ahead is overkill and 'once' will save a microsecond. With long strings 'once' makes a significant difference.
However, regexp with or without 'once' returns a cell array of scalar cell arrays, which in turn contain the strings. cat "flattens" the cell array.
There is always one more level without the 'once':
>> out2 = regexp( cac, '(?<=18W)[^-]+', 'match' )
out2 =
6×1 cell array
{1×1 cell}
{1×1 cell}
{1×1 cell}
{1×1 cell}
{1×1 cell}
{1×1 cell}
>> out2 = regexp( cac, '(?<=18W)[^-]+', 'match', 'once' )
out2 =
6×1 cell array
{'ABO1'}
{'ABO2'}
{'ACE3'}
{'ACE4'}
{'AGUG'}
{'DUER'}
.. you probably forgot to copy one of the lines (call to CAT) when you copy-pasted your example from the command window.
per isakson
18 Oct 2017
Edited: per isakson
18 Oct 2017
"There is always one more level without the 'once':" Yes, that's correct. Now, I'll remember. A pity there isn't a strike-out feature.
One thing still puzzles me
out2 =
6×1 cell array
{'ABO1'}
why the braces around 'ABO1'? Here on R2016a I get
>> out2
out2 =
'ABO1'
Have The MathWorks changed the display format?
Wow, you're right, I had never realized, or had already forgotten(!) My output is from R2017b, but I was on R2016b until very recently, so I'm wondering whether I just didn't pay attention or whether the change came between R2016a and R2016b (?)
sensation
19 Oct 2017
Thanks a lot guys for your answers! One quick question: can you just briefly elaborate on (?<=18W)[^-]+ ? Or where I can find those expressions when I should use ? ^ or/and +. Thanks!
Understanding this, you will understand that
- (?<=..) is a look-behind and (?<=18W) imposes that what is matched (by the rest of the pattern) is preceded by 18W
- [^..] defines a set of elements not to match, so [^-] matches all characters but the dash.
- + is a quantifier that means one or more times the expression that precedes directly (which is [^-])
So the whole thing reads: match one or more characters that are not a dash (which translates into "read everything up to a dash"), preceded by the literal 18W.
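To see what each piece of the pattern contributes, here is a small sketch on one of the sample strings. The variable names a, b and c are chosen only for this illustration.

```matlab
str = '18WABO1-12345-0X';

% [^-]+ on its own: every run of characters that are not a dash
a = regexp(str, '[^-]+', 'match');

% adding the look-behind keeps only a run that starts right after '18W'
b = regexp(str, '(?<=18W)[^-]+', 'match');

% 'once' returns the first match as a char vector instead of a cell array
c = regexp(str, '(?<=18W)[^-]+', 'match', 'once');

disp(a), disp(b), disp(c)
```

The look-behind checks that 18W precedes the match without including it in the result, which is why the prefix does not show up in b or c.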
Stephen23
19 Oct 2017
"where I can find those expressions when I should use ? ^ or/and +."
By reading the documentation ten times:
And then read it another ten times. And practice lots.
Regular expressions are powerful and very useful, but they require practice and attention to detail. Study that page I linked to, and the other pages that it links to as well.
sensation
19 Oct 2017
Thanks!
More Answers (0)
