unexpected comma in regexp output

Hi everyone, I want to extract the word "Df(3R)ED50003" from the string below.
The word is composed of the characters A-Z a-z 0-9 - _ ( )
aStr = 'w[1118]; Df(3R)ED50003, P{w[+mW.Scer\FRT.hs3]=3''.RS5+3.3''}ED50003/TM6C, cu[1] Sb[1]';
Below is the code I used:
[t1,t2] = regexp(aStr,'.*(Df[\(\)-_a-zA-Z0-9]+).*','tokens')
However, I got:
t1{1}{1}
Df(3R)ED50003,
There is a comma at the end which I did not include in the regexp. I expected Df(3R)ED50003, but the result has an extra comma.
Can someone help me see where I went wrong? Thanks

Accepted Answer

Guillaume on 26 Oct 2019
Edited: Guillaume on 26 Oct 2019

Note that your example input is not valid MATLAB syntax. I assume the internal ' are meant to be doubled.
I'm not too sure what you're trying to do with your regex. Some of it is overcomplicated, e.g.:
regexp(s, '.*(somexpr).*', 'tokens')
extracts the same text as the simpler (and most likely much faster, since .* can slow regexp tremendously if used carelessly)
regexp(s, 'somexpr', 'match')
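A minimal sketch of that equivalence, using a made-up input string and pattern (both are illustrative assumptions, not taken from the question). The two calls return the same text, only nested differently: 'tokens' gives a cell of cells, 'match' gives a cell of character vectors.

```matlab
% Hypothetical input, chosen only for illustration.
s = 'id=abc123;';

% Wrapped form: the leading and trailing .* consume the rest of the string,
% and the parenthesised part is returned as a token.
tok = regexp(s, '.*(abc[0-9]+).*', 'tokens');

% Simpler form: ask directly for the matching text.
mat = regexp(s, 'abc[0-9]+', 'match');

% Both hold the same text, at different nesting levels.
sameText = strcmp(tok{1}{1}, mat{1});
```

Note that the wrapped form can return at most one token per string, because the leading .* swallows everything before the last possible match, whereas 'match' returns every non-overlapping match.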
I'm not entirely clear on what exactly you want to include in your match. I don't think you fully understand how [] works in a regexp, in particular the role of - inside it. Your [\(\)-_a-zA-Z0-9]+ expression matches one or more of:
  • a (, your \(,
  • any character in the range ')':'_', your \)-_. Note that this range includes the comma; that is probably where you went wrong,
  • any character in the range 'a':'z',
  • any character in the range 'A':'Z',
  • any character in the range '0':'9' (already inside the \)-_ range, as is 'A':'Z').
>> '(':'_' %all characters matched by \(\)-_
ans =
'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_'
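The effect of the accidental range can be checked on the string from the question. This is a sketch only: it uses the simpler 'match' form with the 'once' option rather than the original 'tokens' call, and the variable names are illustrative.

```matlab
% Input string from the question (inner quotes doubled).
aStr = 'w[1118]; Df(3R)ED50003, P{w[+mW.Scer\FRT.hs3]=3''.RS5+3.3''}ED50003/TM6C, cu[1] Sb[1]';

% Original class: )-_ is read as a range from ')' to '_', which contains ','.
badMatch = regexp(aStr, 'Df[\(\)-_a-zA-Z0-9]+', 'match', 'once');

% Hyphen escaped, so it is a literal character rather than a range operator.
goodMatch = regexp(aStr, 'Df[\(\)\-_a-zA-Z0-9]+', 'match', 'once');

% The comma lies inside the accidental range.
commaInRange = ismember(',', ')':'_');
```

Here badMatch comes back as 'Df(3R)ED50003,' (with the trailing comma) and goodMatch as 'Df(3R)ED50003'.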

4 Comments

raym on 26 Oct 2019
Thanks. My mistake was not putting a backslash \ before - to make it a literal - character.
Below works:
'.*(Df[\(\)\-_a-zA-Z0-9]+).*'
Guillaume on 26 Oct 2019
Or you could move the - to the beginning or the end of the list, where it doesn't need escaping:
'[-\(\)_a-zA-Z0-9]+' % - doesn't need escaping when it's the first in the list
%or
'[\(\)_a-zA-Z0-9-]+' % or when it's the last
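A quick check that the three spellings behave the same, as a sketch on a made-up test string (the string and variable names are assumptions for illustration):

```matlab
% Hypothetical test string containing a comma, a space, and a literal hyphen.
s = 'Df(3R)ED50003, a-b_c';

hyphenFirst   = regexp(s, '[-\(\)_a-zA-Z0-9]+', 'match');   % - first in the list
hyphenLast    = regexp(s, '[\(\)_a-zA-Z0-9-]+', 'match');   % - last in the list
hyphenEscaped = regexp(s, '[\(\)\-_a-zA-Z0-9]+', 'match');  % - escaped

% All three split the string at the comma and the space, and keep the hyphen.
allSame = isequal(hyphenFirst, hyphenLast, hyphenEscaped);
```

Each call returns the two pieces 'Df(3R)ED50003' and 'a-b_c'; neither the comma nor the space is matched.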
raym on 27 Oct 2019
Yes.
I also found that the two commands below give the same result:
regexp(s, 'somexpr', 'match')
regexp(s, '(somexpr)', 'match')
somexpr can be surrounded by () even when there is no () in the string.
Stephen23 on 27 Oct 2019
"somexpr can be surrounded by () even when there is no () in the string."
That is because parentheses are a grouping operator, not literal characters.
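A small sketch of the difference between grouping and literal parentheses, on a shortened version of the word from the question (variable names are illustrative):

```matlab
s = 'Df(3R)ED50003';

% Unescaped parentheses only group; they match no characters themselves.
plain   = regexp(s, 'Df', 'match');
grouped = regexp(s, '(Df)', 'match');

% Escaped parentheses match the literal characters ( and ).
literal = regexp(s, '\(3R\)', 'match');

% Mixed: the outer \( \) are literal, the inner ( ) capture a token.
tok = regexp(s, 'Df\((\w+)\)', 'tokens');
```

Here plain and grouped are both {'Df'}, literal is {'(3R)'}, and tok{1}{1} is '3R'.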


Asked: 26 Oct 2019
Commented: 27 Oct 2019
