How to effectively use look ahead with regexp?

Question

pietro, 26 June 2017

https://jp.mathworks.com/matlabcentral/answers/346279-how-to-effectively-use-look-ahead-with-regexp

Edited: Stephen23, 27 June 2017

Hi all,

I'm doing some coding with regular expressions, but there are a couple of things I can't understand. Look at the following:

1. searching the letter 'r' followed by a number:

regexp('19f/4r power shift','(?<=\d*) ?r')
ans = 
  6    12
regexp('19f/4r power shift','(?<=\d)\s?r')
ans = 
    6

Why does the '*' change the result so much? The 'r' at the 12th position is not followed by any number.

2. Searching for the word 'Reverser' that is not preceded by the words 'power' or 'powr':

regexp('power  Reverser','(?<!powe?r) *-? *Reverser','match')
ans = 
    ' Reverser'

Reverser is preceded by the string 'power', so it shouldn't be selected.

Why do these occur?

Thanks

Best regards,

Pietro

Answer 1

Stephen23, 26 June 2017

https://jp.mathworks.com/matlabcentral/answers/346279-how-to-effectively-use-look-ahead-with-regexp#answer_271972

Edited: Stephen23, 26 June 2017

1. "searching the letter 'r' followed by a number." Actually, you seem to want to search for the letter 'r' preceded by a number, not "followed by" one. Only the second of your regexps does this. By adding the * to the first regexp you make the digits optional (the asterisk matches zero or more times!). So the second 'r' in that short string clearly matches your first regular expression: it is an 'r' preceded by zero spaces (permitted by the ?) and by zero digits (permitted by the *).

You could use + (match one or more) rather than * (match zero or more):

regexp('19f/4r power shift','(?<=\d+)\s?r')

but this is not really necessary: matching one digit is enough, because wherever several digits precede the 'r', the digit immediately before it already satisfies the lookbehind.
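To make the difference between the three quantifiers concrete, here is a small side-by-side sketch using the string from the question. The variable names are only illustrative, and the first two results are the ones already shown in the question.

```matlab
str = '19f/4r power shift';

% \d* may match zero digits, so the lookbehind succeeds at every
% position and every 'r' is reported.
idxStar = regexp(str, '(?<=\d*) ?r');    % 6 12, as in the question

% \d requires one digit immediately before the optional whitespace.
idxOne = regexp(str, '(?<=\d)\s?r');     % 6, as in the question

% \d+ requires at least one digit, which selects the same 'r' here.
idxPlus = regexp(str, '(?<=\d+)\s?r');

disp(idxStar)
disp(idxOne)
disp(idxPlus)
```

For this input the \d and \d+ versions select the same position, which is why the + adds nothing.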

2. This is a much more subtle problem. The basic problem here is the optimism of regular expressions, and that * on the space character. What happens is that the regular expression engine keeps on trying new combinations (different starting positions, different numbers of spaces) until one of them gives a match, which clearly differs from how you perceive its operation (you want it to quit after the lookaround has rejected one combination).

The lookaround correctly rejects a match that starts directly after 'power', but you placed an asterisk * on the space, so the spaces are optional. When the engine tries a match that starts at the second space, so that the optional spaces ' *' match just one space character, the lookaround is satisfied: what precedes that one space? Another space character! Therefore the lookaround is happy (one space is not equal to 'power'), and the regular expression engine is happy because it has found a match. Therefore it picks this option.

Basically what you seem to want is a pessimistic parser (you want it to return no match if any one combination is rejected by that lookaround, even if others are not), but in reality regexp parsers are optimistic: they return a match if any one combination is a match. They move past the one case that you are interested in because another case fulfills their basic operational principle: find a match, however they can.
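One way to see this is to ask regexp for the start index of the match as well. The sketch below uses the pattern from the question on three inputs; the one-space and no-space inputs are extra examples added here for illustration.

```matlab
pat = '(?<!powe?r) *-? *Reverser';

% Two spaces: the match starts at the second space (index 7), because
% what precedes that position is a space, not 'power'.
[m2, s2] = regexp('power  Reverser', pat, 'match', 'start');

% One space: the match starts at the 'R' itself (index 7), with the
% optional spaces matching nothing; what precedes it is a space.
[m1, s1] = regexp('power Reverser', pat, 'match', 'start');

% No space: the only candidate position is directly after 'power',
% so the lookbehind rejects it and nothing is returned.
[m0, s0] = regexp('powerReverser', pat, 'match', 'start');

disp(m2), disp(s2)
disp(m1), disp(s1)
disp(isempty(m0))
```

So with this pattern the lookbehind only has an effect when 'Reverser' follows 'power' with nothing in between.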

To see what parts of the strings are matched you should look at using a dynamic regular expression, e.g. adding:

(?@disp($1))

into your regexp (with the part you are interested in wrapped in parentheses, so that it becomes token $1) and seeing how the string is parsed.
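As a rough sketch of how that could look with the pattern from the question: the parentheses around the consuming part are an addition made here so that $1 has something to refer to, and what gets printed depends on how the engine steps through the string.

```matlab
str = 'power  Reverser';

% Same pattern as in the question, except that the consuming part is
% wrapped in parentheses (token 1) and followed by a dynamic command.
pat = '(?<!powe?r)( *-? *Reverser)(?@disp($1))';

% disp runs whenever the engine gets past the group, printing the text
% that token 1 has consumed at that point.
m = regexp(str, pat, 'match');
```

The dynamic command only prints; it should not change what regexp returns.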

Do you really need to match an unknown number of space characters?

2 Comments

pietro, 26 June 2017

I got it!!! thanks a lot

Stephen23, 27 June 2017

Edited: Stephen23, 27 June 2017

You could move the space inside the lookaround:

>> regexp('power  Reverser','(?<!powe?r *)Reverser','match')
ans = 
     {}
>> regexp('power X Reverser','(?<!powe?r *)Reverser','match')
ans = 
    'Reverser'
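The pattern in the question also allowed an optional hyphen between the words. A possible extension of the same idea, assuming MATLAB accepts this longer variable-length lookbehind just as it accepts the one above, is to move the hyphen inside the lookaround too. The test strings other than the first and third are made up for illustration.

```matlab
% Lookbehind covers 'power' or 'powr', then optional spaces, an
% optional hyphen, and more optional spaces.
pat = '(?<!powe?r *-? *)Reverser';

inputs = {'power  Reverser', 'powr - Reverser', 'power X Reverser', 'Reverser'};
counts = zeros(1, numel(inputs));

for k = 1:numel(inputs)
    m = regexp(inputs{k}, pat, 'match');
    counts(k) = numel(m);   % number of matches found in this input
    fprintf('%-20s -> %d match(es)\n', inputs{k}, counts(k));
end
```

If the lookbehind behaves as expected, the first two inputs should give no match and the last two should each give one.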
