Regular Expression help (is a line end token being matched in the middle of a string?)

Question

Andrew on 7 May 2014

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/128671-regular-expression-help-is-a-line-end-token-being-matched-in-the-middle-of-a-string

Answered: Prateekshya on 24 Oct 2024

Hi all,

I've hit the limit of my understanding of regular expressions, and I'm hoping someone can help.

If I start with a simple enough string:

str = 'Serial Number: NEO1B39100092'

I first check for the existence of my keyword:

keyword = '(\W|^)serial number:*(\W|$)+'

and then try to find its value with:

expr = ['(?<=', keyword, ')[^\s]+']  % N.B. using [^\s] as a template for '[^', delimiters ']'

for comparison, I'll use the keyword without the line end anchor

keyword2 = '(\W|^)serial number:*(\W)+'
expr2 = ['(?<=', keyword2, ')[^\s]+']

Now if I use regexpi to test the two expressions

value = regexpi(str, expr, 'match', 'once')
value2 = regexpi(str, expr2, 'match', 'once')

I see that

value = ':',

while

value2 = 'NEO1B39100092'

My take on this is that the line-end anchor '$' from the original keyword is somehow being matched right after the letter r. For the result to be ':', the position immediately before ':' must satisfy (\W|$)+, and it can't be the \W branch, because expr2 gives the expected result.

Can anyone shed some light on this for me?

Thanks for any help, Andrew


Answer 1

Prateekshya on 24 Oct 2024

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/128671-regular-expression-help-is-a-line-end-token-being-matched-in-the-middle-of-a-string#answer_1536500

Hello Andrew,

It looks like you are encountering an issue with how you are using regular expressions, specifically with the use of anchors and non-word character matching. Let us break down what is happening and how you can adjust your expressions to get the desired result.

Understanding the Regular Expression

Metacharacters used:

\W matches any single non-word character (anything other than a-z, A-Z, 0-9, and underscore).
^ matches the start of the string.
$ matches the end of the string.
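
As a quick illustration of these three metacharacters in isolation, here is a minimal sketch. The sample string and variable names are made up for this example and do not come from the question.

```matlab
% Sample text chosen only for illustration (not from the question)
txt = 'abc def';

nonWord  = regexp(txt, '\W',   'match', 'once');  % the single space between the words
atStart  = regexp(txt, '^abc', 'match', 'once');  % matches because 'abc' begins the string
atEnd    = regexp(txt, 'def$', 'match', 'once');  % matches because 'def' ends the string
notAtEnd = regexp(txt, 'abc$', 'match', 'once');  % empty: 'abc' is not at the end
```

Used on their own like this, ^ and $ only succeed at the two ends of the input, which is why the result in the question looks surprising.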

Your Pattern:

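keyword = '(\W|^)serial number:*(\W|$)+' is intended to match "Serial Number:" preceded by a non-word character or the start of the string, and followed by a non-word character or the end of the string.

Two details combine to produce the unexpected result once the keyword is wrapped in (?<=...). First, :* allows zero colons, so the keyword can stop right after "Number". Second, (\W|$)+ can then only succeed at that point through its $ branch, because no non-word character lies between "Number" and the colon. The reported result ':' shows that this is what happens: inside the lookbehind, $ is evidently accepted at the current position (the end of the text the lookbehind examines), not only at the true end of the string. The lookbehind therefore already succeeds at the colon, and [^\s]+ returns ':' and stops at the following space. Without the $ branch, \W must consume a real character, so the lookbehind first succeeds after ": " and the match is the serial number.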
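
The comparison from the question can be rerun side by side as sketched below. The variable names are chosen for this example. The result noted for the expression containing $ is the one reported in the question, not an independently verified output, and may depend on the MATLAB release.

```matlab
str = 'Serial Number: NEO1B39100092';

% Keyword with and without the '$' alternative in the trailing group
keywordWithEnd    = '(\W|^)serial number:*(\W|$)+';
keywordWithoutEnd = '(\W|^)serial number:*(\W)+';

% Wrap each keyword in a lookbehind and match the following run of non-space characters
exprWithEnd    = ['(?<=', keywordWithEnd,    ')[^\s]+'];
exprWithoutEnd = ['(?<=', keywordWithoutEnd, ')[^\s]+'];

valueWithEnd    = regexpi(str, exprWithEnd,    'match', 'once');  % question reports ':'
valueWithoutEnd = regexpi(str, exprWithoutEnd, 'match', 'once');  % question reports 'NEO1B39100092'

fprintf('with $:    "%s"\nwithout $: "%s"\n', valueWithEnd, valueWithoutEnd);
```

The only difference between the two expressions is the $ branch, so any difference in the printed values can be attributed to it.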

Solution

To extract "NEO1B39100092" correctly, you need to refine your regular expressions:

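Define the Keyword:

Remove the $ branch; the value follows the keyword, so the keyword never needs to match at the end of the string. Require the colon (: rather than :*) and use \s* to allow optional whitespace after it.

Expression for Extraction:

Keep the positive lookbehind (?<=...) so that only the value following the keyword is returned.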

Revised Code

str = 'Serial Number: NEO1B39100092';
% Define a more precise keyword pattern
keyword = '(\W|^)serial number:\s*';  % Colon required, optional whitespace after it; regexpi already ignores case
% Expression to extract the serial number
expr = ['(?<=', keyword, ')[^\s]+'];
% Use regexpi to extract the value
value = regexpi(str, expr, 'match', 'once');
disp(['Extracted Value: ', value]);
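
If the lookbehind is not essential, one alternative (not part of the answer above) is to match the keyword normally and capture the value as a token, which avoids placing anchors inside a lookbehind at all. This is a sketch under that assumption, and the variable names are illustrative.

```matlab
str = 'Serial Number: NEO1B39100092';

% Match the keyword as ordinary text and capture the following non-space run.
% (?:...) is a non-capturing group, so the only token is the value itself.
expr = '(?:\W|^)serial number:\s*(\S+)';
tok  = regexp(str, expr, 'tokens', 'once', 'ignorecase');

if isempty(tok)
    disp('Keyword not found');
else
    value = tok{1};   % first (and only) captured token
    disp(['Extracted Value: ', value]);
end
```

With 'tokens' and 'once', regexp returns a cell array holding the captured text, so the isempty check doubles as the keyword-existence test from the question.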

I hope this helps!
