Hello Andrew,
It looks like you are encountering an issue with how you are using regular expressions, specifically with the use of anchors and non-word character matching. Let us break down what is happening and how you can adjust your expressions to get the desired result.
Understanding the Regular Expression
- \W matches any non-word character (anything other than a-z, A-Z, 0-9, and underscore).
- ^ matches the start of a string.
- $ matches the end of a string.
keyword = '(\W|^)serial number:*(\W|$)+' is intended to match "Serial Number:" when it is preceded by a non-word character or the start of the string and followed by a non-word character or the end of the string. The trouble comes from the trailing (\W|$)+, which matches one or more non-word characters (or the end of the string), so the keyword can end at more than one position. Because the * in serial number:* also allows zero colons, (\W|$)+ can match the colon itself, and when the keyword is wrapped in (?<=...) the lookbehind can then succeed immediately after the colon rather than just before the value. The $ alternative does not help here, since $ matches only at the end of the string and the text after the colon is not at the end.
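To make the behaviour above concrete, here is a small sketch that can be run in MATLAB. The variable names m1, m2, and m3 are only for illustration, and the sample string is the one from the revised code further down.

```matlab
str = 'Serial Number: NEO1B39100092';

% ':*' means "zero or more colons", so the label matches with or without them.
m1 = regexpi('serial number', 'serial number:*', 'match', 'once');
m2 = regexpi('Serial Number:::', 'serial number:*', 'match', 'once');

% The original keyword consumes the colon and the space after it.
% '$' never takes part here because the label is not at the end of the text.
m3 = regexpi(str, '(\W|^)serial number:*(\W|$)+', 'match', 'once');

disp(['[', m1, ']']);   % [serial number]
disp(['[', m2, ']']);   % [Serial Number:::]
disp(['[', m3, ']']);   % [Serial Number: ]
```

Printing the matches inside brackets makes it easy to see exactly which trailing characters each pattern consumed.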
Solution
To extract "NEO1B39100092" correctly, you need to refine your regular expressions:
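- Keyword Expression:
Remove the $ alternative, since $ matches only at the end of the string and cannot match between the label and the value. Use \s* to allow for optional whitespace after the colon.
- Expression for Extraction:
Wrap the keyword in a positive lookbehind, (?<=...), so that the label must come immediately before the match without being part of it, and follow it with a pattern that matches the run of non-whitespace characters making up the value.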
Revised Code
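str = 'Serial Number: NEO1B39100092';
% Label followed by optional whitespace; (?i) is redundant with regexpi but harmless.
keyword = '(?i)(\W|^)serial number:\s*';
% The lookbehind requires the label before the match; [^\s]+ captures the value.
expr = ['(?<=', keyword, ')[^\s]+'];
% 'once' returns the first match as a char vector (empty if nothing matches).
value = regexpi(str, expr, 'match', 'once');
disp(['Extracted Value: ', value]);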
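If you would rather not rely on a lookbehind, one alternative is to capture the value as a token. This is only a sketch under the same assumptions as above (the label is "Serial Number:" and the value contains no whitespace); the variable name tok is just illustrative.

```matlab
str = 'Serial Number: NEO1B39100092';

% (?:...) groups without capturing; (\S+) captures the value after the label.
tok = regexpi(str, '(?:\W|^)serial number:\s*(\S+)', 'tokens', 'once');

% With 'once', tokens come back as a cell array; it is empty when nothing matches.
if isempty(tok)
    value = '';
else
    value = tok{1};
end

disp(['Extracted Value: ', value]);
```

The token approach keeps the label and the value in one expression, so there is no need to build the pattern by concatenation.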
I hope this helps!