Replacing numerics in text using regular expressions.

D. Plotnick on 28 Sep 2017
Commented: Cedric on 28 Sep 2017
Hello, I am trying to figure out whether it is possible to dynamically replace numeric values in a long text block using regular expressions. Here is an example from a made-up xml file.
str = '<document><placemark><when>5</when><lat>41</lat></placemark><placemark><when>11</when></placemark></document>';
Now, I want to apply some numerical function to every value inside <when>, let's say subtract 3, so that the string will read
'<document><placemark><when>2</when><lat>41</lat></placemark><placemark><when>8</when></placemark></document>';
I can already find the locations using
exp='<when>(\d+)</when>'
but I don't know how to
  1. extract the actual numerical value at that location,
  2. perform some arbitrary function on that value (subtraction, addition, division, anything)
  3. write that new value back into the string so it reads <when>newValue</when>
If I were certain that the number of characters would stay the same, I could do a for-loop with some pretty gross indexing. However, as in the above example, the length of the char vector representing the numeric value might change as a result of my function (11 became 8).
I suspect there is either a really elegant regexp solution, or it is not possible at all. Hoping for the former. Cheers, Dan
  3 Comments
Walter Roberson on 28 Sep 2017
Sometimes for presentation purposes here you need to change < to &lt; which shows up like <
Cedric on 28 Sep 2017
Please see my last edit with a more "classical" approach.


Accepted Answer

Cedric on 28 Sep 2017
Edited: Cedric on 28 Sep 2017
Here is a small trick, assuming that your string contains no characters that sprintf would interpret as part of a formatSpec (such as % or \):
pattern = '(?<=<when>)\d+' ;
values = str2double( regexp( str, pattern, 'match' )) ;
values = values - 3 ; % Some operation.
fSpec = regexprep( str, pattern, '%d' ) ; % Make input str a formatSpec ;)
newStr = sprintf( fSpec, values ) ;
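If the text might contain % or \, one possible workaround is to escape them before turning the string into a formatSpec. This is only a sketch under that assumption: the input string and the variable name safe are made up for illustration and are not part of the original answer.

```matlab
% Hypothetical input that contains a literal percent sign.
str = '<note>100% done</note><when>5</when><when>11</when>';
pattern = '(?<=<when>)\d+';

% Extract the numbers and apply some operation.
values = str2double(regexp(str, pattern, 'match')) - 3;

% Escape the characters sprintf treats specially so they survive as plain text.
safe = strrep(str, '\', '\\');
safe = strrep(safe, '%', '%%');

% Turn each matched number into a %d slot, then fill the slots in order.
fSpec  = regexprep(safe, pattern, '%d');
newStr = sprintf(fSpec, values);
```

The backslash is escaped first and the percent sign second, so neither replacement interferes with the other.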
Got to run, I will answer more tonight if you don't get a better answer.
OK, I have five more minutes. A good alternative, mentioned by Walter, relies on the fact that you can run MATLAB code within the replacement string:
repFun = @(s) sprintf( '%d', sscanf( s,'%d' ) - 3 ) ; % Update function.
newStr = regexprep( str, '(?<=<when>)\d+', '${repFun($0)}' ) ;
Finally, a more "classical" approach that matches and splits the input string, replaces the matches, and rebuilds the output:
[numbers, parts] = regexp( str, '(?<=<when>)\d+', 'match', 'split' ) ;
numbers = arrayfun( @(x) sprintf( '%d', x ), str2double( numbers ) - 3, ...
'UniformOutput', false ) ;
buffer = [parts; [numbers, {''}]] ;
newStr = sprintf( '%s', buffer{:} ) ;
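The same match/split pattern should work for any operation, including one that changes the number of digits. The sketch below reuses the string from the question; the function handle fun and the names newValues and newNumbers are hypothetical and chosen only for illustration.

```matlab
str = '<document><placemark><when>5</when><lat>41</lat></placemark><placemark><when>11</when></placemark></document>';
fun = @(x) 100 * x;   % hypothetical operation; any function of the numeric values

% numbers holds the matched digit strings, parts holds the text around them.
[numbers, parts] = regexp(str, '(?<=<when>)\d+', 'match', 'split');

% Convert to double, apply the operation, and convert back to text.
newValues  = fun(str2double(numbers));
newNumbers = arrayfun(@(x) sprintf('%d', x), newValues, 'UniformOutput', false);

% Interleave parts{1}, newNumbers{1}, parts{2}, ..., parts{end}.
buffer = [parts; [newNumbers, {''}]];
newStr = sprintf('%s', buffer{:});
```

Because the output is rebuilt from pieces, it does not matter that 5 becomes 500 and 11 becomes 1100. If the operation can produce non-integer results, '%d' falls back to exponential notation, so a format such as '%g' may be a better fit there.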
  2 Comments
Cedric on 28 Sep 2017
I updated the answer after you accepted it (@ 20:58 UTC), adding a more classical approach.
D. Plotnick on 28 Sep 2017
Thanks! Both to you and Walter. This worked very well, and I was hoping that I would be able to run MATLAB functions in this context, so that opens up a whole bunch of additional ways I can use this method.
Thanks again, Dan


More Answers (1)

Walter Roberson on 28 Sep 2017
Yes. If you can devise a regexp pattern to isolate the number, then you can use regexprep with the ${cmd} replacement. Arguments to the commands will be passed as strings. Values can be returned as strings or as integers that will be converted to strings.
For example,
str = '<document><placemark><when>5</when><lat>41</lat></placemark><placemark><when>11</when></placemark></document>';
regexprep(str, '\d+', '${$0 - 2}')
I did not test this code (my system is busy at the moment)
  2 Comments
D. Plotnick on 28 Sep 2017
This unfortunately had some odd behavior (Evaluation of '$0 - 2' did not produce a char vector or scalar string.) and changing this to
regexprep(str, '\d+', '${num2str($0 - 2)}')
did in fact return a modified string, but nothing like what I expected. Cedric seemed inspired by your idea however, and his solution worked quite well.
Danke, Dan
Cedric on 28 Sep 2017
You could have done it by coding the conversion to double as well. $0 refers to the match, which is a string. It must be converted to double before you can do math. Instead of loading the replacement string with commands, I created a function repFun that we call, and this function does the double conversion string-num-string.
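A minimal sketch of the inline conversion described here, assuming the same str as in the question. The variable codes is hypothetical and only illustrates why arithmetic on the raw match gives unexpected values. The lookbehind restricts the replacement to <when>, whereas a bare \d+ would also match the 41 inside <lat>.

```matlab
str = '<document><placemark><when>5</when><lat>41</lat></placemark><placemark><when>11</when></placemark></document>';

% $0 is the matched text, so arithmetic on it acts on character codes:
% double('5') is 53, which makes '5' - 2 equal to 51 rather than 3.
codes = '5' - 2;

% Converting text -> double -> text inside the dynamic expression
% gives the intended arithmetic.
newStr = regexprep(str, '(?<=<when>)\d+', '${num2str(str2double($0) - 3)}');
```

This keeps everything in one call, at the cost of a denser replacement string than the repFun version.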

