Extract numbers from mixed string

Question

4 votes

I have a file containing header lines like the following:

Test setup: MaxDistance = 60 m, Rate = 1.000, Permitted Error = 50
Operator Note:  Air Temperature=20 C, Wind Speed 16.375m/s, Altitude 5km (Cloudy)

For a given parameter such as MaxDistance or Wind Speed, I would like to extract its numerical value. This is tricky because sometimes there is an equal sign, space, or units, and sometimes there is not, because different operators enter their notes differently (lesson: next time enforce consistency).

How would I extract the following: All numerical characters (ignoring spaces and equal signs but keeping decimal points) that appear after the string representing the parameter name. Stop when a letter or punctuation mark is reached. In the case of 'MaxDistance', I would obtain 60. In the case of Wind Speed, I would obtain 16.375.

2 comments

Albert Yam on 19 Jul 2012

Edited: John Kelly on 26 Feb 2015

What have you tried?

Jianming She on 17 Jun 2020

Edited: Jianming She on 18 Jun 2020

This seems to be a more general approach:

function numArray = extractNumFromStr(str)
str1 = regexprep(str,'[,;=]', ' ');
str2 = regexprep(regexprep(str1,'[^- 0-9.eE(,)/]',''), ' \D* ',' ');
str3 = regexprep(str2, {'\.\s','\E\s','\e\s','\s\E','\s\e'},' ');
numArray = str2num(str3);

Example:

a = 'alpha=-3.5,beta=1e-2. but gamma = -34.1'
numArray = extractNumFromStr(a)

numArray =
   -3.5000    0.0100  -34.1000


Answer 1

Jan on 19 Jul 2012

Edited: Jan on 19 Jul 2012

19 votes

First import the file into a string, e.g. with fileread. You then get something like this (if not, please explain all the necessary details):

Str = ['Test setup: MaxDistance = 60 m, Rate = 1.000, ', ...
       'Permitted Error = 50 Operator Note:  Air Temperature=20 C, ', ...
       'Wind Speed 16.375m/s, Altitude 5km (Cloudy)'];

Now remove all equals signs:

Str(strfind(Str, '=')) = [];

Finally you can get the values:

Key   = 'MaxDistance';
Index = strfind(Str, Key);
Value = sscanf(Str(Index(1) + length(Key):end), '%g', 1);

"Index(1)" handles multiple occurrences of the key by using only the first one.
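The steps above can be wrapped in a short loop to look up several parameters at once. This is only a sketch, not part of the original answer: the names Clean, Keys, and Values are illustrative, and it assumes each key is followed directly by its number once the equals signs are gone.

```matlab
Str = ['Test setup: MaxDistance = 60 m, Rate = 1.000, ', ...
       'Permitted Error = 50 Operator Note:  Air Temperature=20 C, ', ...
       'Wind Speed 16.375m/s, Altitude 5km (Cloudy)'];

Clean  = strrep(Str, '=', '');                       % same effect as deleting every '='
Keys   = {'MaxDistance', 'Wind Speed', 'Altitude'};  % illustrative keys
Values = nan(size(Keys));                            % NaN marks "key or number not found"

for k = 1:numel(Keys)
    Index = strfind(Clean, Keys{k});
    if isempty(Index)
        continue                                     % key absent: leave NaN
    end
    % sscanf skips leading blanks and reads one number after the key
    v = sscanf(Clean(Index(1) + length(Keys{k}):end), '%g', 1);
    if ~isempty(v)
        Values(k) = v;
    end
end
```

For the sample header, Values holds 60, 16.375, and 5.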

3 comments

Jan on 19 Jul 2012

Removing the = is clear, I think. Then STRFIND looks for the wanted string. Afterwards SSCANF extracts the first number behind this string. Here "behind" means the position where the string is found plus the number of characters the string has.

Lorenzo on 30 Oct 2013

This works great! Just a quick question, Jan: what if you want to find all occurrences of numeric values between two strings? For instance, let's say you want the numeric values that can be found between MaxDistance and Altitude in the original example (i.e. 60, 1.000, 50, etc.). How can you achieve that?

I tried this:

Key1 = 'MaxDistance';
Key2 = 'Altitude';
Index1 = strfind(file, Key1);
Index2 = strfind(file, Key2);
Value = sscanf(file(Index1:Index2), '%g',1);

but I still get nothing but the first value. Also, I don't know a priori how many numbers can be encountered between the two strings.

Thanks!

Lorenzo
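One possible way to collect every number between two keys is sketched here. It is not from the original thread. SSCANF stops reading as soon as the text no longer matches the format, so it cannot skip the words between the numbers; the sketch therefore cuts out the text between the keys and lets REGEXP find all numbers in it. The names Segment and Values are illustrative, and the pattern assumes unsigned decimal numbers.

```matlab
Str = ['Test setup: MaxDistance = 60 m, Rate = 1.000, ', ...
       'Permitted Error = 50 Operator Note:  Air Temperature=20 C, ', ...
       'Wind Speed 16.375m/s, Altitude 5km (Cloudy)'];

Key1   = 'MaxDistance';
Key2   = 'Altitude';
Index1 = strfind(Str, Key1);
Index2 = strfind(Str, Key2);

% Text strictly between the end of Key1 and the start of Key2
Segment = Str(Index1(1) + length(Key1) : Index2(1) - 1);

% Every run of digits with an optional fractional part, converted to double
Values = str2double(regexp(Segment, '\d+\.?\d*', 'match'));
```

For the sample header, Values is [60 1 50 20 16.375], and its length does not need to be known in advance.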


Answer 2

Stephan Koehler on 7 Jun 2017

6 votes

Here is a one-line answer:

str2num( regexprep( Str, {'\D*([\d\.]+\d)[^\d]*', '[^\d\.]*'}, {'$1 ', ' '} ) )

2 comments

Alexandre THIBEAULT on 27 Jan 2021

Best answer

Marco A. Acevedo Z. on 2 Apr 2021

Hi, good answer, but how can the - sign be included (if present)? Thanks.
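A sign-aware alternative could look like the sketch below. It does not modify the one-liner above; it swaps in a REGEXP 'match' call with an optional leading sign. The input string and the names Tokens and Values are illustrative.

```matlab
Str    = 'alpha=-3.5, beta=2, gamma = -34.1';       % illustrative input
Tokens = regexp(Str, '[-+]?\d+\.?\d*', 'match');    % optional sign, digits, optional fraction
Values = str2double(Tokens);                        % [-3.5 2 -34.1]
```

Note that a hyphen used as a separator, as in '10-20', would be read as 10 and -20 with this pattern.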


Answer 3

Freddy on 19 Jul 2012

2 votes

Maybe a little late, but I would also like to present my ("regexp training") solution. :)

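% Capture up to two words as Keyword and the number that follows as Value
A = regexp(Str,'(?<Keyword>(?:\w+\s*\w+))\s*=?\s*(?<Value>\d+\.?\d*)','names');
s = struct();
for i = A
  % genvarname turns the keyword into a valid field name (e.g. removes spaces)
  s.(genvarname(i.('Keyword'))) = str2double(i.('Value'));
end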

1 comment

Albert Yam on 19 Jul 2012

Edited: Albert Yam on 19 Jul 2012

It took me a long time to understand what you are doing. That's cool though.

How does it skip over 'Operator Note:' ?

Edit: Never mind, I get it. The pattern has nothing that matches ':'. The '(?:\w' has nothing to do with a ':' in the string; it groups the token for 'up to two words'.


Answer 4

Albert Yam on 19 Jul 2012

1 vote

This is how I went about it, with all steps included, even the errors.

teststr = 'Test setup: MaxDistance = 60 m, Rate = 1.000, Permitted Error = 50 Operator Note:  Air Temperature=20 C, Wind Speed 16.375m/s, Altitude 5km (Cloudy)';
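regexp(teststr,[\d])                     % error: the pattern must be quoted
regexp(teststr,['\d'])                   % start index of every single digit
regexp(teststr,['\d'],'match')           % every digit as a separate match
regexp(teststr,['\d+'],'match')          % runs of digits: '1.000' splits into '1' and '000'
regexp(teststr,['\d+.?'],'match')        % unescaped '.' also matches any character after the digits
regexp(teststr,['\d+\.?'],'match')       % digits plus an optional literal decimal point
regexp(teststr,['\d+\.?\d?'],'match')    % at most one digit after the point
regexp(teststr,['\d+\.?\d+?'],'match')   % needs a second digit, so the lone '5' is missed
regexp(teststr,['\d+\.?\d*?'],'match')   % lazy '*?' takes no digits after the point
regexp(teststr,['\d+\.?\d?'],'match')    % same as the earlier attempt
regexp(teststr,['\d+\.?\d*'],'match')    % integers and decimals matched in full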

6 comments

G on 7 Nov 2013

Edited: G on 13 Nov 2013

Better:

regexp(teststr,'\d+\.?\d*|-\d+\.?\d*|\.?\d+|-\.?\d+','match')

or

regexp(teststr,'-?\d+\.?\d*|-?\d*\.?\d+','match')

That still leaves the -.34e-004 case!
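One way to also cover exponent notation is to append an optional exponent group to the pattern. This is a sketch that was not posted in the thread; the input string and the names Pattern and Values are illustrative.

```matlab
Str     = 'x = -.34e-004, Wind Speed 16.375m/s, tol=1e-2, Altitude 5km';  % illustrative input

% optional sign, then "digits[.digits]" or ".digits", then an optional exponent
Pattern = '[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?';

Values  = str2double(regexp(Str, Pattern, 'match'));
```

Here the matches are '-.34e-004', '16.375', '1e-2', and '5'. A unit that begins with 'e' directly followed by digits would be mistaken for an exponent, so the pattern is only as reliable as the notes are consistent.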

Angkur Shaikeea on 21 Oct 2021

Edited: Angkur Shaikeea on 21 Oct 2021

I need to extract

0.00000 0.00000 0.00000
0.00000 1.00000 0.00000
1.00000 0.00000 0.00000

from a text file containing

.............................................
Nodal positions:
0.00000 0.00000 0.00000
0.00000 1.00000 0.00000
1.00000 0.00000 0.00000
Nodal positions:
0.00000 0.00000 0.00000
0.00000 1.00000 0.00000
1.00000 0.00000 0.00000
Nodal positions:
0.00000 0.00000 0.00000
0.00000 1.00000 0.00000
1.00000 0.00000 0.00000

Any help using regexp?
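A possible approach, sketched under assumptions since the thread gives no answer: capture the numeric block after each "Nodal positions:" header with REGEXP tokens, then convert one block with SSCANF. The variable names are illustrative, the text is a stand-in for the file contents (FILEREAD would supply it in practice), and the pattern assumes plain decimal numbers with three values per row.

```matlab
% Illustrative stand-in for the file contents
Txt = sprintf(['Nodal positions:\n', ...
               '0.00000 0.00000 0.00000\n', ...
               '0.00000 1.00000 0.00000\n', ...
               '1.00000 0.00000 0.00000\n', ...
               'Nodal positions:\n', ...
               '0.00000 0.00000 0.00000\n', ...
               '0.00000 1.00000 0.00000\n', ...
               '1.00000 0.00000 0.00000\n']);

% Capture the run of digits, points, signs and whitespace after each header
Blocks = regexp(Txt, 'Nodal positions:\s*([-+\d.\s]+)', 'tokens');

% Convert the first block: sscanf reads all numbers, reshape restores rows of 3
First = reshape(sscanf(Blocks{1}{1}, '%f'), 3, []).';
```

Each element of Blocks holds the text of one block, so the same conversion can be applied to Blocks{2}{1} and so on.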


Answer 5

Dahai Xue on 10 Mar 2016

Edited: KSSV on 25 Jan 2021

1 vote

C.J. Harris, I put your regexp into a function that extracts all numbers. I had a hard time finding an array operation that can use 'a' and 'b' without the loop. Hopefully somebody has ideas. Of course, it is not difficult to add more parameters or options to find "certain" numbers with preceding or following landmark strings.

function nums = regExtractNums(str) 
[a,b] = regexp(str, '\d+(\.\d+)?'); 
nums = zeros(length(a),1); 
for k = 1:length(a) 
    nums(k) = str2double(str(a(k):b(k))); 
end
end
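One loop-free possibility, offered as a sketch and not as the author's method: ask REGEXP for the matched text directly with the 'match' option, then pass the whole cell array to STR2DOUBLE, which converts every element in one call. The sample string and the name tokens are illustrative.

```matlab
str    = 'MaxDistance = 60 m, Rate = 1.000, Wind Speed 16.375m/s';  % illustrative input
tokens = regexp(str, '\d+(\.\d+)?', 'match');   % cell array of matched substrings
nums   = str2double(tokens(:));                 % column vector, as in the loop version
```

This avoids the start and end indices 'a' and 'b' altogether.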

0 comments

Answer 6

C.J. Harris on 19 Jul 2012

0 votes

In order to extract a certain value:

Str = ['Test setup: MaxDistance = 60 m, Rate = 1.000, ', ...
       'Permitted Error = 50 Operator Note:  Air Temperature=20 C, ', ...
       'Wind Speed 16.375m/s, Altitude 5km (Cloudy)'];
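matchWord = 'Air Temperature';
[a,b]  = regexp(Str,'\d+(\.\d+)?');        % start and end indices of every number
keyPos = strfind(Str,matchWord);           % may contain several hits
strPos = find(a > keyPos(1),1,'first');    % first number after the first hit
nValue = str2double(Str(a(strPos):b(strPos)));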

0 comments
