Can REGEXP map values from different parts of a text file?

Question

Brad on 5 Jun 2013


https://jp.mathworks.com/matlabcentral/answers/78121-can-regexp-map-values-from-different-parts-of-a-text-file

I have a text file with the following contents:

MSNout_BER (0:31) Observation #100 Rx'd at:  (58568.000) Msg. Time: (58568.000)
    Forward to IMU: true   Rcv Date: 2010121   Synch: f0f0   Rel Mode: Active
MSNout_SSS (0:32) Observation #101 Rx'd at:  (58569.000) Msg. Time: (58569.000)
    Forward to IRU: true   Rcv Date: 2010121   Synch: a0a0   Bel Mode: High
Type: 12    Malck ID: 12345 Time Tag: 58548.12345678
Hand ID: 0  SV ID:   51 Spam ID: 0  BOZ/FAS: 0  Realt Flag: 0
MSNout_BER (0:33) Observation #102 Rx'd at:  (58570.000) Msg. Time: (58570.000)
    Forward to IMU: true   Rcv Date: 2010121   Synch: f0f0   Rel Mode: Active
MSNout_SSS (0:34) Observation #103 Rx'd at:  (58571.000) Msg. Time: (58571.000)
    Forward to IRU: true   Rcv Date: 2010121   Synch: a0a0   Bel Mode: High
Type: 1 Malck ID: 12345 Time Tag: 58549.12345678
Hand ID: 1  SV ID:   2  Spam ID: 0  BOZ/FAS: 1  Realt Flag: 0
Type: 1 Malck ID: 12345 Time Tag: 58550.12345678
Hand ID: 1  SV ID:   2  Spam ID: 0  BOZ/FAS: 1  Realt Flag: 0
Type: 1 Malck ID: 12345 Time Tag: 58551.12345678
Hand ID: 1  SV ID:   2  Spam ID: 0  BOZ/FAS: 1  Realt Flag: 0
Type: 1 Malck ID: 12345 Time Tag: 58552.12345678
Hand ID: 1  SV ID:   2  Spam ID: 0  BOZ/FAS: 1  Realt Flag: 0
Type: 1 Malck ID: 12345 Time Tag: 58553.12345678
Hand ID: 1  SV ID:   1  Spam ID: 0  BOZ/FAS: 1  Realt Flag: 0
Type: 1 Malck ID: 12345 Time Tag: 58554.12345678
Hand ID: 1  SV ID:   1  Spam ID: 0  BOZ/FAS: 1  Realt Flag: 0
Type: 1 Malck ID: 12345 Time Tag: 58555.12345678
Hand ID: 1  SV ID:   1  Spam ID: 0  BOZ/FAS: 1  Realt Flag: 0
Type: 1 Malck ID: 12345 Time Tag: 58556.12345678
Hand ID: 1  SV ID:   3  Spam ID: 0  BOZ/FAS: 1  Realt Flag: 0

I’m using the following commands to retrieve the values of Time Tag: and SV ID: (SV ID values 1 and 2 only; all others are ignored):

[fn,pn] = uigetfile('*.txt', 'Select Text File');
OAMfilename = fullfile(pn, fn);
buffer  = fileread(OAMfilename);
pattern = '*?Tag:\s+([\d\.]+).*?SV ID:\s+([12])\W';
tokens = regexp(buffer, pattern, 'tokens');
data = reshape(str2double([tokens{:}]), 2, []).';

Results:

1234567800  2
1234567800  2
1234567800  2
1234567800  2
1234567800  1
1234567800  1
1234567800  1

Initially, I thought the results were as expected. Then I noticed the time tag for the first occurrence of SV ID equal to 2 was wrong - 58549.12345678 is the proper time tag.

Is it possible to force MATLAB to recognize each Time Tag value that occurs just prior to each SV ID value? Could a Lookaround operator be used in this case?


Answer 1

per isakson on 7 Jun 2013


https://jp.mathworks.com/matlabcentral/answers/78121-can-regexp-map-values-from-different-parts-of-a-text-file#answer_87986

Edited: per isakson on 10 Jun 2013

This seems to work.

    buf = fileread( 'cssm.txt' );
    rex = '(?<=Time Tag: )([\d\.]+).+?(?<=SV ID:[ ]+)(\d+)';
    cac = regexp( buf, rex, 'tokens' );
    cac{:}

returns

    ans = 
        '58548.12345678'    '51'
    ans = 
        '58549.12345678'    '2'
    ans = 
        '58550.12345678'    '2'
    ans = 
        '58551.12345678'    '2'
    ans = 
        '58552.12345678'    '2'
    ans = 
        '58553.12345678'    '1'
    ans = 
        '58554.12345678'    '1'
    ans = 
        '58555.12345678'    '1'
    ans = 
        '58556.12345678'    '3'

where cssm.txt contains your data

.

Comments on the regular expression:

The pattern captures two tokens.
Each token is the group of digits that follows an identifier and the spaces after it.
The "identifier and space" parts are written as expressions inside look-behind operators,
which gives two groups of the form (?<= name)( value).
Between these two groups is .+?, a lazy quantifier: it matches one or more characters, but only as many as are needed for the rest of the pattern to match. In MATLAB the dot also matches newlines by default, so .+? can span the line break between the Time Tag line and the SV ID line.
The regular expression must match one contiguous sub-string, so something has to match the characters between the two groups and join them into one sub-string. In this case that is done by .+?.

Most of the terminology above is copied from the online help.
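As a minimal sketch of how the tokens could be turned into numbers, the snippet below applies the same look-behind pattern to two records copied from the question. The records are held in an inline string rather than read from cssm.txt, an assumption made only to keep the example self-contained. The ismember filter that keeps SV IDs 1 and 2 is an addition for illustration and is not part of the answer above.

```matlab
% Two records in the layout of the question's file (inline sample, not cssm.txt).
buf = sprintf([ ...
    'Type: 12    Malck ID: 12345 Time Tag: 58548.12345678\n', ...
    'Hand ID: 0  SV ID:   51 Spam ID: 0  BOZ/FAS: 0  Realt Flag: 0\n', ...
    'Type: 1 Malck ID: 12345 Time Tag: 58549.12345678\n', ...
    'Hand ID: 1  SV ID:   2  Spam ID: 0  BOZ/FAS: 1  Realt Flag: 0\n']);

% Same pattern as in the answer: two look-behind groups joined by a lazy .+?
rex = '(?<=Time Tag: )([\d\.]+).+?(?<=SV ID:[ ]+)(\d+)';
cac = regexp(buf, rex, 'tokens');

% Stack the 1x2 token cells into an N-by-2 cell array, then convert to double.
allData = str2double(vertcat(cac{:}));   % columns: time tag, SV ID

% Illustrative extra step: keep only rows whose SV ID is 1 or 2.
data = allData(ismember(allData(:,2), [1 2]), :);
```

Here allData holds one row per record, and data drops the row with SV ID 51, so each remaining time tag is the one printed directly before its SV ID.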

.

BTW: Your pattern works - after a little fixing:

rex = '*?Tag:\s+([\d\.]+).*?SV ID:\s+([125]{1,2})\W';

but what is the purpose of the leading *? and the trailing \W ?

.

A bit more robust:

rex = '(?<=Time Tag:)[ ]+([\d\.]+)[^\n]+?(?<=SV ID:)[ ]+(\d+)';

Replacing \s+ between name and value by [ ]+ excludes new-line, tab, etc.
Replacing .*? between the two name-value-pairs by |[^


Cedric on 13 Jun 2013

Edited: Cedric on 17 Jun 2013

Actually

'([\d\.]+)\s+Hand.+?SV ID:\s+(\d+)'

does match SV ID 51.

What was wrong with your initial pattern is that the first match is the whole:

 Tag: 58548.12345678
 Hand ID: 0  SV ID:   51 Spam ID: 0  BOZ/FAS: 0  Realt Flag: 0
 MSNout_BER (0:33) Observation #102 Rx'd at:  (58570.000) Msg. Time: (58570.000)
 Forward to IMU: true   Rcv Date: 2010121   Synch: f0f0   Rel Mode: Active
 MSNout_SSS (0:34) Observation #103 Rx'd at:  (58571.000) Msg. Time: (58571.000)
 Forward to IRU: true   Rcv Date: 2010121   Synch: a0a0   Bel Mode: High
 Type: 1 Malck ID: 12345 Time Tag: 58549.12345678
 Hand ID: 1  SV ID:   2

(which gives time=58548.12345678 and SVID=2)
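The mis-pairing can be reproduced on a shortened sample. This is only a sketch: the sample keeps just two records from the question, and the leading *? of the original pattern is dropped, which is an assumption made here because it is unclear how that fragment is interpreted.

```matlab
% Shortened sample: the first record has SV ID 51, the second has SV ID 2.
buf = sprintf([ ...
    'Type: 12    Malck ID: 12345 Time Tag: 58548.12345678\n', ...
    'Hand ID: 0  SV ID:   51 Spam ID: 0  BOZ/FAS: 0  Realt Flag: 0\n', ...
    'Type: 1 Malck ID: 12345 Time Tag: 58549.12345678\n', ...
    'Hand ID: 1  SV ID:   2  Spam ID: 0  BOZ/FAS: 1  Realt Flag: 0\n']);

% Original pattern without its leading *? (assumption, see the text above).
pattern = 'Tag:\s+([\d\.]+).*?SV ID:\s+([12])\W';
tokens  = regexp(buf, pattern, 'tokens');

% ([12])\W cannot match at "SV ID:   51", so the lazy .*? keeps expanding
% (the dot also matches newlines) until it reaches the SV ID of the next record.
nMatches  = numel(tokens);   % a single match spanning both records
firstPair = tokens{1};       % time tag of record 1 with SV ID of record 2
disp(firstPair)
```

The second time tag is swallowed by .*? and never gets a match of its own, which is why the first row of the question's result carried the wrong time tag.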

If you want to select only those with SV IDs 1 and 2, you can use

'([\d\.]+)\s+Hand[^B]+?SV ID:\s+([12])'

which works because there is no 'B' between the time tag and the SV ID (it appears only after the SV ID, in 'BOZ'). You could also use an expression that prevents another 'Time Tag' from appearing between the initial time tag and the SV ID, or limit the number of characters between the time tag and the SV ID (i.e. replace .+? with .{1,45}), but I think that [^B] is simpler. Of course, you could just stick to the expression that matches all entries and then filter out those with SV IDs not in {1,2} after conversion to numeric.
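One way the "no other Time Tag in between" idea might be written is with a negative lookahead that is checked before every character consumed between the two fields. The pattern below is a sketch of that idea, not something tested in this thread. The (?!\d) after ([12]) is an added assumption so that IDs such as 12 are rejected, and the sample is four records taken from the question.

```matlab
% Four records from the question with SV IDs 51, 2, 1 and 3.
buf = sprintf([ ...
    'Type: 12    Malck ID: 12345 Time Tag: 58548.12345678\n', ...
    'Hand ID: 0  SV ID:   51 Spam ID: 0  BOZ/FAS: 0  Realt Flag: 0\n', ...
    'Type: 1 Malck ID: 12345 Time Tag: 58549.12345678\n', ...
    'Hand ID: 1  SV ID:   2  Spam ID: 0  BOZ/FAS: 1  Realt Flag: 0\n', ...
    'Type: 1 Malck ID: 12345 Time Tag: 58553.12345678\n', ...
    'Hand ID: 1  SV ID:   1  Spam ID: 0  BOZ/FAS: 1  Realt Flag: 0\n', ...
    'Type: 1 Malck ID: 12345 Time Tag: 58556.12345678\n', ...
    'Hand ID: 1  SV ID:   3  Spam ID: 0  BOZ/FAS: 1  Realt Flag: 0\n']);

% (?:(?!Time Tag:).)*? consumes characters lazily, but refuses to step onto
% the start of another "Time Tag:", so a match cannot span two records.
rex = 'Time Tag:\s+([\d\.]+)(?:(?!Time Tag:).)*?SV ID:\s+([12])(?!\d)';
tok = regexp(buf, rex, 'tokens');

% N-by-2 numeric result: column 1 is the time tag, column 2 the SV ID.
data = str2double(vertcat(tok{:}));
```

With this sample only the records with SV IDs 2 and 1 should produce matches, each paired with the time tag on the line directly above it. The pattern is longer than the [^B] version, but it does not rely on which letters happen to occur between the two fields.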

Brad on 17 Jun 2013

Per, Cedric, after re-installing MATLAB I'm getting the proper results. I tried both approaches provided by the 2 of you and they run like a champ. Thanks for the help on this.
