
Can REGEXP map values from different parts of a text file?

Brad on 5 Jun 2013
I have a text file with the following contents:
MSNout_BER (0:31) Observation #100 Rx'd at: (58568.000) Msg. Time: (58568.000)
Forward to IMU: true Rcv Date: 2010121 Synch: f0f0 Rel Mode: Active
MSNout_SSS (0:32) Observation #101 Rx'd at: (58569.000) Msg. Time: (58569.000)
Forward to IRU: true Rcv Date: 2010121 Synch: a0a0 Bel Mode: High
Type: 12 Malck ID: 12345 Time Tag: 58548.12345678
Hand ID: 0 SV ID: 51 Spam ID: 0 BOZ/FAS: 0 Realt Flag: 0
MSNout_BER (0:33) Observation #102 Rx'd at: (58570.000) Msg. Time: (58570.000)
Forward to IMU: true Rcv Date: 2010121 Synch: f0f0 Rel Mode: Active
MSNout_SSS (0:34) Observation #103 Rx'd at: (58571.000) Msg. Time: (58571.000)
Forward to IRU: true Rcv Date: 2010121 Synch: a0a0 Bel Mode: High
Type: 1 Malck ID: 12345 Time Tag: 58549.12345678
Hand ID: 1 SV ID: 2 Spam ID: 0 BOZ/FAS: 1 Realt Flag: 0
Type: 1 Malck ID: 12345 Time Tag: 58550.12345678
Hand ID: 1 SV ID: 2 Spam ID: 0 BOZ/FAS: 1 Realt Flag: 0
Type: 1 Malck ID: 12345 Time Tag: 58551.12345678
Hand ID: 1 SV ID: 2 Spam ID: 0 BOZ/FAS: 1 Realt Flag: 0
Type: 1 Malck ID: 12345 Time Tag: 58552.12345678
Hand ID: 1 SV ID: 2 Spam ID: 0 BOZ/FAS: 1 Realt Flag: 0
Type: 1 Malck ID: 12345 Time Tag: 58553.12345678
Hand ID: 1 SV ID: 1 Spam ID: 0 BOZ/FAS: 1 Realt Flag: 0
Type: 1 Malck ID: 12345 Time Tag: 58554.12345678
Hand ID: 1 SV ID: 1 Spam ID: 0 BOZ/FAS: 1 Realt Flag: 0
Type: 1 Malck ID: 12345 Time Tag: 58555.12345678
Hand ID: 1 SV ID: 1 Spam ID: 0 BOZ/FAS: 1 Realt Flag: 0
Type: 1 Malck ID: 12345 Time Tag: 58556.12345678
Hand ID: 1 SV ID: 3 Spam ID: 0 BOZ/FAS: 1 Realt Flag: 0
I’m using the following commands to retrieve the values for Time Tag: and SV ID: (only SV ID values 1 and 2; all others are ignored):
[fn,pn] = uigetfile('*.txt', 'Select Text File');
OAMfilename = fullfile(pn, fn);
buffer = fileread(OAMfilename);
pattern = '*?Tag:\s+([\d\.]+).*?SV ID:\s+([12])\W';
tokens = regexp(buffer, pattern, 'tokens');
data = reshape(str2double([tokens{:}]), 2, []).';
Results:
58548.1234567800 2
58550.1234567800 2
58551.1234567800 2
58552.1234567800 2
58553.1234567800 1
58554.1234567800 1
58555.1234567800 1
Initially, I thought the results were as expected. Then I noticed the time tag for the first occurrence of SV ID equal to 2 was wrong - 58549.12345678 is the proper time tag.
Is it possible to force MATLAB to recognize each Time Tag value that occurs just prior to each SV ID value? Could a Lookaround operator be used in this case?

Accepted Answer

per isakson on 7 Jun 2013
Edited: per isakson on 10 Jun 2013
This seems to work.
buf = fileread( 'cssm.txt' );
rex = '(?<=Time Tag: )([\d\.]+).+?(?<=SV ID:[ ]+)(\d+)';
cac = regexp( buf, rex, 'tokens' );
cac{:}
returns
ans =
'58548.12345678' '51'
ans =
'58549.12345678' '2'
ans =
'58550.12345678' '2'
ans =
'58551.12345678' '2'
ans =
'58552.12345678' '2'
ans =
'58553.12345678' '1'
ans =
'58554.12345678' '1'
ans =
'58555.12345678' '1'
ans =
'58556.12345678' '3'
where cssm.txt contains your data
.
Comments on the regular expression:
  • capture tokens
  • capture the group of digits, which follow after identifiers and space
  • the "identifiers and space" are used as expressions in look behind operators
  • thus two groups of (?<= name)( value)
  • between these two groups: .+?, which is a Lazy Quantifier. It advances the current position one position or more, but only as much of the quantified expression as necessary.
  • the regular expression must match one sub-string, thus something is needed to match the characters between the two groups to make the two one sub-string. In this case that is done by .+?.
Most of the italic words are copy&paste from the on-line help.
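The look-behind pattern can be tried without the data file. The following is a minimal sketch, assuming a shortened inline sample (the cell array `lines` and the variable `data` are illustrative names, not part of the original answer) and that the records are separated by plain newlines:

```matlab
% Shortened sample taken from the question, joined with newline characters.
lines = { ...
    'Type: 12 Malck ID: 12345 Time Tag: 58548.12345678', ...
    'Hand ID: 0 SV ID: 51 Spam ID: 0 BOZ/FAS: 0 Realt Flag: 0', ...
    'Forward to IMU: true Rcv Date: 2010121 Synch: f0f0 Rel Mode: Active', ...
    'Type: 1 Malck ID: 12345 Time Tag: 58549.12345678', ...
    'Hand ID: 1 SV ID: 2 Spam ID: 0 BOZ/FAS: 1 Realt Flag: 0'};
buf = strjoin(lines, char(10));

% Two look-behind groups, each followed by a captured value; the lazy
% .+? bridges the characters between the two name-value pairs.
rex = '(?<=Time Tag: )([\d\.]+).+?(?<=SV ID:[ ]+)(\d+)';
cac = regexp(buf, rex, 'tokens');

% Flatten the nested cells and convert to an N-by-2 numeric matrix:
% column 1 is the time tag, column 2 the SV ID.
data = reshape(str2double([cac{:}]), 2, []).';
```

Each row of `data` then pairs a time tag with the SV ID that follows it, including SV IDs other than 1 and 2, so any filtering has to happen afterwards.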
.
BTW: Your pattern works - after a little fixing:
rex = '*?Tag:\s+([\d\.]+).*?SV ID:\s+([125]{1,2})\W';
but what is the purpose of the leading *? and the trailing \W ?
.
A bit more robust:
rex = '(?<=Time Tag:)[ ]+([\d\.]+)[^\n]+?(?<=SV ID:)[ ]+(\d+)';
  • Replacing \s+ between name and value by [ ]+ excludes new-line, tab, etc.
  • Replacing .*? between the two name-value-pairs by [^\n]+?
  9 Comments
Cedric on 13 Jun 2013
Edited: Cedric on 17 Jun 2013
Actually
'([\d\.]+)\s+Hand.+?SV ID:\s+(\d+)'
does match SV ID 51.
What was wrong with your initial pattern is that the first match is the whole:
Tag: 58548.12345678
Hand ID: 0 SV ID: 51 Spam ID: 0 BOZ/FAS: 0 Realt Flag: 0
MSNout_BER (0:33) Observation #102 Rx'd at: (58570.000) Msg. Time: (58570.000)
Forward to IMU: true Rcv Date: 2010121 Synch: f0f0 Rel Mode: Active
MSNout_SSS (0:34) Observation #103 Rx'd at: (58571.000) Msg. Time: (58571.000)
Forward to IRU: true Rcv Date: 2010121 Synch: a0a0 Bel Mode: High
Type: 1 Malck ID: 12345 Time Tag: 58549.12345678
Hand ID: 1 SV ID: 2
(which gives time=58548.12345678 and SVID=2)
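This over-long first match can be reproduced on a small inline sample. The sketch below is an assumption-laden illustration: it uses the question's pattern with the leading *? dropped, and the names `lines`, `tok` and `m` are made up for the example.

```matlab
% Two records from the question: the first has SV ID 51, the second SV ID 2.
lines = { ...
    'Type: 12 Malck ID: 12345 Time Tag: 58548.12345678', ...
    'Hand ID: 0 SV ID: 51 Spam ID: 0 BOZ/FAS: 0 Realt Flag: 0', ...
    'Type: 1 Malck ID: 12345 Time Tag: 58549.12345678', ...
    'Hand ID: 1 SV ID: 2 Spam ID: 0 BOZ/FAS: 1 Realt Flag: 0'};
buf = strjoin(lines, char(10));

% The question's pattern, minus the leading *?
rex = 'Tag:\s+([\d\.]+).*?SV ID:\s+([12])\W';

% Request both the tokens and the full matched text.
[tok, m] = regexp(buf, rex, 'tokens', 'match');

% Because "SV ID: 51" cannot satisfy [12], the lazy .*? keeps growing
% until it reaches the next record's "SV ID: 2", so the single match
% starts at the first time tag and swallows the second one.
spansSecondTag = ~isempty(strfind(m{1}, '58549.12345678'));
```

Here `tok{1}` pairs the first record's time tag with the second record's SV ID, which is the mismatch seen in the question.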
If you want to select only those with SV IDs 1 and 2, you can use
'([\d\.]+)\s+Hand[^B]+?SV ID:\s+([12])'
which works because there is no 'B' between the time tag and the SV ID (it appears only after the SV ID, in 'BOZ'). You could also use an expression that prevents another 'Time Tag' from appearing between the initial time tag and the SV ID, or limit the number of characters between the time tag and the SV ID (i.e. replace .+? with .{1,45}), but I think that [^B] is simpler. Of course, you could just stick to the expression that matches all entries and then filter out those with SV IDs not in {1,2} after conversion to numeric.
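Two of these alternatives can be sketched side by side. This is only an illustration on a shortened inline sample: the pattern `rexGuard` is one possible way to write the "no second Time Tag in between" idea with a negative lookahead, not a pattern from the thread, and all variable names are hypothetical.

```matlab
% Three records: SV IDs 51, 2 and 3. Only the SV ID 2 record should survive.
lines = { ...
    'Type: 12 Malck ID: 12345 Time Tag: 58548.12345678', ...
    'Hand ID: 0 SV ID: 51 Spam ID: 0 BOZ/FAS: 0 Realt Flag: 0', ...
    'Forward to IMU: true Rcv Date: 2010121 Synch: f0f0 Rel Mode: Active', ...
    'Type: 1 Malck ID: 12345 Time Tag: 58549.12345678', ...
    'Hand ID: 1 SV ID: 2 Spam ID: 0 BOZ/FAS: 1 Realt Flag: 0', ...
    'Type: 1 Malck ID: 12345 Time Tag: 58556.12345678', ...
    'Hand ID: 1 SV ID: 3 Spam ID: 0 BOZ/FAS: 1 Realt Flag: 0'};
buf = strjoin(lines, char(10));

% Option 1: match every record, convert to numeric, then filter on SV ID.
tokAll       = regexp(buf, '([\d\.]+)\s+Hand.+?SV ID:\s+(\d+)', 'tokens');
dataAll      = reshape(str2double([tokAll{:}]), 2, []).';
dataFiltered = dataAll(ismember(dataAll(:,2), [1 2]), :);

% Option 2 (assumed pattern): each character consumed between the two
% fields must not start another 'Time Tag:', so a match cannot run into
% the next record. (?!\d) keeps [12] from matching the first digit of a
% longer SV ID such as 12.
rexGuard  = 'Time Tag:\s+([\d\.]+)(?:(?!Time Tag:).)*?SV ID:\s+([12])(?!\d)';
tokGuard  = regexp(buf, rexGuard, 'tokens');
dataGuard = reshape(str2double([tokGuard{:}]), 2, []).';
```

On this sample both options keep only the record with SV ID 2. Filtering after conversion keeps the pattern simpler and makes it easy to change the set of accepted SV IDs without touching the regular expression.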
Brad on 17 Jun 2013
Per, Cedric, after re-installing MATLAB I'm getting the proper results. I tried both approaches provided by the 2 of you and they run like a champ. Thanks for the help on this.
