How can I sort my data from regexp?

Linus Dock, 14 October 2016
Edited: Guillaume, 14 October 2016
Hi, I have a problem when using regexp with this command:
RVRtmp=regexp(TXTmod,'R\d\d\w\/\w*\d\d\d\D\>','match')
The output cell is mostly empty and looks like this:
[]
[]
[]
[]
[]
[]
[]
<1x4 cell>
<1x4 cell>
<1x4 cell>
<1x4 cell>
<1x4 cell>
[]
I would like to obtain the information in the <1x4 cell> entries. The information inside these cells looks like this:
'R01L/P1500N' 'R19R/0900VP1500N' 'R01R/0800V1400D' 'R19L/1000N'
Here I would like to obtain 'R01L' as a variable or string and the corresponding value '1500' as a vector or cell. I'm having trouble extracting the data because my command fails on the empty cells:
RVR1=regexp(RVRtmp{1072}{1},'\d{4}','match')
I would like to arrange the data like this:
R01L =
NaN
NaN
1500
2000
1000
500
700
NaN
TXTmod looks like this:
'METAR ESSA 200901220720Z 03003KT 1500 R01L/P1500N R19R/P1500N R01R/0700N R19L/0800V1000N BR VV002 M00/M00 Q1006 01710173 08710164 51710170 TEMPO 2000'
'METAR ESSA 200901220750Z 04003KT 020V090 1500 R01L/P1500N R19R/P1500N R01R/0800V1000N R19L/0900N BR VV002 M00/M00 Q1006 01710173 08710164 51710170 TEMPO 2000'
'METAR ESSA 200901220820Z 02003KT 320V100 1000 R01L/P1500N R19R/0900VP1500N R01R/0800V1400D R19L/1000N BR VV002 M00/M00 Q1006 01710173 08710164 51710170 TEMPO 2000'
'METAR ESSA 200901220850Z 06004KT 0900 R01L/P1500N R19R/1100V1500U R01R/1000V1400N R19L/1200N FZFG VV002 M00/M00 Q1006 01710173 08710164 51710170 TEMPO 0700'
'METAR ESSA 200901220920Z 04003KT 360V060 1000 R01L/P1500N R19R/1200U R01R/0700N R19L/1000VP1500N BR VV002 M00/M00 Q1006 01710173 08710164 51710170 TEMPO 1500'
'METAR ESSA 200901220950Z 04004KT 1500 BR VV002 M00/M00 Q1005 01710173 08710164 51710170 NOSIG'
'METAR ESSA 200901221020Z 01003KT 1700 BR BKN002 BKN017 M00/M00 Q1005 01710173 08710164 51710170 NOSIG'
'METAR ESSA 200901221050Z 35004KT 2500 BKN002 BKN019 00/00 Q1004 01710173 08710164 51710170 NOSIG'

Accepted Answer

Guillaume, 14 October 2016
Edited: Guillaume, 14 October 2016
There is no real need for the intermediate regexp; you can get everything with a single regular expression:
tokens = regexp(TXTmod, '(R\d\d\w)/\w*(\d\d\d\d)\D\>', 'tokens'); %You were missing a \d in your regexp (which was captured by the \w* so it didn't matter)
Or, more efficiently (but a bit longer):
tokens = regexp(TXTmod, '\<(R\d{2}[A-Z])/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>', 'tokens')
Note the inefficiency in your original expression: the \w*\d\d\d in your first regular expression causes a lot of backtracking in the regular expression engine, because \w* also matches digits. Since * is greedy, \w* first consumes the whole rest of the word, including the digits and the final letter. The engine then finds that it cannot match \d\d\d after that, so it backtracks one character and tries again. It has to keep giving characters back, one at a time, until \w* covers only the part before the last three digits and \d\d\d\D can finally match.
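The new regular expression matches an optional group made of an optional 4 digits followed by one or more letters, then captures the final group of 4 digits before the last letter. Because only letters can follow the optional digits, there is nothing for the engine to backtrack over. I've also added a start-of-word anchor: \<.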
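A quick way to check the more efficient pattern is to run it on two lines copied from the TXTmod in the question: one with four RVR groups and one with none. This is a minimal sketch that only uses the built-in regexp; the second line shows that a string without RVR groups yields an empty cell, which is why indexing such an entry with {1} fails.

```matlab
% Two lines copied from TXTmod in the question; the second has no RVR groups
TXTmod = {
    'METAR ESSA 200901220820Z 02003KT 320V100 1000 R01L/P1500N R19R/0900VP1500N R01R/0800V1400D R19L/1000N BR VV002 M00/M00 Q1006 01710173 08710164 51710170 TEMPO 2000'
    'METAR ESSA 200901220950Z 04004KT 1500 BR VV002 M00/M00 Q1005 01710173 08710164 51710170 NOSIG'};

% Token 1: runway designator; token 2: the last 4-digit value before the final letter
pattern = '\<(R\d{2}[A-Z])/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>';
tokens  = regexp(TXTmod, pattern, 'tokens');

% tokens has one cell per input line; each match is a 1x2 cell {runway, value}
disp(tokens{1}{1})        % from R01L/P1500N: {'R01L', '1500'}
disp(tokens{1}{2})        % from R19R/0900VP1500N: {'R19R', '1500'}
disp(tokens{1}{3})        % from R01R/0800V1400D: {'R01R', '1400'}
disp(isempty(tokens{2}))  % line without RVR groups: empty cell
```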
One more note: to rearrange the tokens of each string into a two-column cell array:
cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false)
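To go from the two-column arrays to the layout asked for in the question (one column vector per runway, with NaN where a line has no value), one option is to loop over the rearranged tokens and fill a struct whose field names are the runway designators. This is a sketch under assumptions: TXTmod is assumed to be a column cell array of METAR strings (three lines from the question are used here), and the struct name rvr is illustrative. Using fields such as rvr.R01L avoids creating variables named R01L dynamically with eval.

```matlab
% Assumed input: column cell array of METAR strings (three lines from the question)
TXTmod = {
    'METAR ESSA 200901220820Z 02003KT 320V100 1000 R01L/P1500N R19R/0900VP1500N R01R/0800V1400D R19L/1000N BR VV002 M00/M00 Q1006 01710173 08710164 51710170 TEMPO 2000'
    'METAR ESSA 200901220920Z 04003KT 360V060 1000 R01L/P1500N R19R/1200U R01R/0700N R19L/1000VP1500N BR VV002 M00/M00 Q1006 01710173 08710164 51710170 TEMPO 1500'
    'METAR ESSA 200901220950Z 04004KT 1500 BR VV002 M00/M00 Q1005 01710173 08710164 51710170 NOSIG'};

pattern = '\<(R\d{2}[A-Z])/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>';
tokens  = regexp(TXTmod, pattern, 'tokens');
pairs   = cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false);

n   = numel(TXTmod);
rvr = struct();                         % illustrative name: one field per runway
for k = 1:n
    p = pairs{k};                       % m-by-2 cell {runway, value}, or [] if no match
    for m = 1:size(p, 1)                % size([], 1) is 0, so empty lines are skipped
        name = p{m, 1};
        if ~isfield(rvr, name)
            rvr.(name) = nan(n, 1);     % NaN until a value is found for that line
        end
        rvr.(name)(k) = str2double(p{m, 2});
    end
end

disp(rvr.R01L)                          % column vector with NaN for the last line
```

Note that this keeps only the 4-digit value, so any letter prefix such as the P in P1500 is dropped, exactly as in the regular expression above.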
