Regexp expression to handle changing format

jimmy zubiate on 6 Mar 2022
Commented: jimmy zubiate on 9 Mar 2022
%dummy data
% t,00000000CIB0000004001,0.47,L,000 00:00:00.00,343 19:54:20.684 8,22.501
% t,00000000CIB0000004001,0.47,L,000 00 00:00:00.00,21 343 19:54:20.684 8,22.501
S=fileread(filename);
myexpression = ['(?<tvar>w*,'...
'(?<tmCodeRdr>\w*),'...
'(?<tmCodLvl>\w*\.*\w*),'...
'(?<HNL>\w*),'...
'(?<codeTm>\w*\s*\d*\:*\d*\:*\d*\.*\d*,'... % <== This line handles the first line of dummy data
'(?<caprTm>\w*\s*\d*\:*\d*\:*\d*\.*\d*\s*\d*,'... % <== This line handles the first line of dummy data
'(?<logAt>\w*\.*\w*']
parts = regexp(filtered,myexpression,'names')
The third-to-last and second-to-last variables (codeTm, caprTm) change format within the data. How can I modify the expression, or add logic, to accept 2 to 3 space-separated values within the variable "codeTm" and 3 to 4 space-separated values within the variable "caprTm"?
Variable with 2 space-separated values: (000 00:00:00.00)
Variable with 3 space-separated values: (000 00 00:00:00.00) or (343 19:54:20.684 8)
Variable with 4 space-separated values: (21 343 19:54:20.684 8)
Thank you for the help. My apologies for making my expression so complicated. I am still learning the ins and outs of expression formats for using regexp to read data.
2 Comments
Stephen23 on 7 Mar 2022
It is not clear why you are using regular expressions for importing this data: READTABLE et al. have options for handling missing field data. Have you considered using the inbuilt data importing functions?
jimmy zubiate on 9 Mar 2022
I'm in the process of learning MATLAB. I pursued the regexp function to create a structure array where I could maneuver through the values to perform the analysis I need.
What I think I should pursue is prepping the file to remove unwanted white space, headers and other non-useful data, then importing it as a comma- and space-delimited file. Then I can count the items inside each variable, separated by spaces, and move on to the next step.
The other option is to pursue the fgetl function and implement logic to read the useful data gracefully. I'm attaching dummy test data for your viewing. Thanks.

Answers (1)

Stephen23 on 7 Mar 2022
Edited: Stephen23 on 7 Mar 2022
You can easily make a group optional or occur a specific number of times using any suitable quantifier, for example:
(..)? % zero or one time
(..)* % zero or more times
(..){2,4} % two to four times
etc.
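As a rough illustration of the quantifier route, the sketch below applies {1,2} to a non-capturing group so that codeTm accepts one or two leading digit groups before the clock value, and caprTm accepts one or two leading groups plus the trailing group. The helper names numTok and clockTk, and the assumption that every field looks exactly like the two dummy lines from the question, are mine and not part of the original post.

```matlab
% Two sample lines copied from the question's dummy data.
sampleLines = {'t,00000000CIB0000004001,0.47,L,000 00:00:00.00,343 19:54:20.684 8,22.501', ...
               't,00000000CIB0000004001,0.47,L,000 00 00:00:00.00,21 343 19:54:20.684 8,22.501'};

% Building blocks (hypothetical names, introduced only for this sketch).
numTok  = '(?:\d+\s+)';          % one digit group followed by whitespace, non-capturing
clockTk = '\d+:\d+:\d+\.\d+';    % clock-style value such as 19:54:20.684

rgx = ['^\s*(?<tvar>\w*),' ...
       '(?<tmCodeRdr>\w*),' ...
       '(?<tmCodLvl>\d*\.?\d*),' ...
       '(?<HNL>\w*),' ...
       '(?<codeTm>' numTok '{1,2}' clockTk '),' ...        % 2 or 3 spaced values
       '(?<caprTm>' numTok '{1,2}' clockTk '\s+\d+),' ...  % 3 or 4 spaced values
       '(?<logAt>\d*\.?\d*)'];

% With a cell array input and 'once', each cell of parts holds one struct.
parts = regexp(sampleLines, rgx, 'names', 'once');
disp(parts{1}.codeTm)
disp(parts{2}.caprTm)
```

This version is stricter than matching a set of characters: a line whose time fields deviate from the assumed layout simply will not match, which may or may not be what you want.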
However, rather than trying to match specific groups of characters I would use a simpler approach of matching sets of characters. I had to fix several other bugs in your regular expression to get this working, mostly missing backslashes and parentheses.
str = fileread('test.txt')
str =
    't,00000000CIB0000004001,0.47,L,000 00:00:00.00,343 19:54:20.684 8,22.501
     t,00000000CIB0000004001,0.47,L,000 00 00:00:00.00,21 343 19:54:20.684 8,22.501'
rgx = ['^\s*(?<tvar>\w*),'...
'(?<tmCodeRdr>\w*),'...
'(?<tmCodLvl>\d*\.?\d*),'...
'(?<HNL>\w*),'...
'(?<codeTm>[ :\w\.]*),'...
'(?<caprTm>[ :\w\.]*),'...
'(?<logAt>\d*\.?\d*)'];
parts = regexp(str,rgx,'names','lineanchors')
parts = 1×2 struct array with fields:
    tvar
    tmCodeRdr
    tmCodLvl
    HNL
    codeTm
    caprTm
    logAt
parts.codeTm
ans = '000 00:00:00.00'
ans = '000 00 00:00:00.00'
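Once the fields are captured as text, counting the space-separated values in each one (the step mentioned in the comments) could look something like the sketch below. The captured strings are typed in by hand so that the snippet runs on its own; the helper countFields is a name made up for this example.

```matlab
% Captured strings, copied from the output shown above.
codeTm = {'000 00:00:00.00', '000 00 00:00:00.00'};
caprTm = {'343 19:54:20.684 8', '21 343 19:54:20.684 8'};

% Split each string on whitespace and count the resulting pieces.
% strsplit with no delimiter argument splits on whitespace.
countFields = @(s) numel(strsplit(strtrim(s)));

nCode = cellfun(countFields, codeTm);   % number of spaced values in each codeTm
nCapr = cellfun(countFields, caprTm);   % number of spaced values in each caprTm
disp(nCode)
disp(nCapr)
```

With the struct array returned by regexp, the same helper should work on {parts.codeTm} and {parts.caprTm}, and the counts can then drive whatever branching the later analysis needs.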
But personally I would not try to reinvent the wheel for such a data file; READTABLE is much simpler:
tbl = readtable('test.txt','delimiter',',');
tbl.Properties.VariableNames = {'tvar','tmCodeRdr','tmCodLv','HNL','codeTm','caprTm','logAt'}
tbl = 2×7 table
    tvar            tmCodeRdr            tmCodLv     HNL             codeTm                     caprTm              logAt
    _____    _________________________    _______    _____    ______________________    _________________________    ______
    {'t'}    {'00000000CIB0000004001'}     0.47      {'L'}    {'000 00:00:00.00'   }    {'343 19:54:20.684 8'   }    22.501
    {'t'}    {'00000000CIB0000004001'}     0.47      {'L'}    {'000 00 00:00:00.00'}    {'21 343 19:54:20.684 8'}    22.501
1 Comment
jimmy zubiate on 9 Mar 2022
That should work. Let me try to implement on my side and see what I get. Thanks Stephen!