How to make textscan robust against non-matching lines?

Joan Vazquez on 8 Apr 2021
Commented: Stephen23 on 9 Apr 2021
I have files with lines that I want to parse, preferably with textscan. In between those lines there may be lines to be skipped (their format and number are unpredictable, but they are definitely on separate lines). What is the best way to deal with this?
E.g. for the attached data, this stops returning #HELLOMATHWORKS messages after line 4.
fid = fopen('data.txt');
out = textscan(fid,'#HELLOMATHWORKS,%[^,],%n');
fclose(fid);
This is an MWE extracted from a large code base.

Accepted Answer

Stephen23 on 8 Apr 2021
Edited: Stephen23 on 9 Apr 2021
str = fileread('data.txt');
tkn = regexp(str,'#HELLOMATHWORKS,([^,]+),(\S+)','tokens');
tkn = vertcat(tkn{:})
tkn = 6×2 cell array
    {'COM1'}    {'2146'}
    {'COM1'}    {'2147'}
    {'COM1'}    {'2148'}
    {'COM1'}    {'2149'}
    {'COM1'}    {'2150'}
    {'COM1'}    {'2151'}
vec = str2double(tkn(:,2))
vec = 6×1
    2146
    2147
    2148
    2149
    2150
    2151
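If downstream code expects the 1×2 cell layout that textscan returns (a cell column of text plus a numeric column), the tokens could be repacked along these lines. This is only a sketch: the sample text is a hypothetical stand-in for data.txt, and `out` is an illustrative name.

```matlab
% Hypothetical sample text standing in for data.txt (the attachment is not shown here).
str = sprintf('%s\n', ...
    '#HELLOMATHWORKS,COM1,2146', ...
    'some unrelated line', ...
    '#HELLOMATHWORKS,COM1,2147');

% Extract the two fields of every tagged message as tokens.
tkn = regexp(str, '#HELLOMATHWORKS,([^,]+),(\S+)', 'tokens');
tkn = vertcat(tkn{:});   % N-by-2 cell array of char vectors (assumes at least one match)

% Repack into a textscan-like 1x2 cell: {N-by-1 cell, N-by-1 double}.
out = {tkn(:,1), str2double(tkn(:,2))};
```

With more fields per message, each additional token column would need its own conversion, so this may not save much refactoring compared with keeping textscan.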
2 Comments
Joan Vazquez on 8 Apr 2021
Edited: Joan Vazquez on 8 Apr 2021
This does not produce the same output as my code:
tmp =
1×2 cell array
{6×1 cell} {6×1 double}
(Actually, my messages have many more fields; this was just an MWE with 2. I have many similar functions that use textscan to parse messages, and I wanted to avoid refactoring them.)
It is a good idea to work directly with regular expressions, but it seems that the formatSpec input parameter of textscan is not just any regular expression; it is more limited...
Anyway, it's OK for the moment. I'll accept the answer, thanks.
Stephen23 on 9 Apr 2021
@Joan Vazquez: I presume that the text #HELLOMATHWORKS is not what is actually in your file. If the actual text contains some unique character that does not exist anywhere else in the file, you might be able to leverage the LineEnding/EndOfLine option to achieve the goal of reading the file data using textscan.


More Answers (1)

Joan Vazquez on 8 Apr 2021
This works, but it does not seem to be the best solution... Ideally, I would tell textscan "skip everything until a new line starts with #HELLOMATHWORKS".
filetext = fileread('data.txt');
expr = '[^\n]*#HELLOMATHWORKS[^\n]*';
% Find and return all lines that contain the text '#HELLOMATHWORKS'.
matches = regexp(filetext,expr,'match');
% Make it a 1xN char to feed textscan
goodlines = sprintf('%s\n', matches{:});
tmp = textscan(goodlines,'#HELLOMATHWORKS,%[^,],%n');
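The wish above, keeping only lines that start with #HELLOMATHWORKS, can be approximated by anchoring the regular expression to line starts before handing the text to textscan. A minimal sketch, assuming the sample text below stands in for data.txt (the real attachment is not reproduced here) and that R2016b or later is available for `newline`:

```matlab
% Hypothetical sample text standing in for data.txt.
filetext = sprintf('%s\n', ...
    '#HELLOMATHWORKS,COM1,2146', ...
    'some unrelated line', ...
    '#HELLOMATHWORKS,COM1,2147', ...
    'noise, with, commas 42', ...
    '#HELLOMATHWORKS,COM1,2148');

% Keep only lines that START with the tag.
% 'lineanchors' lets ^ and $ match at the start and end of each line;
% 'dotexceptnewline' stops . from running across line breaks.
matches = regexp(filetext, '^#HELLOMATHWORKS,.*$', 'match', ...
    'lineanchors', 'dotexceptnewline');

% Trim stray whitespace (e.g. a trailing carriage return) and rejoin
% into one char vector that textscan can parse.
goodlines = strjoin(strtrim(matches), newline);
tmp = textscan(goodlines, '#HELLOMATHWORKS,%[^,],%n');
```

The existing formatSpec stays unchanged, so functions that already rely on textscan would only need the filtering step in front of them.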

Release

R2020b
