Extracting data using regular expressions

Shuvashish Roy, 20 May 2021
Commented: Shuvashish Roy, 21 May 2021
Hi,
I have the attached text file. I want to extract all the columns starting from line 1472 (as counted in Notepad), named "Physics", "Time", "dt", "Progress", "Nonlinear Iteration", "Linear Iterations" ... "Nodes After Adaption". I don't know how to specify the header names so that only the numeric values after those headers are extracted into a dataframe or matrix format. Thanks a lot for your help.
Input file format:
Unnecessary lines with text
Unnecessary lines with text
................................
many unnecessary lines............
adh_run_func :: tfinal = 12513600.000000
Physics Time dt Progress Nonlinear Iteration Linear Iteration Max Resid Norm ... Nodes After Adaption
HYD_1 11908800 5 0 1 ........ ...65926
HYD_1 11908800 5 0 2 ...... ...65926
............................................................................................. ................................
............................................................................................. ................................
100% COMPLETE
Output file format:
Physics Time dt Progress Nonlinear Iteration Linear Iteration Max Resid Norm ... Nodes After Adaption
HYD_1 11908800 5 0 1 ........ ...65926
HYD_1 11908800 5 0 2 ...... ...65926
............................................................................................. ................................

Accepted Answer

per isakson, 21 May 2021
Edited: per isakson, 21 May 2021
"all the columns [...] named "Physics", "Time", "dt", "Progress", "Nonlinear Iteration" "Linear Iterations"...."Nodes After Adaption"": I take that to mean all the columns, none excluded.
There is a choice: shall we use readtable() or textscan()? I don't think readtable() can handle this file without relying on the critical line numbers, which I hesitate to do. It is, however, possible to determine the line numbers needed in a separate step and then use readtable(). textscan() is able to parse a 1D character array, which readtable() is not. Only TMW (The MathWorks) knows why.
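The separate step mentioned above, finding the relevant line numbers from the file content rather than hard-coding them, might look roughly like this. It is only a sketch and runs on a small made-up sample instead of the attached file; the sample text and the variable names (sample, lines, header_line, first_data_line, last_data_line) are assumptions for illustration.

```matlab
% Made-up sample that mimics the layout described in the question
% (tab-delimited columns, a percent sign in Progress).
sample = sprintf([ ...
    'Unnecessary line with text\n' ...
    'adh_run_func :: tfinal = 12513600.000000\n' ...
    'Physics\tTime\tdt\tProgress\n' ...
    'HYD_1\t11908800\t5\t0%%\n' ...
    'HYD_1\t11908805\t5\t1%%\n' ...
    '100%% COMPLETE\n' ]);

% Split the text into lines (handles both \n and \r\n line endings)
lines = regexp( sample, '\r?\n', 'split' );

% The header is the first line that begins with 'Physics'
header_line = find( strncmp( lines, 'Physics', 7 ), 1 );

% The data ends just before the first '<number>% COMPLETE' line
is_end = ~cellfun( @isempty, regexp( lines, '^\d+[\% ]+COMPLETE', 'once' ) );

first_data_line = header_line + 1;
last_data_line  = find( is_end, 1 ) - 1;

fprintf( 'header: line %d, data: lines %d to %d\n' ...
       , header_line, first_data_line, last_data_line );
```

With the real file, the sample string would be replaced by the result of fileread(), and the line numbers found this way could then be passed on to readtable().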
I choose textscan().
%% Read file
chr = fileread('AR_20base_201214_adh.txt');
%% Remove meta data
% Using 'adh_run_func :: tfinal' feels more robust than using the line number
pos = regexp( chr, '^adh_run_func :: tfinal', 'once', 'lineanchors' );
chr(1:pos-1) = []; % remove until the first line that begins with 'adh_run_func :: tfinal'
%% Remove the summary lines at the end
pos = regexp( chr, '^\d+[\% ]+COMPLETE', 'once', 'lineanchors' );
chr(pos:end) = [];
%% Get the column headers
txt = regexp( chr, '^Physics.+?$', 'match', 'once', 'lineanchors' );
column_headers = strsplit( txt, '\t' );
%%
cac = textscan( chr, ['%s',repmat('%f',1,numel(column_headers)-1)] ...
, 'Headerlines' , 2 ... two remains after meta-data is removed
, 'Delimiter' , '\t' ...
, 'Whitespace' , ' %' ... ignore the %-sign in Progress
, 'CollectOutput' , true );
Physics = cac{1};
matrix = cac{2};
whos Physics matrix column_headers
  Name                    Size               Bytes  Class     Attributes

  Physics             13487x1              1537454  cell
  column_headers          1x17                2026  cell
  matrix              13487x16             1726336  double
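If a single table (the closest MATLAB analogue of a dataframe) is preferred over a separate cell array and matrix, the three results could be combined along the following lines. This is a sketch, not part of the tested answer: the small stand-in values for Physics, matrix and column_headers are made up so that the snippet runs on its own, and matlab.lang.makeValidName() is used on the assumption that headers such as "Nonlinear Iteration" contain spaces, which older MATLAB releases do not accept in table variable names.

```matlab
% Stand-in values with the same classes and orientation as the
% textscan() results (made up for illustration)
column_headers = { 'Physics', 'Time', 'dt', 'Nonlinear Iteration' };
Physics = { 'HYD_1'; 'HYD_1' };
matrix  = [ 11908800, 5, 1
            11908800, 5, 2 ];

% Turn the headers into valid, unique variable names
names = matlab.lang.makeValidName( column_headers );
names = matlab.lang.makeUniqueStrings( names );

% One table for the text column, one for the numeric columns,
% then concatenate them horizontally
T_text = table( Physics, 'VariableNames', names(1) );
T_num  = array2table( matrix, 'VariableNames', names(2:end) );
T      = [ T_text, T_num ];

disp( T )
```

Columns can then be addressed by name, for example T.Time or T.dt.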
1 Comment
Shuvashish Roy, 21 May 2021
Per Isakson,
I got your answer. It worked! You are awesome. Thanks a lot to both you and Stephen for your valuable time.
