Extracting data from messy text file

Data file attached. There is a header followed by row names. I want to extract the numeric data for Time, Area and Volume, then group them together into a convenient format for analysis. I've tried textscan and sscanf. I haven't tried regexp because I've never used it before! Many thanks in advance!

3 Comments

dpb on 14 Jul 2014
Data file attached....
'Cepting it ain't... :)
Azzi Abdelmalek on 14 Jul 2014
No file attached
Teresa Tutt on 14 Apr 2015
Yes, please can someone post the "Data.txt" file?


Accepted Answer

dpb on 14 Jul 2014
Edited: dpb on 14 Jul 2014

2 votes

It's just a repetitive application of textscan...
fmt1='Time [T] %f';
fmt2='Area [V] %f %f %f Volume [V] %f %f %f';
fid=fopen('Data.txt');
% read the first section separately, as it has its own number of header lines
time=cell2mat(textscan(fid,fmt1,'headerlines',10)); % 1st time value
data=cell2mat(textscan(fid,fmt2, ...
    'headerlines',3,'collectoutput',true,'delimiter','\n'));
% the second section also has its own number of lines to skip...
time=[time; cell2mat(textscan(fid,fmt1,'headerlines',5))];
data=[data; cell2mat(textscan(fid,fmt2,'headerlines',3, ...
    'collectoutput',true,'delimiter','\n'))];
% the remaining sections all share the same layout
while ~feof(fid)
    time=[time; cell2mat(textscan(fid,fmt1,'headerlines',7))];
    data=[data; cell2mat(textscan(fid,fmt2,'headerlines',3, ...
        'collectoutput',true,'delimiter','\n'))];
end
fid=fclose(fid);
At the end you'll have an Nx1 vector of times and an Nx6 array of areas and volumes (areas in the first three columns, volumes in the last three). You could either concatenate time and data into one array or separate out A and V based on the columns in data; your choice.
At the command line the above gives me
>> [time data]
ans =
1.0e+04 *
0 1.7221 1.6475 0.0746 0.0995 0.0987 0.0009
0.1054 1.7221 1.6475 0.0746 0.1089 0.1081 0.0008
0.2108 1.7221 1.6475 0.0746 0.1102 0.1093 0.0008
0.3162 1.7221 1.6475 0.0746 0.1111 0.1103 0.0008
0.4216 1.7221 1.6475 0.0746 0.1118 0.1110 0.0008
0.5270 1.7221 1.6475 0.0746 0.1124 0.1116 0.0008
0.6324 1.7221 1.6475 0.0746 0.1129 0.1120 0.0008
0.7379 1.7221 1.6475 0.0746 0.1134 0.1126 0.0008
0.8433 1.7221 1.6475 0.0746 0.1139 0.1130 0.0008
...
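As a minimal sketch of the two options mentioned above (concatenate, or split by column): the rows below are stand-in values rounded from the display, and the names combined, area and volume are purely illustrative. The column split assumes the field order in fmt2, i.e. Area first, then Volume.

```matlab
% Stand-in values for the parsed arrays (three rows, for illustration only).
time = [0; 1054.08; 2108.16];
data = [17221 16475 746  995  987 9;
        17221 16475 746 1089 1081 8;
        17221 16475 746 1102 1093 8];

% Option 1: one N-by-7 array with time in the first column.
combined = [time data];

% Option 2: split by column, following the field order in fmt2.
area   = data(:, 1:3);   % the three Area values per record
volume = data(:, 4:6);   % the three Volume values per record
```

Keeping area and volume separate tends to be handier for plotting against time, e.g. plot(time, volume(:,1)).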

5 Comments

Alison on 14 Jul 2014
Thanks very much!! When I run it, only the first time value is captured; the remaining time values are not picked up and come back as an empty vector. I'm actually looking at Inflow rather than area now.
I like the way you set up fmt1 and fmt2. I didn't realise you could write fmt2 as a single line when it actually covers data on two different rows rather than columns.
dpb on 14 Jul 2014
I don't understand that, just repeated the test here w/ same result. Is there a different number of lines between cases or somesuch, maybe?
The "trick" that the format string works for the two lines isn't in the format string itself, it's in using newline for the delimiter. That forces the two records to be looked at as one, in essence. There's an example of precisely the case in the docs (which is where I learned it).
Is there a different file that is causing the problem with the time value? If so, attach it so can see what's the deal...
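To illustrate the newline-delimiter point above, here is a small sketch that has not been run against the original Data.txt (which was never attached). The two-line file it writes is a hypothetical layout, and the names fname and vals are made up for the example.

```matlab
% Write a tiny two-line record to a temporary file (hypothetical layout).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Area [V] 1 2 3\nVolume [V] 4 5 6\n');
fclose(fid);

% One format string covers both lines. With newline as the only delimiter,
% the line break between the two rows acts as a field separator, so both
% lines are read as a single record.
fmt = 'Area [V] %f %f %f Volume [V] %f %f %f';
fid = fopen(fname);
vals = cell2mat(textscan(fid, fmt, 'collectoutput', true, 'delimiter', '\n'));
fclose(fid);
delete(fname);

disp(vals)
```

If the format matches, vals should hold the numbers from both lines in one row; an empty or short result usually means the literal text or the line count does not match the file.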
Alison on 14 Jul 2014
Thanks again dpb. When I run your script, it does actually work in the sense that I get what you get but time values are missing from your results. It seems to be skipping several 'sections' of the data.
So time values are: 0, 1054.08, 2108.1599...and so on but the third time value you have is 0.5207, which is a few sections down the line.
dpb on 14 Jul 2014
Edited: dpb on 14 Jul 2014
Oh, I see now...I had just glanced at the first couple and that the other data seemed in the right place and presumed it was ok. I'll have to see if can see what's causing that.
Oh, I see...there are two more lines after T=0 in the output--is that real or are there two lines missing in the first section? Assuming it's real and an artifact of the startup, need to handle the first two separately instead of just the first one. I'll amend the answer in a few minutes...
dpb on 14 Jul 2014
Edited: dpb on 14 Jul 2014
...I'm actually looking at Inflow rather than area now.
I didn't try it, but I think you'll need to modify fmt2 to
fmt2=['Area [V] %f %f %f' repmat(' %*s',1,5) ' InFlow [V/T] %f %f %f'];
to skip the five string fields in the intermediary line. The alternative is to read the Area line by itself, then use another textscan call for the InFlow line with a 'headerlines',1 parameter, or just insert a call to fgetl between the two without 'headerlines'.
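A rough sketch of the fgetl alternative, with sscanf parsing each line in place of the separate textscan calls. The three-line file is a hypothetical stand-in for one Area / intermediate / InFlow block, not the real layout of Data.txt, and the variable names are illustrative.

```matlab
% Hypothetical three-line block: Area, one intermediate line, InFlow.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, ' Area [V] 10 20 30\n');
fprintf(fid, ' some intermediate line to skip\n');
fprintf(fid, ' InFlow [V/T] 0.1 0.2 0.3\n');
fclose(fid);

fid = fopen(fname);
area = sscanf(fgetl(fid), ' Area [V] %f%f%f')';        % parse the Area line
fgetl(fid);                                             % discard the line in between
inflow = sscanf(fgetl(fid), ' InFlow [V/T] %f%f%f')';   % parse the InFlow line
fclose(fid);
delete(fname);
```

Reading line by line is more verbose than a single format string, but it avoids having to guess how many fields the skipped line contains.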


More Answers (2)

Joseph Cheng on 14 Jul 2014
Edited: Joseph Cheng on 14 Jul 2014

0 votes

dpb's solution is much more elegant, but I thought I'd post what I had so far.
fid = fopen('Data.txt');
nlines = 1;                  % current line number
dashes = []; timeline = [];  % line numbers of dashed separators and Time lines
dataInd = 0;                 % index of the current record (none yet)
while 1
    tline = fgetl(fid);
    if ~ischar(tline), break, end           % end of file
    if length(tline) >= 3                   % characters 2:3 select the row type
        switch tline(2:3)
            case '--'
                dashes = [dashes nlines];
            case 'Ti'
                timeline = [timeline nlines];
                tTime = sscanf(tline,' Time [T] %f');
                if ~isempty(tTime)          % ignore Time lines without a value
                    dataInd = dataInd + 1;  % each Time value starts a new record
                    Data(dataInd).time = tTime;
                end
            % the cases below assume a Time line precedes them in each section
            case 'Ar'
                Data(dataInd).Area = sscanf(tline,' Area [V] %f%f%f')';
            case 'Vo'
                Data(dataInd).Volume = sscanf(tline,' Volume [V] %f%f%f')';
            case 'hM'
                Data(dataInd).hMean = sscanf(tline,' hMean [L] %f%f%f')';
        end
    end
    nlines = nlines + 1;
end
fclose(fid);
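Once the loop has filled a struct array, the fields can be gathered into plain arrays for analysis. A minimal sketch, assuming each record ends up with time, Area and Volume fields; the Data values below are stand-ins, and timeVec, areaMat and volumeMat are illustrative names.

```matlab
% Stand-in struct array shaped like the Data struct built by the loop.
Data = struct('time',   {0, 1054.08}, ...
              'Area',   {[1 2 3], [4 5 6]}, ...
              'Volume', {[7 8 9], [10 11 12]});

timeVec   = [Data.time]';          % N-by-1 column of times
areaMat   = vertcat(Data.Area);    % N-by-3, one row per record
volumeMat = vertcat(Data.Volume);  % N-by-3, one row per record
```

vertcat will error if any record is missing a field value or has a different number of columns, which is a useful check that every section was parsed.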
D. Ali on 27 Apr 2019

0 votes

I have a similar question: I need to extract all MCAP amples, with the times they occurred at, into a separate file, and plot them if possible.
I attached the file.

Asked: 14 Jul 2014

Answered: 27 Apr 2019
