How to properly extract data from text file.

Question

Sharah, 11 May 2017

https://jp.mathworks.com/matlabcentral/answers/339849-how-to-properly-extract-data-from-text-file

Commented: Guillaume, 12 May 2017

I have data in a text file that looks basically like this:

LineType: 1
PlayMode: Single
GameType: OneBalloon
LineType: SumR3
TranslationSpeed: 0
SensivityBalloon1: 0.09
SensivityBalloon2: 0
LevelLength: 20
Season: Summer
Backgrounddifficulty: Easy
StarScore[1] DistanceScore[1] StabilityScore[1] ScoreFrames[1] Frame[1] Time[1] ForcePlayer1[1] BalloonPath_X[1] BalloonPath_Y[1] CharacterPath_X[1] CharacterPath_Y[1] IsInactive[1] 
0 0 0 0 0 0 30653 0 4.225888 0 2.150741 0 
1 0 0 0 1 0 30641 0 -2.579402 0 -4.643577 0

And I am using this to extract data starting from the StarScore:

file = fullfile('file.txt');
Subject(1).T = readtable(file,'Delimiter',' ', ...
             'ReadVariableNames',true, 'HeaderLines', 10);
Subject(1).T(:, 13) = [];

Two questions I have:

1) The header line is line 11 of the file, but if I set 'HeaderLines' to 11, MATLAB skips that line and uses the first data row as the header. Why?

2) How can I extract the first few lines of information from the text file into a separate cell array, stopping before the StarScore line?

Answer 1

dpb, 11 May 2017

https://jp.mathworks.com/matlabcentral/answers/339849-how-to-properly-extract-data-from-text-file#answer_266694

Edited: dpb, 12 May 2017

The first one is (relatively) easy: you set 'ReadVariableNames',true, whose meaning per the documentation is "the first row of the region to read contains the variable names for the table." Hence the count of lines to skip covers only what comes before the region to be parsed, and that region is not just the data portion; if you want the header line used for names, it is the first line of the region and must not be skipped. So in that case 'HeaderLines' is just 10; the 11th line is the one you want read as the variable names.
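To make the counting concrete, here is a minimal sketch on a shortened stand-in file (the temporary file and its three column names are made up for illustration; they are not the poster's file). It has 2 metadata lines instead of 10, so 'HeaderLines' is 2 and the third line supplies the variable names.

```matlab
% Build a small stand-in file: 2 metadata lines, 1 name line, 2 data rows.
file = [tempname '.txt'];            % hypothetical temporary file
fid = fopen(file, 'wt');
fprintf(fid, 'PlayMode: Single\n');
fprintf(fid, 'Season: Summer\n');
fprintf(fid, 'StarScore Frame Time\n');
fprintf(fid, '0 0 30653\n');
fprintf(fid, '1 1 30641\n');
fclose(fid);

% Skip only the 2 metadata lines; line 3 is then read as the variable names.
T = readtable(file, 'Delimiter', ' ', ...
              'ReadVariableNames', true, 'HeaderLines', 2);
disp(T.Properties.VariableNames)
delete(file);
```

Setting 'HeaderLines' to 3 here would skip the name line as well, which is the situation described in the question.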

I don't understand the second request, sorry...

ADDENDUM

OK, the second came to me (with some help from seeing Guillaume's Answer :) ). Another approach to same end result...

 % read only the first 10 lines, one whole line per cell, then split each at the colon
 hdrdata=regexp(textread(file,'%s',10, ...
                  'delimiter','\n','whitespace',''),':\s*','split');

which will leave you a 10x1 cell array, each cell of which contains the text/value pair.

While TMW has deprecated the venerable textread in favor of its uptown cousin textscan, it has some advantages for certain uses, including that it accepts a filename instead of needing a file handle and that it doesn't encapsulate everything in a cell array. The above does return a cellstr array, but the textscan version returns that inside another cell that has to be dereferenced before being passed to regexp; another step.
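For comparison, a rough sketch of the textscan route with its extra dereferencing step. The temporary file, its three lines, and the split pattern ':\s*' are assumptions chosen for illustration.

```matlab
% Stand-in file with 3 "Name: Value" lines (hypothetical content).
file = [tempname '.txt'];
fid = fopen(file, 'wt');
fprintf(fid, 'PlayMode: Single\n');
fprintf(fid, 'LevelLength: 20\n');
fprintf(fid, 'Season: Summer\n');
fclose(fid);

% textscan needs a file handle and wraps its result in an outer cell.
fid = fopen(file, 'rt');
C = textscan(fid, '%s', 3, 'Delimiter', '\n', 'Whitespace', '');
fclose(fid);
delete(file);

% C{1} is the cellstr of lines; dereference it before calling regexp.
hdrdata = regexp(C{1}, ':\s*', 'split');
disp(hdrdata{1})
```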

6 comments

dpb, 12 May 2017

Edited: dpb, 12 May 2017

No, because there aren't two trailing blanks but one (and even if there were, they would be reduced from however many there were to that same one).

The "problem" is there is a 12th delimiter in the record, which indicates there are 13 fields; there just is no subsequent data for that field. It's no different than a .csv file being terminated by a trailing ','.
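The field count can be checked on a single record. This sketch uses strsplit on one data row copied from the question, purely as an illustration of counting delimiters; it is not how readtable parses the file internally.

```matlab
% One data row copied from the question, including its trailing blank.
rec = '0 0 0 0 0 0 30653 0 4.225888 0 2.150741 0 ';

% 12 values followed by a 12th delimiter give 13 fields; the last is empty.
fields = strsplit(rec, ' ');
fprintf('%d fields, last one empty: %d\n', numel(fields), isempty(fields{end}));
```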

I see same effect in textscan --

>> T=textscan(fid,'','Delimiter',' ', ...
           'headerlines',11,'collectoutput',1)
T = 
  [2x13 double]
>>

Since the file is malformed, either

fix it (eliminate the trailing blank), or
clean up the input after reading it.

Given it's such a trivial fix to eliminate column that is all NaN, that's probably the simplest thing to do.
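One possible way to do that cleanup without hard-coding the column number is sketched below. The small table and the name 'Extra' are stand-ins for what readtable returns, assuming the empty trailing field comes back as a numeric column of NaN.

```matlab
% Stand-in for the table read from the file; 'Extra' plays the empty 13th column.
T = table([0; 1], [30653; 30641], [NaN; NaN], ...
          'VariableNames', {'StarScore', 'ForcePlayer1', 'Extra'});

% Flag every numeric variable whose values are all NaN, then delete those columns.
isAllNaN = varfun(@(x) isnumeric(x) && all(isnan(x)), T, 'OutputFormat', 'uniform');
T(:, isAllNaN) = [];
disp(T.Properties.VariableNames)
```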

ADDENDUM

Actually, the symptom is pretty common and comes from something like

[nr,nc]=size(array);  % size of array to write to file
fmt=[repmat('%f ',1,nc) '\n'];  % nc fields, each followed by a blank, and newline
fprintf(fid,fmt,array.')        % write the array row by row

While that seems superficially "the right stuff", by using the total number of columns in the repmat call on the numeric fields with the blank delimiter after each, you have created the Frankenstein you're now trying to read: a record with an extra delimiter at the end.

It's trivial to fix, however, assuming you have control over creating the file--if somebody else did it, then you have to have them fix it for you or deal with it after the fact, unfortunately.

All that's needed is to modify the format string just a wee bit...

fmt=[repmat('%f ',1,nc-1) '%f\n'];

where the delimiter is written for all the columns except the last, whose field is just the field string followed directly by the newline.
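A quick check of the two format strings, using sprintf so that no file is needed. The 2x3 array is made up for illustration.

```matlab
array = [0 30653 4.225888; 1 30641 -2.579402];   % hypothetical data
[~, nc] = size(array);

fmtBad  = [repmat('%f ', 1, nc)   '\n'];    % blank after every field
fmtGood = [repmat('%f ', 1, nc-1) '%f\n'];  % no blank after the last field

% Format each row, then split the result into lines for inspection.
linesBad  = regexp(sprintf(fmtBad,  array.'), '\n', 'split');
linesGood = regexp(sprintf(fmtGood, array.'), '\n', 'split');

fprintf('bad ends with blank:  %d\n', linesBad{1}(end)  == ' ');
fprintf('good ends with blank: %d\n', linesGood{1}(end) == ' ');
```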

Guillaume, 12 May 2017

Alternatively,

fmt = [strjoin(repmat({'%f'}, 1, nc), ' ') '\n'];  % blanks only between fields, then newline

Answer 2

Guillaume, 12 May 2017

https://jp.mathworks.com/matlabcentral/answers/339849-how-to-properly-extract-data-from-text-file#answer_266728

dpb answered your first question.

For your second question, unfortunately it cannot be done with readtable. You have no option but to read the file a second time. This can be done many ways. A fairly simple way would be

fid = fopen(file, 'rt');
headerlines = 10;
headers = cell(headerlines, 2);  
for row = 1 : headerlines
  headers(row, :) = strsplit(fgetl(fid), ': ');
end
fclose(fid);
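As a possible follow-up, the values read this way are all text. The sketch below repeats the loop on a three-line stand-in file (the temporary file is an assumption) and then converts the values that parse as numbers; the conversion step is an addition for illustration, not part of the answer above.

```matlab
% Stand-in file with 3 header lines taken from the question's sample.
file = [tempname '.txt'];
fid = fopen(file, 'wt');
fprintf(fid, 'PlayMode: Single\nTranslationSpeed: 0\nSensivityBalloon1: 0.09\n');
fclose(fid);

% Same loop as above, with headerlines reduced to match the stand-in file.
fid = fopen(file, 'rt');
headerlines = 3;
headers = cell(headerlines, 2);
for row = 1 : headerlines
  headers(row, :) = strsplit(fgetl(fid), ': ');
end
fclose(fid);
delete(file);

% Convert values that parse as numbers; leave the others as text.
nums = str2double(headers(:, 2));
isNum = ~isnan(nums);
headers(isNum, 2) = num2cell(nums(isNum));
disp(headers)
```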
