Textscan - how to handle blocks

Question

sas0701 on 3 Dec 2013

Edited: sas0701 on 3 Dec 2013

Hello, I have a large .txt file with hundreds of lines that look like this:

"abcd_1" 54 22 0 0 215.00 584.70 382.10 . . . . . 1955606

The 'blocks' are differentiated by the text in the first column, i.e. a block of hundreds of lines will have the text abcd_1, then the next block (again hundreds of lines) will have "yugh", etc.

What is the best way to input this data so I can manipulate it as follows?

1) Separate by blocks: abcd_1 vs yugh vs etc.

2) Separate by trials. Within each block there are 300 unique values (1 to 300) in column 2, BUT there are multiple rows for each, i.e. value 22 can be on several rows. I need to separate these so that each of the 300 values can be treated as a single trial, i.e. all the rows belonging to value 22 are 1 trial.

3) Separate by condition. Column 3 has unique values (1 to 10). I need to sort the trials (column 2) by these conditions (column 3), i.e. each condition (column 3) will have several trials (column 2) associated with it.

I have tried importdata but it gets stuck in the scanning of the data (even though it previews it alright).

Help! S

2 Comments

sixwwwwww on 3 Dec 2013

Can you upload a small sample file of your data? Maybe a few tens of lines will be enough.

sas0701 on 3 Dec 2013

here is a sample.txt

Answer 1

dpb on 3 Dec 2013

Well, you neglected to point out that the dots, which I presumed indicated existing but unshown columns, are actually in the file as shown. ALWAYS point out any irregularity or unusual feature instead of assuming others know what you do.

IFF the missing columns are all always where they are in the section shown and are always the same number, then

>> fmt=['%s' repmat('%f',1,2) repmat('%*s',1,5) repmat('%f',1,5)];
>> fid=fopen('test.txt','r');
>> c=textscan(fid,fmt,'collectoutput',true)
c = 
  {46x1 cell}    [46x7 double]
>> fid=fclose(fid);

works. If the position and/or number changes, then you'll have to parse the file on a line-by-line basis appropriately or know a priori how many lines of what format are in what order.
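For the line-by-line case, a minimal sketch might look like the following. It assumes blank-separated fields, the same number of fields on every line, and that a lone '.' marks a missing value; the two sample lines and the variable names (labels, vals) are made up for illustration and are not taken from the poster's file.

```matlab
% Write a small hypothetical sample so the sketch is self-contained.
fname = [tempname '.txt'];
fid = fopen(fname,'w');
fprintf(fid,'"abcd_1" 54 22 0 0 215.00 . . 1955606\n');
fprintf(fid,'"yugh" 3 7 . 0 12.50 584.70 382.10 42\n');
fclose(fid);

labels = {};   % first-column text, one entry per line
vals   = [];   % numeric columns, NaN wherever the file has '.'
fid = fopen(fname,'r');
tline = fgetl(fid);
while ischar(tline)                            % fgetl returns -1 at end of file
    tok = strsplit(strtrim(tline),' ');        % split the record on blanks
    labels{end+1,1} = strrep(tok{1},'"','');   % drop the surrounding quotes
    vals(end+1,:)   = str2double(tok(2:end));  % '.' is not a valid number, so it becomes NaN
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```

Because the dots come back as NaN, their position no longer has to be known in advance; only the total field count per line has to be constant.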

All in all, if you could generate the file using NaN or some other indicator for missing values you'll come out well ahead. The '.' is a poor choice as it is confused with the decimal point in real floating point numbers so can't use it as MV indicator.
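As a small illustration of why NaN is the easier marker: assuming the file could be regenerated with the literal text NaN in place of each dot, a plain %f format reads it with no skipped fields. The two records below are hypothetical.

```matlab
% Hypothetical records with NaN written where a value is missing.
s = sprintf('"abcd_1" 54 NaN 3\n"yugh" 2 7 NaN');
c = textscan(s,'%s %f %f %f','collectoutput',true);
txt  = c{1};   % 2x1 cell with the first-column text
nums = c{2};   % 2x3 double; the missing entries are NaN
```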

1 Comment

sas0701 on 3 Dec 2013

Edited: sas0701 on 3 Dec 2013

Thank you!

Answer 2

dpb on 3 Dec 2013

Edited: dpb on 3 Dec 2013

Read via textscan into cell array. Use 'collectoutput',1 to aggregate into a cell containing the text and another the numeric values.

Then cellfun and the various selection functions such as unique, intersect, ismember and friends, along with logical indexing, will provide the tools to segregate/aggregate as desired.
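A rough sketch of that selection step, using made-up data in the shape textscan returns with 'collectoutput',true. It assumes the first numeric column is the trial number (column 2 of the file), the second is the condition (column 3 of the file), and that the condition is constant within a trial. All variable names are illustrative only.

```matlab
% Hypothetical stand-ins for c{1} (text) and c{2} (numeric) from textscan.
names = {'abcd_1';'abcd_1';'abcd_1';'yugh';'yugh'};
nums  = [22 3 1.5; 22 3 2.5; 7 1 4.0; 22 2 9.0; 5 2 8.0];  % trial, condition, value

% 1) Blocks: unique labels, then logical indexing to pull one block.
blockNames = unique(names);
blk = nums(strcmp(names,'abcd_1'),:);

% 2) Trials: group the rows of the block that share a trial number.
[trialIDs,~,trialIdx] = unique(blk(:,1));
trials = cell(numel(trialIDs),1);
for k = 1:numel(trialIDs)
    trials{k} = blk(trialIdx==k,:);   % all rows belonging to trial k
end

% 3) Conditions: one condition per trial, then sort the trials by it.
trialCond = accumarray(trialIdx,blk(:,2),[],@(x) x(1));
[~,order] = sort(trialCond);
trialsByCond = trials(order);
```

Looping the block step over blockNames would give the same grouping for every block.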

Or, if you have the Statistics Toolbox with the dataset object, consider it.

But, comments below notwithstanding, if you simply ignore the ellipses in the data record,

>> s='"abcd_1" 54 22 0 0 215.00 584.70 382.10 1955606';
>> n=sum(s==' ');
>> fmt=['%s' repmat('%f',1,n)];
>> c=textscan(s,fmt,'collectoutput',true)
c = 
  {1x1 cell}    [1x8 double]
>>

If you know a priori the number of numeric columns/record, simply substitute that in place of n above--I just did a match for blanks to avoid having to manually count fields.
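The same blank-counting trick can be applied to a file by peeking at the first record and rewinding. This is only a sketch: it assumes every record has the same layout as the first, that fields are separated by single blanks, and that there are no '.' placeholders; the sample file is hypothetical.

```matlab
% Write a small hypothetical file without missing-value dots.
fname = [tempname '.txt'];
fid = fopen(fname,'w');
fprintf(fid,'"abcd_1" 54 22 0 0 215.00 584.70 382.10 1955606\n');
fprintf(fid,'"abcd_1" 54 23 0 0 210.00 580.10 381.00 1955607\n');
fclose(fid);

fid = fopen(fname,'r');
first = fgetl(fid);                 % peek at the first record
n = sum(strtrim(first)==' ');       % blanks = number of numeric fields
frewind(fid);                       % back to the start before the real read
fmt = ['%s' repmat('%f',1,n)];
c = textscan(fid,fmt,'collectoutput',true);
fclose(fid);
delete(fname);
```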

5 Comments

dpb on 3 Dec 2013

I don't see any file... and DO NOT attach a huge file; as another poster noted, a few lines at most will suffice.

If the file you opened above has more than a single record, then the problem is one of--

a) there's some anomaly at the end of the record, or b) the subsequent record(s) don't match the first, or c) some other similar problem in the format or file structure.

Again, w/ only incomplete information supplied, what's a responder to do?

sas0701 on 3 Dec 2013

sample.txt

Do you see file now?
