textscan failing to read data in text file

Question

UniqueWorldline on 15 Oct 2017

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/361377-textscan-failing-to-read-data-in-text-file

Commented: Cedric on 21 Oct 2017

Accepted Answer: Cedric

I have a text file with a fileID called fidRawData that contains rows that look like this:

A BCD 99.9 9.90 9.999 99.9 0.999 0.99 9.999 9.999 99.99 99.9 9.9

A can be one of two characters ('A' or 'B'), or it can be empty (a space is inserted in its place, leaving white space at the beginning of the row). The status of this first character can vary by row. BCD is a three-letter code that can vary depending on the row. I want to treat the subsequent numeric columns as generally as possible, but none of the values will ever be large: they should all lie between -9999 and 9999.

Sometimes an error occurs and '---' is inserted in place of some of the numbers in a given row, like this:

A BCD 99.9 9.90 9.999 --- --- 0.99 9.999 9.999 99.99 99.9 9.9

The only thing I can really be sure of is that there will always be at least one space between the columns; there may be more than one. The numbers vary in sign, in the position of the decimal point, and in magnitude.

I need to use either textscan or fscanf (I would prefer textscan for its greater flexibility) to store the data from every column, including the text in the first two columns, in a data type that can hold this mix of text and numbers and lets me retrieve the data easily.

Whenever an 'A' is omitted and a ' ' is put in its place, I am OK with an 'N' or some other character taking its place if need be, but if there is an 'A' or a 'B', I want that stored as 'A' or 'B' respectively.

When a '---' shows up, I want to replace it with NaN, an empty location in the data structure, or some other indication that no data is available.

I tried the following command on a single row that had an 'A' at the beginning and no '---' anywhere in it:

rawData = textscan(fidRawData, '%s %s %f %f %f %f %f %f %f %f %f %f')

This command worked as expected. It returned a 1x14 cell array where all the values in the text file were stored as I wanted in rawData.

But there are plenty of rows that have no 'A' or 'B', and plenty where '---' appears at least once. To address these variations, I tried the following on a row where both conditions are true:

rawData = textscan(fidRawData, '%s %s %f %f %f %f %f %f %f %f %f %f %f %f', 'Delimiter', ' ', 'EmptyValue', 0)

This test results in a 1x14 cell array that is completely empty. The cells are either 1x1 cell type cells and contain a 0x0 char array, or they are 0x1 double cells.

rawData = textscan(fidRawData, '%s %s %f %f %f %f %f %f %f %f %f %f')

worked up until it hit the '---' in the row, then began returning 0x1 double cells for the remaining columns of rawData.

What can I do to get textscan to deal with these possibilities?

Answer 1

Cedric on 15 Oct 2017

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/361377-textscan-failing-to-read-data-in-text-file#answer_285825

Edited: Cedric on 15 Oct 2017

data.txt

Here is one way. We pre-process the content before parsing, adding 'N' where the first letter is missing. Then we count the number of columns, split the content on white spaces, and reshape the output according to the number of columns. Finally we extract the header (or those first two char columns) and convert the rest to double.

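% Read the whole file as one char vector.
content = fileread( 'data.txt' ) ;
% Insert 'N' wherever a line starts with whitespace (missing first letter).
content = regexprep( content, '^\s', 'N ', 'lineanchors' ) ;
% Count the columns in the first line.
nCols   = numel( strsplit( regexp( content, '[^\r\n]+', 'match', 'once' ), ' ')) ;
% Split on whitespace and reshape into one row per line.
data    = reshape( regexp(content, '\s+', 'split'), nCols, [] ).' ;
% Separate the two text columns and convert the rest ('---' becomes NaN).
header  = data(:,1:2) ;
data    = str2double( data(:,3:end) ) ;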

Applied to the file attached, we get:

 >> header
 header =
  5×2 cell array
    {'A'}    {'BCD'}
    {'B'}    {'BCD'}
    {'N'}    {'BCD'}
    {'B'}    {'BCD'}
    {'N'}    {'BCD'}
 >> data
 data =
   99.9000    9.9000    9.9990       NaN       NaN    0.9900    9.9990    9.9990   99.9900   99.9000    9.9000
   99.9000    9.9000    9.9990       NaN       NaN    0.9900    9.9990    9.9990   99.9900   99.9000    9.9000
   99.9000    9.9000    9.9990   99.9000    0.9990    0.9900    9.9990    9.9990   99.9900   99.9000    9.9000
   99.9000    9.9000    9.9990       NaN       NaN    0.9900    9.9990    9.9990   99.9900   99.9000    9.9000
   99.9000    9.9000    9.9990       NaN       NaN    0.9900    9.9990    9.9990   99.9900   99.9000    9.9000
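
For anyone who wants to try the pipeline without the attachment, here is a minimal self-contained sketch of the same steps. The three sample rows, the variable names firstLine, tokens and values, and the extra step that strips trailing whitespace are assumptions added for illustration; they are not part of the original answer. The trailing-whitespace guard is there because a final newline in the text would otherwise add an empty token and make reshape fail.

```matlab
% Hypothetical three-row sample standing in for data.txt. The second row
% has a space where the leading letter would be, and the text ends with a
% newline, as files often do.
content = sprintf('%s\n', ...
    'A BCD 99.9 9.90 ---   0.99', ...
    '  BCD 99.9 ---  9.999 0.99', ...
    'B XYZ -1.5 9.90 9.999 ---');

% Added guard (assumption): drop trailing whitespace so that the split
% below does not produce an empty last token.
content = regexprep(content, '\s+$', '');

% Put 'N' in front of every line that starts with whitespace.
content = regexprep(content, '^\s', 'N ', 'lineanchors');

% Count the columns in the first line, then split everything on whitespace.
firstLine = regexp(content, '[^\r\n]+', 'match', 'once');
nCols     = numel(strsplit(firstLine, ' '));
tokens    = regexp(content, '\s+', 'split');

% One row per line: first two columns are text, the rest become doubles.
data   = reshape(tokens, nCols, []).';
header = data(:, 1:2);
values = str2double(data(:, 3:end));   % '---' cannot be parsed, so it becomes NaN
```

With this input, header is a 3-by-2 cell array whose first column is 'A', 'N', 'B', and values is a 3-by-4 double matrix with NaN wherever '---' appeared. Note that the reshape only works when every row has the same number of columns.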

UniqueWorldline on 21 Oct 2017

Thank you very much @Cedric Wannaz. I may have some follow up questions that I will ask in a new thread that references this question in a link, but your code has solved 99% of my problems analyzing this data.

Cedric on 21 Oct 2017

My pleasure!

Answer 2

Walter Roberson on 17 Oct 2017

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/361377-textscan-failing-to-read-data-in-text-file#answer_286302

In the case where you already know the number of numeric columns (perhaps having parsed the file the way Cedric shows), then there is a trick you can use:

S = 'A BCD   99.9   9.90 9.999 99.9 0.999  0.99  9.999  9.999  99.99 99.9  9.9';  %sample input
S1 = '  BCD   99.9   9.90 9.999 99.9 ---  ---  9.999  9.999  --- 99.9  9.9';  %another sample input. Leading space is important
NumNumeric = 11;
SP = '%*[ ]';
fmt = ['%c', SP, '%s', repmat([SP '%f'], 1, NumNumeric)];
textscan(S, fmt, 'treatasempty', '---', 'whitespace','')
textscan(S1, fmt, 'treatasempty', '---', 'whitespace','')

These give

ans =
  1×13 cell array
    {'A'}    {'BCD'}    {[99.9]}    {[9.9]}    {[9.999]}    {[99.9]}    {[0.999]}    {[0.99]}    {[9.999]}    {[9.999]}    {[99.99]}    {[99.9]}    {[9.9]}
ans =
  1×13 cell array
    {' '}    {'BCD'}    {[99.9]}    {[9.9]}    {[9.999]}    {[99.9]}    {[NaN]}    {[NaN]}    {[9.999]}    {[9.999]}    {[NaN]}    {[99.9]}    {[9.9]}

This approach does not require pre-processing to replace a missing leading character.

I show here scanning from a string; you can fopen() the file and pass the file identifier where I show the string.
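
As a rough sketch of the file-based variant, the loop below reads a file line by line with fgetl and applies the same format to each line, so every textscan call has exactly the form shown above for strings. The temporary file, the skipping of blank lines, and the variable names fname, txt, flag, code and values are assumptions made for illustration. The sketch also assumes that textscan parses each line as in the output shown above.

```matlab
% Write the two sample rows to a temporary file (a hypothetical stand-in
% for the real data file).
S  = 'A BCD   99.9   9.90 9.999 99.9 0.999  0.99  9.999  9.999  99.99 99.9  9.9';
S1 = '  BCD   99.9   9.90 9.999 99.9 ---  ---  9.999  9.999  --- 99.9  9.9';
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n%s\n', S, S1);
fclose(fid);

% Same format as above: one character, one word, then 11 numbers, each
% preceded by an explicitly skipped run of spaces.
NumNumeric = 11;
SP  = '%*[ ]';
fmt = ['%c', SP, '%s', repmat([SP '%f'], 1, NumNumeric)];

flag   = '';                      % leading character of each row
code   = {};                      % three-letter code of each row
values = zeros(0, NumNumeric);    % numeric columns, NaN where '---' appeared

fid = fopen(fname, 'r');
while true
    txt = fgetl(fid);             % returns -1 (not char) at end of file
    if ~ischar(txt), break; end
    if isempty(strtrim(txt)), continue; end   % skip blank lines (assumption)
    c = textscan(txt, fmt, 'TreatAsEmpty', '---', 'Whitespace', '');
    flag(end+1, 1)   = c{1};
    code{end+1, 1}   = c{2}{1};
    values(end+1, :) = [c{3:end}];
end
fclose(fid);
delete(fname);
```

Reading line by line is slower than a single textscan call on the file identifier, but it avoids having to reason about how line endings interact with the empty 'Whitespace' setting.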

Cedric on 21 Oct 2017

Edited: Cedric on 21 Oct 2017

Neat, I had forgotten about it!
