Limit to Textscan?

Question

Ying, May 8, 2012

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/37815-limit-to-textscan

Hi all, I have been importing multiple data files (typically hundreds of files) quite successfully into MATLAB using the textscan function.

Recently, my raw file format has changed (due to a different data acquisition setup). Previously, I had one time column and 20 data columns, and all columns were the same length. But now each data column has its own time column (which does not line up with the other data), and the data columns differ in length. I've made additions to my script so that it also reads in the corresponding times for each data column, but I've now discovered that, for some reason, it doesn't read the whole file. It reads the file until about row 123, even though some columns go up to row 247 and some go up to row 641. So I'm curious whether this is a limitation of the textscan function, or whether the new code I added is funky.

1 comment

Oleg Komarov, May 9, 2012

Next time, do not create additional answers, because it becomes impossible to follow who answered what and to collect all the info you supplied. Please use comments and/or edit your original post.

Answer 1 (accepted answer)

Geoff, May 9, 2012

Thanks for clarifying what your data looks like.

I assume the comma immediately after the '4' is a mistake. You could probably do this with a regexp, because each comma denotes a pair of values. I take it that if the value before the comma is missing, then the value after it is also missing.

Do you have a fixed number of columns? If so, are the commas always there?

If at least the second condition above is true, then this isn't so bad... You can read pairs of values using regexp:

lines = {'1, 2  3, 4  5,  6'
         '1, 2  3, 4  5,  6'
         '1, 2  3, 4   ,  '
         ' ,    3, 4  , '};
toks = regexp(lines, '\s*(\w*)\s*,\s*(\w*)', 'tokens');

This extracts word-like strings with optional spaces and the obligatory comma.

What you end up with is one cell per row, and within that one cell per pairing. You can manipulate this data as you see fit, convert empty strings or non-numbers to NaN, etc...

I dunno, that's the kind of solution I come up with when I don't want to spend too much time thinking up more complicated clever stuff.

[EDIT]

The above regexp fails on the fourth line because there's no logic that says if you have the first value you must have the second (and vice versa)... So try this:

toks = regexp(lines, '\s*(\w+)\s*,\s*(\w+)|\s*()\s*,\s*()', 'tokens');
rows = cell(size(toks));
for r = 1:numel(toks)
  rows(r) = { str2double([toks{r}{:}]) };
end

Now you have a cell with one row per line, containing a vector of doubles...

This won't work with other rubbish in your data like % signs, but you can either filter that or allow for it in the regular expression....

And of course, if you know that all your rows are the same length (or force them to be after processing), you can convert the whole rows array to a matrix with cell2mat.
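If all rows do end up with the same number of pairs, the whole thing can be sketched end to end. This is a hedged variant rather than the code above: it assumes both values of a pair are either present or missing together, grabs each pair with 'match' (a lone comma stands for a missing pair), and splits on the comma afterwards. The pattern and variable names are illustrative only, and vertcat is swapped in for cell2mat.

```matlab
% Sample rows in the time,data pair layout described in this thread.
lines = {'1, 2  3, 4  5,  6'
         '1, 2  3, 4   ,  '
         ' ,    3, 4  , '};

rows = cell(size(lines));
for r = 1:numel(lines)
    % Either "value , value" or a bare comma (assumed to mean a missing pair).
    pairs = regexp(lines{r}, '[^,\s]+\s*,\s*[^,\s]+|,', 'match');
    vals = nan(1, 2*numel(pairs));
    for p = 1:numel(pairs)
        parts = regexp(pairs{p}, ',', 'split');            % {time, data}
        vals(2*p-1:2*p) = str2double(strtrim(parts));      % '' becomes NaN
    end
    rows{r} = vals;
end

% Only valid because every row here has the same number of pairs.
M = vertcat(rows{:});
```

Because the value pattern is "anything that is not a comma or whitespace", decimals and signs pass through to str2double, which turns anything non-numeric into NaN.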

Answer 2

Geoff, May 8, 2012

I doubt there is a limit for the tiny numbers you're talking about.

What I expect has happened is that textread encountered some text that did not fit the format and was not listed as a possible delimiter.

Check your data file near the last line that you think was successfully parsed.
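One way to find that spot, sketched on a made-up file: after textscan returns, test whether the file is at its end, and if not, look at what comes next. The file contents, the two-column format and the variable names are assumptions for illustration, not the poster's data.

```matlab
% Write a small hypothetical file whose third row starts with non-numeric text.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1,2\n3,4\nx,5\n6,7\n');
fclose(fid);

fid = fopen(fname, 'r');
C = textscan(fid, '%f%f', 'Delimiter', ',');
stoppedEarly = ~feof(fid);      % true if textscan returned before end of file
if stoppedEarly
    pos = ftell(fid);           % byte offset at which reading stopped
    rest = fgetl(fid);          % unread remainder of the current line
    fprintf('Stopped at byte %d, next text: "%s"\n', pos, rest);
end
fclose(fid);
delete(fname);

nRead = numel(C{1});            % number of rows read before the stop
```

If stoppedEarly is true, the printed text is usually a good hint as to which field did not fit the format.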

3 comments (1 older comment not shown)

Geoff, May 9, 2012

From your descriptions it's hard to envisage what your data looks like, and you haven't shown your textread() call. If you want your data in a matrix, then it has to be the width/height of your largest column and row number. If you want a variable width, you'll need to read into a cell array. I'd recommend using fgetl() with textread() on a per-line basis... Other functions worth checking out are sscanf(), regexp() or textscan().

Walter Roberson, May 9, 2012

textread() is not recommended; it will be removed from MATLAB.

textscan() is its replacement.

Answer 3

Walter Roberson, May 9, 2012

MATLAB does not provide any facilities that can read field-wise from blocks of text with an inconsistent number of fields, unless all of the fields have the same numeric format and everything is to be read as one continuous stream, ignoring line boundaries.

To read row-wise with inconsistent number of fields, one must read entire lines and parse them afterwards.
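A minimal sketch of "read entire lines, parse afterwards", assuming whitespace-separated numbers and a made-up three-line file: textscan pulls each line in as a string, sscanf parses each line on its own, and shorter rows are padded with NaN. The file contents and variable names are hypothetical.

```matlab
% Hypothetical ragged file: 3, 2 and 1 values per line.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '1 2 3\n4 5\n6');
fclose(fid);

% Read every line as one string; no numeric conversion happens here.
fid = fopen(fname, 'r');
C = textscan(fid, '%s', 'Delimiter', '\n', 'Whitespace', '');
fclose(fid);
delete(fname);
txt = C{1};

% Parse each line separately, so rows may have different lengths.
rows = cell(numel(txt), 1);
for k = 1:numel(txt)
    rows{k} = sscanf(txt{k}, '%f').';
end

% Pad to a rectangular matrix, with NaN marking the missing values.
width = max(cellfun(@numel, rows));
M = nan(numel(rows), width);
for k = 1:numel(rows)
    M(k, 1:numel(rows{k})) = rows{k};
end
```

Keeping the cell array rows is the better choice when a padded matrix would blur the difference between "missing" and "not recorded".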

Answer 4

Ken Atwell, May 9, 2012

That is an unusual file format. If I read you correctly, you have a file I would describe as "ragged down"... a consistent number of columns, but the number of rows per column is variable. Is that right? I'm assuming the columns are delimited with commas, tabs or such; something like (whitespace added):

 11, 12, 13
 21,   , 23
 ,   , 33

In this trivial example, textscan would stop processing at the first missing value (in the second row here). You can call textscan again with the same file handle and it will continue where it left off, but I imagine you will find it difficult to recover from the missing value.

Depending on the version of MATLAB you are using, I would try importing the file into MATLAB... it may just do the right thing, and you can then generate a script from there to create a programmatic solution.

If that doesn't work out, another solution would be to read the file line-by-line, splitting on the delimiter (comma here). And, in this case, I want to convert from strings to doubles. Here is some code to import the data I've included here:

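 f = fopen('input.dat');           % f is -1 if the file cannot be opened
 A = [];
 while ~feof(f)
    l = fgetl(f);                  % one line of text, without the newline
    r = regexp(l, ',', 'split');   % split the line on the comma delimiter
    A(end+1,:) = str2double(r);    % empty or non-numeric fields become NaN
 end
 fclose(f);
 A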

Missing values are represented by NaNs in A.

3 comments (1 older comment not shown)

iffi, December 27, 2012

 f = fopen('input.dat');
 A=[];
 while ~feof(f)
    l = fgetl(f);
    r = regexp(l, ',', 'split');
    A(end+1,:) = str2double(r);
 end
 fclose (f);
 A

This code reads the file well, but I also have some data in this form, e.g. V567, V1528, ... For all such entries the code also gives me NaN, in addition to the missing values.

Walter Roberson, December 27, 2012

It appears you are starting a new topic. Please create a new Question for this. You can refer to this existing topic as giving ideas.

Answer 5

Ying, May 9, 2012

Thanks for the responses. Ken, as for trying to import the file into MATLAB, I could not do it successfully because I have multiple delimiters in my data. The data looks like the following:

 1, 2  3, 4  5,  6
 1, 2  3, 4  5,  6
 1, 2  3, 4   ,  
  ,    3, 4,  ,

The data is weird in that a comma separates the time and data columns for one variable, and a space separates that pair from the next time/data pair. So in this example, columns 1, 3, and 5 are times, and columns 2, 4, and 6 are the respective data that the times correspond to. Each pair ends at a different time. Right now my textscan always ends at the shortest pair (5, 6 in this example). Is it possible to just change my delimiters so that it reads the whole file? Or should I try the line-by-line read option?

2 comments

Walter Roberson, May 9, 2012

Are the columns fixed width? If they are not, there is logical difficulty in distinguishing between " 3" and "3 ".

Ying, May 9, 2012

I don't know. I do know that it reads everything fine up to the end of the shortest column, though.

Answer 6

Ying, May 9, 2012

That's correct, Geoff, the comma after the 4 is a typo. The number of columns is somewhat fixed. What I mean is that it's controllable: I can choose how many variables to track, but if I want more or fewer variables, then I have to change the script to match. The commas are always there, between each time and the data it corresponds to.

Oh, and since you asked earlier, this is my textscan line:

 datanew = textscan(fid,'%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f','Delimiter','\t%,','HeaderLines',2);

So as you can see I have around 52 columns, not the most pretty or ideal way to do it, I know. I wanted to use import, but textscan seems to be the only way I've gotten it to work.
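As a side note on the long format string: it can be built from a column count with repmat, so changing the number of tracked variables means changing one number. This is only a sketch on a made-up comma-delimited file; nCols, the file contents and the single ',' delimiter are assumptions, not the real acquisition format.

```matlab
% Hypothetical file: two header lines, then four numeric columns.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'header 1\nheader 2\n1,2,3,4\n5,6,7,8\n');
fclose(fid);

nCols = 4;                          % assumed column count, set once
fmt = repmat('%f', 1, nCols);       % builds '%f%f%f%f'

fid = fopen(fname, 'r');
datanew = textscan(fid, fmt, 'Delimiter', ',', 'HeaderLines', 2);
fclose(fid);
delete(fname);
```

Here datanew is a 1-by-nCols cell array with one numeric column vector per cell, the same shape of result as the hand-written format produces.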

5 comments (3 older comments not shown)

Ying, May 9, 2012

How would I account for header lines and column names?

Geoff, May 10, 2012

Read the first line and process it the same way. Are the headers separated by the same "comma-sometimes" strategy? You could use the same regexp code I gave you as long as a single header does not contain a space.
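A rough sketch of that, assuming (hypothetically) one free-text title line followed by one line of column names laid out like the data, with no spaces inside any name: fgetl consumes the header lines one at a time, and the next read starts at the data. The file contents and names here are invented for illustration.

```matlab
% Hypothetical file: a title line, a column-name line, then data.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Run 7 (made-up title line)\ntA, A  tB, B\n1, 2  3, 4\n');
fclose(fid);

fid = fopen(fname, 'r');
titleLine = fgetl(fid);                         % first header line, kept as text
nameLine = fgetl(fid);                          % second header line
names = regexp(nameLine, '[^,\s]+', 'match');   % names split on commas/spaces
firstData = fgetl(fid);                         % reading continues at the data
fclose(fid);
delete(fname);
```

After the two fgetl calls the file position sits at the first data line, so a per-line parsing loop can start from there.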

Answer 7

per isakson, May 9, 2012

Does the data block of the file have a format something like this?

    time_stamp, value space  time_stamp, value space  time_stamp, value  
    time_stamp, value space  time_stamp, value space  time_stamp, value  
    time_stamp, value space  time_stamp, value space  time_stamp, value  
    time_stamp, value space            ,              time_stamp, value

Is "space" only char(32)? Is there no tab, char(9)? Does the "time_stamp" have a special format that can be distinguished from "value"? Do the columns have fixed width, as in my example above?

If you know how many header lines there are, you can read them with fgetl or textscan.

Answer 8

Ying, May 9, 2012

I think I was able to make it work by reading in all values as strings instead of floating-point numbers, then making them all the same length, and then using str2num to convert the strings back to numbers. Now I just have to get it to work with the rest of the script.

1 comment

Geoff, May 10, 2012

Use str2double()
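For illustration, a small sketch of why that fits this case: str2double accepts a whole cell array of strings and returns NaN for anything it cannot convert, including empty strings, whereas str2num evaluates its input with eval. The sample strings are made up.

```matlab
% Made-up string fields: a number, an empty field, text, and a number
% in exponent notation.
strs = {'1.5', '', 'abc', '-2e3'};

% One call converts every cell; unconvertible entries become NaN.
vals = str2double(strs);

% Logical mask of the fields that did not convert.
missing = isnan(vals);
```

The NaN entries can then be located with isnan, which is handy when the columns have been padded to a common length.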
