Limit to Textscan?

Ying on 8 May 2012
Hi all, I have been importing multiple data files (typically hundreds of files) quite successfully to Matlab using the textscan function.
Recently, my raw file format changed (due to a different data acquisition setup). Previously, I had one time column and 20 data columns, and all columns were the same length. Now each data column has its own time column (which does not line up with the other data), and the data columns differ in length. I've added to my script so that it also reads in the corresponding times for each data column, but I've discovered that for some reason it doesn't read the whole file. It reads the file until about row 123, even though some columns go up to row 247 and some go up to 641. So I'm curious whether this is a limitation of the textscan function, or whether the new code I added is funky.
  1 comment
Oleg Komarov on 9 May 2012
Next time, please do not post additional answers; it becomes impossible to follow who answered what and to collect all the information you supplied. Please use comments and/or edit your original post.


Accepted Answer

Geoff on 9 May 2012
Thanks for clarifying what your data looks like.
I assume that comma immediately after the '4' is a mistake. You could probably do this with a regexp... Because each comma denotes a pair of values. I take it that if the value before the comma is missing then the value after is also missing.
Do you have a fixed number of columns? If so, are the commas always there?
If at least the second condition above is true, then this isn't so bad... You can read pairs of values using regexp:
lines = {'1, 2 3, 4 5, 6'
'1, 2 3, 4 5, 6'
'1, 2 3, 4 , '
' , 3, 4 , '};
toks = regexp(lines, '\s*(\w*)\s*,\s*(\w*)', 'tokens');
This extracts word-like strings with optional spaces and the obligatory comma.
What you end up with is one cell per row, and within that one cell per pairing. You can manipulate this data as you see fit, convert empty strings or non-numbers to NaN, etc...
I dunno, that's the kind of solution I come up with when I don't want to spend too much time thinking up more complicated clever stuff.
[EDIT]
The above regexp fails on the fourth line because there's no logic that says if you have the first value you must have the second (and vice versa)... So try this:
toks = regexp(lines, '\s*(\w+)\s*,\s*(\w+)|\s*()\s*,\s*()', 'tokens');
rows = cell(size(toks));
for r = 1:numel(toks)
rows(r) = { str2double([toks{r}{:}]) };
end
Now you have a cell with one row per line, containing a vector of doubles...
This won't work with other rubbish in your data like % signs, but you can either filter that or allow for it in the regular expression....
And of course, if you know that all your rows are the same length (or force them to be after processing), you can convert the whole rows array to a matrix with cell2mat.
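If the rows come out with different lengths, cell2mat will not accept them directly. A minimal sketch of padding them with NaN first, assuming the missing values always sit at the end of a row. The sample `rows` data and the names `widths`, `maxWidth` and `M` are made up for illustration:

```matlab
% Hypothetical ragged input: one numeric row vector per line of the file,
% as a per-line parse might produce. The values are invented.
rows = {[1 2 3 4 5 6]; [1 2 3 4]; [3 4]};

% Find the longest row, then copy each row into a NaN-filled matrix.
% Shorter rows keep trailing NaNs (assumes values are missing at the END).
widths = cellfun(@numel, rows);
maxWidth = max(widths);
M = NaN(numel(rows), maxWidth);
for r = 1:numel(rows)
    M(r, 1:widths(r)) = rows{r};
end
```

If values can be missing at the start or in the middle of a line, trailing padding would shift the columns, so the NaNs would have to be put in place during parsing instead.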

More Answers (7)

Geoff on 8 May 2012
I doubt there is a limit for the tiny numbers you're talking about.
What I expect has happened is that textread encountered some text that did not fit the format and was not listed as a possible delimiter.
Check your data file near the last line that you think was successfully parsed.
  3 comments
Geoff on 9 May 2012
From your descriptions it's hard to envisage what your data looks like, and you haven't shown your textread() call. If you want your data in a matrix, then it has to be the width/height of your largest column and row number. If you want a variable width, you'll need to read into a cell array. I'd recommend using fgetl() with textread() on a per-line basis... Other functions worth checking out are sscanf(), regexp() or textscan().
Walter Roberson on 9 May 2012
textread() is not recommended; it will be removed from MATLAB.
textscan() is its replacement.


Walter Roberson on 9 May 2012
MATLAB does not provide any facilities that can deal with reading field-wise from blocks of text with an inconsistent number of fields. Not unless all of the fields are the same numeric format and everything is to be read as one continuous stream, ignoring line boundaries.
To read row-wise with inconsistent number of fields, one must read entire lines and parse them afterwards.
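A rough sketch of that two-pass approach, under some assumptions: the sample file contents are invented, the pattern only picks up complete "time, value" pairs of digits, and the names `lines`, `parsed` and `ln` are just illustrative.

```matlab
% Write a small sample file so the sketch is self-contained (made-up data).
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, '1, 2 3, 4 5, 6\n1, 2 3, 4 ,\n');
fclose(fid);

% Pass 1: read every line as raw text, without interpreting the fields.
fid = fopen(fname, 'r');
lines = {};
while true
    ln = fgetl(fid);
    if ~ischar(ln), break; end      % fgetl returns -1 at end of file
    lines{end+1, 1} = ln;
end
fclose(fid);
delete(fname);

% Pass 2: parse each line on its own. Only complete "digits, digits"
% pairs are kept, so lines may yield different numbers of values.
parsed = cell(size(lines));
for k = 1:numel(lines)
    toks = regexp(lines{k}, '(\d+)\s*,\s*(\d+)', 'tokens');
    parsed{k} = str2double([toks{:}]);
end
```

Here `parsed{1}` holds six values and `parsed{2}` holds four, because the trailing incomplete pair on the second line is skipped. A line with no complete pair at all would need a separate guard.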

Ken Atwell on 9 May 2012
That is an unusual file format. If I read you correctly, you have a file I would describe as "ragged down"... a consistent number of columns, but the number of rows per column is variable. Is that right? I'm assuming the columns are delimited with commas, tabs or such; something like (whitespace added):
11, 12, 13
21, , 23
, , 33
In this trivial example, textscan would stop processing at the first missing value (in the second row here). You can call textscan again with the same file handle and it will continue where it left off, but I imagine you will find it difficult to recover from the missing value.
Depending on the version of MATLAB you are using, I would try importing the file into MATLAB... it may just do the right thing, and you can then generate a script from there to create a programmatic solution.
If that doesn't work out, another solution would be to read the file line-by-line, splitting on the delimiter (comma here). And, in this case, I want to convert from strings to doubles. Here is some code to import the data I've included here:
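f = fopen('input.dat');
A = [];
while ~feof(f)
    l = fgetl(f);                   % read one line as text
    r = regexp(l, ',', 'split');    % split the line on commas
    % Empty or non-numeric fields become NaN. Every line must have the
    % same number of fields, or this assignment fails.
    A(end+1,:) = str2double(r);
end
fclose(f);
A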
Missing values are represented by NaNs in A.
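To see where those NaNs come from, here is a small worked example on a single line of the sample data above, with no file involved. The variable names `r` and `v` are only for illustration:

```matlab
% Split one sample line on commas; the middle field is just whitespace.
r = regexp('21, , 23', ',', 'split');

% str2double converts each field; a blank field cannot be converted,
% so it comes back as NaN instead of raising an error.
v = str2double(r);
```

The result `v` has three elements: 21, NaN, and 23.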
  3 comments
iffi on 27 Dec 2012
f = fopen('input.dat');
A=[];
while ~feof(f)
l = fgetl(f);
r = regexp(l, ',', 'split');
A(end+1,:) = str2double(r);
end
fclose (f);
A
This code reads the file well, but I also have some data in this form, e.g. V567, V1528, ...
For these, the code also gives me NaN for all such entries, in addition to the missing values.
Walter Roberson on 27 Dec 2012
It appears you are starting a new topic. Please create a new Question for this. You can refer to this existing topic as giving ideas.


Ying on 9 May 2012
Thanks for the responses. Ken, as for trying to import the file into MATLAB, I could not do it successfully because I have multiple delimiters in my data. The data looks like the following:
1, 2 3, 4 5, 6
1, 2 3, 4 5, 6
1, 2 3, 4 ,
, 3, 4, ,
The data is weird in that a comma separates the time and data columns for one variable, and a space separates that pair from the next time/data pair. So in this example, columns 1, 3, and 5 are times, and columns 2, 4, and 6 are the data that those times correspond to. Each set ends at a different time. Right now my textscan always ends at the shortest set (5, 6 in this example). Is it possible to just change my delimiters so that it reads the whole file? Or should I try the line-by-line read option?
  2 comments
Walter Roberson on 9 May 2012
Are the columns fixed width? If they are not, there is logical difficulty in distinguishing between " 3" and "3 ".
Ying on 9 May 2012
I don't know. I do know that it reads everything fine up to the end of the shortest column, though.


Ying on 9 May 2012
That's correct, Geoff, the comma after the 4 is a typo. The number of columns is somewhat fixed. What I mean is that it's controllable: I can choose how many variables to track, but if I want more or fewer variables then I have to change the script to match. The commas are always there, between each time and the data it corresponds to.
Oh, and since you asked earlier, this is my textscan line:
datanew = textscan(fid,'%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f%f','Delimiter','\t%,','HeaderLines',2);
So as you can see I have around 52 columns. Not the prettiest or most ideal way to do it, I know. I wanted to use import, but textscan seems to be the only way I've gotten it to work.
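As a side note, a long format string like that does not have to be typed out by hand. A small sketch using repmat, where `nCols` is an assumed column count that would need to match the actual file:

```matlab
% Assumed number of numeric columns; change this to match the file.
nCols = 50;

% Repeat '%f' once per column to build '%f%f...%f'.
fmt = repmat('%f', 1, nCols);

% The format could then be passed to the same call as above, e.g.:
%   datanew = textscan(fid, fmt, 'Delimiter', '\t%,', 'HeaderLines', 2);
```

Changing the number of tracked variables then only means changing `nCols`.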
  5 comments
Ying on 9 May 2012
How would I account for header lines and column names?
Geoff on 10 May 2012
Read the first line and process it the same way. Are the headers separated by the same "comma-sometimes" strategy? You could use the same regexp code I gave you as long as a single header does not contain a space.
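Something along these lines might work as a starting point. This is only a sketch: the two-line header layout, the column names and the file contents are all invented, and it assumes the names follow the same "name, name" pairing as the data and contain no spaces.

```matlab
% Made-up sample file: a title line, a line of column names, one data line.
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, 'Run 7\nt1, v1 t2, v2\n1, 2 3, 4\n');
fclose(fid);

fid = fopen(fname, 'r');
titleLine = fgetl(fid);     % first header line, kept as plain text
nameLine  = fgetl(fid);     % second header line holds the column names
firstData = fgetl(fid);     % first line of the data block
fclose(fid);
delete(fname);

% Same pairing pattern for the names and for the data.
pairPattern = '(\w+)\s*,\s*(\w+)';
nameToks = regexp(nameLine, pairPattern, 'tokens');
names = [nameToks{:}];                  % cell array of column names
dataToks = regexp(firstData, pairPattern, 'tokens');
values = str2double([dataToks{:}]);     % numeric row for the data line
```

With this sample, `names` lines up one-to-one with `values`, so column k of the data can be labeled with `names{k}`.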


per isakson on 9 May 2012
Does the data block of the file have a format something like this?
time_stamp, value space time_stamp, value space time_stamp, value
time_stamp, value space time_stamp, value space time_stamp, value
time_stamp, value space time_stamp, value space time_stamp, value
time_stamp, value space , time_stamp, value
Is "space" only char(32), with no tab, char(9)? Does the "time_stamp" have a special format that distinguishes it from "value"? Do the columns have a fixed width, as in my example above?
If you know how many header lines there are, you can read them with fgetl or textscan.
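One quick way to answer the space-versus-tab question is to look at the character codes in a raw line. A small sketch, where the sample line `ln` is made up and contains one tab on purpose:

```matlab
% Hypothetical raw line: a tab between the first and second pair,
% ordinary spaces elsewhere. In practice ln would come from fgetl.
ln = sprintf('1, 2\t3, 4 5, 6');

hasTab   = any(ln == char(9));     % true if the line contains a tab
hasSpace = any(ln == char(32));    % true if it contains a plain space

% List the distinct whitespace character codes that occur in the line.
codes = unique(double(ln(isspace(ln))));
```

For this sample line, `codes` contains 9 and 32, which shows that both tabs and spaces are present.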

Ying on 9 May 2012
I think I was able to make it work by reading in all values as strings instead of floating-point numbers, making them all the same length, and then using str2num to convert the strings back to numbers. Now I just have to get it to work with the rest of the script.
  1 comment
Geoff on 10 May 2012
Use str2double()
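A short illustration of why it fits this case, with made-up input strings. str2double accepts a cell array of strings directly and returns NaN for anything it cannot convert, whereas str2num evaluates its input with eval:

```matlab
% Hypothetical fields as they might come out of a string-based read.
strs = {'1.5', '', 'abc', '42'};

% One call converts the whole cell array; empty and non-numeric
% entries become NaN rather than causing an error.
vals = str2double(strs);
```

Here `vals` is a 1-by-4 vector with NaN in the second and third positions, so missing entries stay in place as NaN.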
