Good Morning,
this line apparently takes 21s to run:
s = size(xlsread(filename, 1, 'F:F'))
It is supposed to count the number of rows of an Excel sheet. It is actually a .csv data file, but the values are not separated by commas but by semicolons, so I cannot use the Matlab function ''csvread''. Is there maybe a faster or more elegant way to do this? And is it the ''xlsread'' oder the ''size'' that takes so long...?
Thank you in advance

2 件のコメント

Guillaume
Guillaume 2018 年 4 月 24 日
編集済み: Guillaume 2018 年 4 月 24 日

xlsread needs to

  • start excel
  • get excel to read the file
  • parse the text to convert it into numbers
  • read the data from the worksheet

The first three operations in particular are going to take some time. Excel is not a lightweight program and reading data from disk always takes time.

All that just to get the number of rows in a text file?

beginner94
beginner94 2018 年 4 月 27 日
I needed the number of rows to know how many loops were necessary to get through the file and then I want to transmit the content to a matrix

サインインしてコメントする。

 採用された回答

Guillaume
Guillaume 2018 年 4 月 24 日

1 投票

If all you want to do is count the number of lines in the file, then parsing it is a waste of time:
numlines = sum(fileread('modified.csv') == char(10))
is probably the fastest way to get the number of lines.
If you do want to actually access the content of the file, then readtable is probably the easiest way to get it. This does not involve transitioning through excel so should be much faster than xlsread. readtable can usually detect the file format on its own. If not you can give it hints.

20 件のコメント

Stephen23
Stephen23 2018 年 4 月 24 日
編集済み: Stephen23 2018 年 4 月 24 日
+1 Neat and simple. Needs some adjustment to account for an optional trailing newline character, which not all files have but this one does.
beginner94
beginner94 2018 年 4 月 27 日
Thank you! It doesn't even need one second anymore to run, this makes a huge difference! I only needed to substract the lines with characters, because ''fileread'' also reads the non numeric lines. I couldn't find out what '' ==char(10)'' does though...
Stephen23
Stephen23 2018 年 4 月 27 日
編集済み: Stephen23 2018 年 4 月 27 日
"I couldn't find out what '' ==char(10)'' does though"
Returns a logical index of the locations all line feed / newline characters. See:
Guillaume
Guillaume 2018 年 4 月 27 日
編集済み: Guillaume 2018 年 4 月 27 日
Yes, char(10) is the linefeed character (\n) that marks the end of line on all platforms supported by w̶i̶n̶d̶o̶w̶s̶ MATLAB. So the code simply counts the number of end of line characters in the text file.
beginner94
beginner94 2018 年 4 月 27 日
Ahh, I see! Thank you both!
Stephen23
Stephen23 2018 年 4 月 27 日
Perhaps "on all platforms supported by MATLAB" ?
Guillaume
Guillaume 2018 年 4 月 27 日
@Stephen, Obviously my fingers work faster than my brain!
beginner94
beginner94 2018 年 4 月 27 日
Is there an equivalent function to ''readtable'' in R2007b?
Guillaume
Guillaume 2018 年 4 月 28 日
Wow, that's an ancient version!
No, you'll have to make do with csvread, dlmread, or, on Windows, xlsread.
beginner94
beginner94 2018 年 5 月 1 日
Does this mean that if I want to put the data from the Excel sheet into a matrix it will last an eternity anyhow, because I can only do it with xlsread? :/ whats more, I got about 400 of those sheets that need to be evaluated
Guillaume
Guillaume 2018 年 5 月 1 日
If your file is a .csv file, then it is not an excel sheet and using xlsread for that is overkill and slow.
dlmread should read these files much faster.
beginner94
beginner94 2018 年 5 月 3 日
dlmread(filename, ';', 4, 4)
Is not working
This is the error I get:
??? Error using ==> textscan Mismatch between file and format string. Trouble reading number from file (row 1, field 6) ==> ,3700
Error in ==> read_csv at 17 dlmread(filename, ';', 4, 4)
Guillaume
Guillaume 2018 年 5 月 3 日
Simplest: upgrade to a recent version. You're missing out on 11 years worth of improvements (@ 2 versions/year)
There's two things in your file that dlmread and co. can't cope with:
  • the date and time
  • the , as a decimal separator.
In that case, you need to go more low level, using textscan. Unfortunately, because of the , as decimal separator all these temperatures have to be read as text then converted to number afterward:
fid = fopen('modified.csv', 'rt');
content = textscan(fid, ['%d%{dd.MM.yyyyy}D%{hh:mm:ss}T', repmat('%s', 1, 20)], 'Delimiter', ';', 'HeaderLines', 3);
fclose(fid);
content(4:end) = cellfun(@(s) str2double(strrep(s, ',', '.')), content(4:end), 'UniformOutput', false);
Bear in mind that the format string above works in R2018b but I have no idea if it's valid in R2007b, particularly the date and time part.
beginner94
beginner94 2018 年 5 月 4 日
I would love to upgrade but the person I am writing the code for has only the 2007 version, so this is not really a solution... I tried the code @Guillaume but as you already guessed the format didn't work in R2007b. What I tried is this:
fid = fopen(filename, 'rt');
fmt = ['%s %s %s', repmat('%s',1,8)];
content = textscan(fid , fmt, 'Delimiter', ';', 'HeaderLines', 3);
fclose(fid);
content(4:end) = cellfun(@(s) str2double(strrep(s, ',', '.')), content(4:end), 'UniformOutput', false);
content{:}
and what came out was this:
[...]
2.9300
NaN
NaN
2.8700
NaN
NaN
2.8700
NaN
NaN
2.8700
NaN
NaN
But apart from all the ''NaN''s I cannot explain I wouldn't know how to access the values. Maybe there is a way to work around this with ''fileread'' or I will need to evaluate the data with Excel.
Thank you for your efforts and patience!
Guillaume
Guillaume 2018 年 5 月 4 日
I don't understand why you cut the number of '%s' down from 20 to 8. You need to read the exact number of tokens per line. The following format string works for me:
fid = fopen(filename, 'rt');
fmt = ['%d', repmat('%s', 1, 22)]; %date + time + 20 temperatures
content = textscan(fid , fmt, 'Delimiter', ';', 'HeaderLines', 3);
fclose(fid);
content(4:end) = cellfun(@(s) str2double(strrep(s, ',', '.')), content(4:end), 'UniformOutput', false);
temperature = cell2mat(content(4:end));
Note that T9 to T20 end up all NaN since the columns are blank in the file, but you still have to tell textscan to read these columns.
You can trim these nan columns afterward:
temperature = temperature(:, ~all(isnan(temperature)))
beginner94
beginner94 2018 年 5 月 7 日
This works! Thank you VERY much
Walter Roberson
Walter Roberson 2018 年 5 月 7 日
編集済み: Walter Roberson 2018 年 5 月 7 日
You can use %*s for columns you need to read (to get to end of line) but you do not want to store.
fmt = ['%d', repmat('%s', 1, 10), repmat('%*s', 1, 12)]; %date + time + 20 temperatures, the last 12 of which are skipped
beginner94
beginner94 2018 年 5 月 7 日
@Guillaume, which changes do I need to do to read the date and time columns? This is not going to work until I found out how the format for time and date is for R2007b, right?
Walter Roberson
Walter Roberson 2018 年 5 月 7 日
ds = strcat(column{2}, {' '}, column{3})
and then you can use datenum() on the date string to decode into serial date numbers.
beginner94
beginner94 2018 年 5 月 11 日
編集済み: beginner94 2018 年 5 月 11 日
@Walter Roberson
This doesn't solve the problem unfortunately because I don't know yet how to specify the time/date format. For now I only get columns with "NaN"... Is there maybe a way to search for the colons in the time column with "strfind" and then return the hours, minutes and seconds seperately and put them together afterwards?

サインインしてコメントする。

その他の回答 (1 件)

Walter Roberson
Walter Roberson 2018 年 5 月 11 日

1 投票

S = fileread('modified.csv');
S(S==',') = '.'; %file uses comma for decimal point
fmt = ['%f%s%s', repmat('%f', 1, 8)];
datacell = textscan(S, fmt, 'HeaderLines', 3, 'Delimiter', ';', 'MultipleDelim', true, 'CollectOutput', true);
ds = strcat(datacell{2}(:,1), {' '}, datacell{2}(:,2));
dates = datenum(ds, 'dd.mm.yyyy HH:MM:SS');
all_data = [datacell{1}, dates, datacell{3}];
Now all_data is an all-numeric array, 10 columns wide, in which the second column is the serial date number.

1 件のコメント

beginner94
beginner94 2018 年 5 月 12 日
Thank you!! This doesn't exactly work, probably because of my ancient Matlab version but I managed to adapt it and now it works fine. Thank you

サインインしてコメントする。

質問済み:

2018 年 4 月 24 日

コメント済み:

2018 年 5 月 12 日

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by