How can I speed up importing large .d files?

Felix Lauwaert on 1 Aug 2015
Edited: Cedric on 1 Aug 2015
Hi,
soon I'll have several large files, about 50000x6 each, and I'll have to work on them several times to plot different things. I've realised that the "importdata" command is very slow. I've been advised to use GNUPlot, but I would prefer to stick with MATLAB if possible. So, is there a command that loads data into the workspace faster? Maybe using .xls instead of .d?
Thanks.
  2 Comments
Cedric on 1 Aug 2015
Could you provide a sample file?
Felix Lauwaert on 1 Aug 2015
I had to upload it as .txt because of website requirements. I tic-tocked importdata and it took around 12 s, but I'll have to deal with much larger files soon.


Accepted Answer

Cedric on 1 Aug 2015
Edited: Cedric on 1 Aug 2015
I would do something along the following lines:
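buffer = fileread( 'test.txt' ) ;          % read the whole file into one string
data = sscanf( buffer(76:end), '%f' ) ;    % skip the first 75 characters (header), parse all numbers
data = reshape( data, 6, [] )' ;           % 6 values per row -> N-by-6 matrix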
I built a file with more than 50k rows to test, and it takes half the time of IMPORTDATA. We can easily improve it so it doesn't rely on a fixed header size. You could also do something like:
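fName = 'test.txt' ;
fId = fopen( fName, 'r' ) ;
header = strsplit( strtrim( fgetl( fId ) ), ' ' ) ;   % first line: column names
data = fscanf( fId, '%f' ) ;                          % remaining lines: all numbers as one column
data = reshape( data, 6, [] )' ;                      % 6 values per row -> N-by-6 matrix
fclose( fId ) ;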
but this is slower.
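A minimal sketch of how the first approach might drop the fixed offset 76: locate the end of the header line by searching for the first newline, and take the column count from the header. This assumes the file has exactly one header line with one whitespace-separated name per column, followed by purely numeric rows. The function name readNumericWithHeader is hypothetical.

```matlab
function [data, header] = readNumericWithHeader( fName )
% Hypothetical helper. Assumes one header line (one name per column)
% followed by rows that contain only numbers.
    buffer = fileread( fName ) ;                      % whole file as one string
    eol    = find( buffer == char(10), 1 ) ;          % position of first newline
    header = strsplit( strtrim( buffer(1:eol) ) ) ;   % column names, split on whitespace
    nCols  = numel( header ) ;                        % assumed equal to the number of columns
    data   = sscanf( buffer(eol+1:end), '%f' ) ;      % every number after the header
    data   = reshape( data, nCols, [] )' ;            % nCols values per row
end
```

It would be called as [data, header] = readNumericWithHeader( 'test.txt' ). If the header names themselves contain spaces, the column count has to be supplied some other way.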
PS: I don't understand why your IMPORTDATA is that slow. On my test file with >50k lines, here is the timing:
IMPORTDATA: Elapsed time is 0.312664 seconds.
FSCANF : Elapsed time is 0.457961 seconds.
FILEREAD : Elapsed time is 0.171987 seconds.
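Timings like these could be reproduced with a small tic/toc harness. The sketch below is not the test that produced the numbers above: the file name, the column names c1..c6, and the random contents are made up for illustration. Single-shot timings also include disk caching effects, so absolute numbers will differ from machine to machine.

```matlab
% Build a synthetic 50000x6 test file with a one-line header (illustrative data).
fName = fullfile( tempdir, 'timing_test.txt' ) ;
fId   = fopen( fName, 'w' ) ;
fprintf( fId, 'c1 c2 c3 c4 c5 c6\n' ) ;
fprintf( fId, '%f %f %f %f %f %f\n', rand( 6, 50000 ) ) ;   % one column of rand -> one line
fclose( fId ) ;

% IMPORTDATA
tic ;
s = importdata( fName ) ;
tImport = toc ;

% FOPEN + FSCANF
tic ;
fId = fopen( fName, 'r' ) ;
fgetl( fId ) ;                                    % skip the header line
d1  = reshape( fscanf( fId, '%f' ), 6, [] )' ;
fclose( fId ) ;
tFscanf = toc ;

% FILEREAD + SSCANF
tic ;
buffer = fileread( fName ) ;
eol    = find( buffer == char(10), 1 ) ;          % end of the header line
d2     = reshape( sscanf( buffer(eol+1:end), '%f' ), 6, [] )' ;
tFileread = toc ;

fprintf( 'IMPORTDATA: %.3f s\nFSCANF    : %.3f s\nFILEREAD  : %.3f s\n', ...
         tImport, tFscanf, tFileread ) ;
```

For steadier numbers, each block could be run several times and the minimum or median taken.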
  5 Comments
per isakson on 1 Aug 2015
Edited: per isakson on 1 Aug 2015
I created a test file with
cssm0(50000)
where
function cssm0( N )
    % Write one header row and six data rows, each with N entries.
    h = sprintf( 'H%06d ', 1:N );
    d = sprintf( '%f ' , 1:N );
    fid = fopen('test_long_rows.txt','w');
    fprintf( fid, '%s\n', h,d,d,d,d,d,d );
    fclose(fid);
end
and used profile. Maybe I misread the original question. Anyhow, whether the file has a few long rows or many short rows makes a huge difference with importdata.
Cedric on 1 Aug 2015
Edited: Cedric on 1 Aug 2015
It's interesting, 6x5e4 -> 2s and 5e4x6 -> 0.3s.
In any case, I've always avoided IMPORTDATA like the plague (especially after looking at its source code), because its behavior is size-dependent and difficult to predict. This leads to situations like the one reported recently on the forum, where it works with a file that contains thousands of rows of data, but fails when there are only 30 lines (for the same data structure).


More Answers (1)

per isakson on 1 Aug 2015
Edited: per isakson on 1 Aug 2015
Does the file consist of header rows followed by data rows that contain only numerical data? (No string data such as date-time data.)
Try txt2mat, by Andres: "txt2mat basically is a wrapper for sscanf, it quickly converts ascii files containing m-by-n numeric data, allowing for header lines".
  3 Comments
per isakson on 1 Aug 2015
Yes, please upload the six row file.
Felix Lauwaert on 1 Aug 2015
I tried txt2mat out and it's great, timing 0.157s! Awesome, I hope one day I'll be ready to answer questions and make such functions :)
