How do I increase reading speed from a multi-gigabyte file?
farzad
17 Jun 2019
Hi all,
How do I increase the reading speed of an Excel file that contains several gigabytes of rows and columns?
18 Comments
dpb
17 Jun 2019
Dunno...'pends on what the data are and how they're saved...getting it out of Excel and into a .mat or stream file would undoubtedly be the fastest.
farzad
17 Jun 2019
The data are floats, and let's say 5 gigabytes.
Why .mat and why a stream file? What would the code look like?
Is using a table useful?
dpb
17 Jun 2019
'Cuz both .mat and stream files are binary representations of the actual bytes in memory, thus eliminating the need for conversion.
You've still not said which form of file it actually is; if it is .xls(x), then xlsread is fairly slow.
A table would be one choice for internal storage in MATLAB; how useful it is depends entirely on what the data are and how they need to be processed, which, like the actual file itself, you're keeping us totally in the dark about, so all we can do is guess...
dpb
17 Jun 2019
Well, with .xlsx files you have the choice between xlsread and readtable. You'll just have to test which is faster--one presumes probably readtable. If you have R2019a, you can try the new readmatrix, which is now recommended instead of xlsread.
For csv files, the historic ways are csvread, textscan, and fscanf, although again with the caveat of requiring R2019a, readmatrix is the TMW-recommended alternative now.
I don't have R2019a installed yet, so I can't comment on the relative performance between it and the alternatives.
Still, if speed matters and this will be done more than once, then doing the text read once and then using .mat or stream files will undoubtedly beat any of the alternatives.
You could, if your application can live with single precision, cut the file size in half by saving single instead of double. That's purely a question of what is required of the data itself as to whether that would be a viable alternative or not.
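A minimal sketch of that suggestion, assuming R2019a or later and a purely numeric sheet (the file names here are hypothetical):

```matlab
% Hypothetical file name; assumes a purely numeric spreadsheet and R2019a+.
M = readmatrix('bigdata.xlsx');      % recommended replacement for xlsread
% T = readtable('bigdata.xlsx');     % alternative if named columns are useful

% If single precision is acceptable, halve the size before saving:
Ms = single(M);
save('bigdata.mat', 'Ms', '-v7.3');  % one-time conversion for fast reloads
```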
Walter Roberson
18 Jun 2019
Edited: Walter Roberson, 18 Jun 2019
I wrote out a 1e6-by-500 array of doubles (= 4 gigabytes) in binary form, and tested how long loading took.
When saved as space-delimited text using save -ascii -double, load() of the resulting 12501000000-byte text file took 1416 seconds.
textscan() of that same file took 265 seconds.
fscanf() of the same file took 371 seconds.
When saved as a .csv file using dlmwrite() with precision 16, load() took 1107 seconds.
When saved as a -v7.3 .mat file, load() of the 3796914266-byte file took 25 seconds.
When saved as a pure binary file, fread(fid, [1e6 500], '*double') took 14.25 seconds the first time, and 2.1 seconds the second time (file in operating-system cache). fread(fid, [1 inf], '*double') takes 4.6 seconds when the file is in the cache, which tells us that there is more memory-management overhead when the size is unknown.
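A sketch of the write/read pair behind the pure-binary timing above (file name hypothetical; the dimensions are the ones from the test):

```matlab
% Write ~4 GB of doubles as raw bytes, then read them back.
data = rand(1e6, 500);

fid = fopen('testdata.bin', 'w');
fwrite(fid, data, 'double');              % no text conversion involved
fclose(fid);

fid = fopen('testdata.bin', 'r');
back = fread(fid, [1e6 500], '*double');  % '*double' avoids an extra cast
fclose(fid);
```

fwrite and fread both work column-major, so the round trip reproduces the original matrix.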
(I will update as I generate more times.)
farzad
18 Jun 2019
Thank you very much, Walter.
That is very much what I was searching for. How do you save as .mat?
Walter Roberson
18 Jun 2019
data = rand(1e6, 50);
save testdata.mat data -v7.3
but this relies upon having the data in the first place to write out as .mat.
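If the data start out as text, a one-time conversion along these lines (file names hypothetical) pays the slow read once and makes every later load fast:

```matlab
% One-time conversion: slow text read once, fast binary loads thereafter.
data = readmatrix('bigdata.csv');    % R2019a+; csvread/textscan on older releases
save('bigdata.mat', 'data', '-v7.3');

% In later sessions:
S = load('bigdata.mat');             % seconds instead of minutes
data = S.data;
```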
Walter Roberson
18 Jun 2019
I am having difficulty creating an Excel file that large. I wrote the file as .csv, but my Excel complains about running out of memory when trying to import it, which does not make sense to me.
Walter Roberson
18 Jun 2019
I have been updating the timings; you might want to have another look, above.
dpb
18 Jun 2019
All of which continues to say "ditch Excel" entirely for such large files...
I do find it interesting that textscan manages to beat fscanf -- one would think they would boil down to the same C runtime library call. Just out of curiosity, what were the two specific commands used, Walter? Oh--did you include the overhead of casting the cell array from textscan to double?
Walter Roberson
18 Jun 2019
Edited: Walter Roberson, 18 Jun 2019
I created a format with repmat of '%f' 50 times. I fopen the file and then
datacell = textscan(fid, fmt, 'collectoutput', 1);
Because this puts everything into a single cell, the overhead to extract the array is trivial.
The timing with CollectOutput 0, without joining the columns afterwards, was a hair higher but not statistically significant.
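For reference, the whole sequence described above might look like this sketch (file name hypothetical; 50 columns as in the description):

```matlab
% Build a 50-column '%f%f...%f' format and scan the whole file in one call.
fmt = repmat('%f', 1, 50);

fid = fopen('testdata.txt', 'r');
datacell = textscan(fid, fmt, 'CollectOutput', 1);
fclose(fid);

data = datacell{1};   % CollectOutput yields one numeric matrix in one cell
```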
dpb
2019 年 6 月 18 日
Yeah, that's kinda' what I suspected, thanks for confirming, Walter.
I still find it more than strange that there's a 30% reduction relative to fscanf -- what are they doing wrong with fscanf, then, is the question, if there's that much room for improvement?
These timings couldn't possibly be related to caching issues, I presume; you're too careful for that! :)
Answers (0)