For a big matrix, how can I accelerate fprintf?

Tian, 10 January 2017
Commented: Scott Campbell, 7 December 2022
Hello everyone, I have a 2500×1500 matrix and I want to print every column to a text file, five numbers per row. Using:
for i = 1:1500
    fprintf(fid, 'This is the %d coefficients\n', i);
    S = sprintf(' %15.8E %15.8E %15.8E %15.8E %15.8E\n', coeff(:, i));
    S(S == 'E') = 'D';   % use 'D' as the exponent letter
    fprintf(fid, '%s', S);
end
it takes several seconds. How can I speed this up?
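To reproduce the timing, the loop above can be wrapped into a self-contained script. This is a sketch under assumptions: random numbers stand in for the real coeff, and the output goes to a temporary file from tempname. The sizes match the question, but the time it prints depends on your machine and disk.

```matlab
% Minimal reproduction of the question's loop.
% Assumptions: random data stands in for the real coefficients,
% and the output goes to a temporary file.
nRows = 2500;                 % numbers per column (a multiple of 5 here)
nCols = 1500;                 % number of columns
coeff = randn(nRows, nCols);  % hypothetical stand-in for the real data

fileName = [tempname '.txt'];
fid = fopen(fileName, 'w');
if fid == -1
    error('Cannot open file for writing: %s', fileName);
end

tic;
for i = 1:nCols
    % Title line required by the target file format
    fprintf(fid, 'This is the %d coefficients\n', i);
    % sprintf recycles the 5-number format until the column is exhausted
    S = sprintf(' %15.8E %15.8E %15.8E %15.8E %15.8E\n', coeff(:, i));
    % Replace the exponent letter: 1.23456789E+00 -> 1.23456789D+00
    S(S == 'E') = 'D';
    fprintf(fid, '%s', S);
end
fclose(fid);
elapsed = toc;
fprintf('Wrote %d columns in %.2f s\n', nCols, elapsed);
```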
3 Comments
Tian, 10 January 2017
Edited: Tian, 10 January 2017
Apologies, I missed a '\n' in the first fprintf.
Actually, I am writing a file in a format that is already accepted by many programs, so I have to add a title line ('This is the %d coefficients', just as an example) before printing each coeff(:,i).
Tian, 10 January 2017
Edited: Tian, 10 January 2017
By 'writing in binary', do you mean using fprintf(fid, '%s', double(S)); instead of fprintf(fid, '%s', S);?
I just tried this and found that fprintf(fid, '%s', double(S)); took more than twice as long.
If I use fopen('test.txt', 'wb') instead of fopen('test.txt', 'w'), the time required is the same.
If I misunderstood your suggestion, please let me know. Thank you!

Accepted Answer

Walter Roberson, 10 January 2017
You have a few different speed constraints:
  • the speed of formatting individual numeric items, but you are already using the fastest way
  • the overhead of calling fprintf() and sprintf() multiple times, which could potentially be reduced by formatting everything at one time and then writing it all
  • the cost of doing the substitution of 'E' to 'D', which could possibly be done more efficiently (but your current version looks pretty good as-is)
  • the overhead of doing the substitution multiple times, which could potentially be reduced by building the output matrix and then doing the substitution all at once
  • the cost of writing to disk, which you cannot get away from (except perhaps by touching up the buffering strategy, as Jan shows)
You are not calling sprintf() wastefully (for example, with just one value at a time), so it is not obvious that much overhead could be cut by formatting everything at once.
Formatting everything at once is possible, but it drives up your memory costs a fair bit, to the point where you have to question whether the cost of allocating the large arrays will exceed the savings from calling sprintf() less often. That is especially true once you make the adjustments needed because you do not always have a multiple of 5 items per column to display.
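One way to format everything at once, as discussed above, is to build an exact format string for a single column (title line, full 5-number lines, and a shorter last line if the column length is not a multiple of 5) and pass sprintf a matrix whose first row holds the column index. The exact per-column format matters: with a generic 5-number line, a short final line would make sprintf consume the next column's index as a %E value. This is a sketch, not code from the answer; the data and file name are assumptions, and the title text must not contain an uppercase E because the E-to-D substitution runs over the whole string. For the question's 2500×1500 case, each 5-number line is 81 characters, so the text is roughly 61 million characters, on the order of 120 MB as a MATLAB char array (2 bytes per character), which is the memory cost mentioned above.

```matlab
% Sketch: one sprintf call, one E->D substitution, one write.
% Assumptions: random stand-in data, a temporary file name, and title
% lines without an uppercase 'E' (the substitution covers the titles too).
nRows = 2500;
nCols = 1500;
coeff = randn(nRows, nCols);          % hypothetical stand-in data

% Exact format for ONE column: title, full 5-number lines, then a
% shorter last line when nRows is not a multiple of 5.
numFmt = ' %15.8E';
nFull  = floor(nRows / 5);            % complete 5-number lines
nLast  = mod(nRows, 5);               % numbers on the final short line
colFmt = ['This is the %d coefficients\n', ...
          repmat([repmat(numFmt, 1, 5) '\n'], 1, nFull)];
if nLast > 0
    colFmt = [colFmt, repmat(numFmt, 1, nLast), '\n'];
end

% Prepend the column index so that, in column-major order, sprintf reads
% one value for %d followed by nRows values for the %E conversions.
data = [1:nCols; coeff];

tic;
S = sprintf(colFmt, data);            % format the whole matrix at once
S(S == 'E') = 'D';                    % substitute once
fileName = [tempname '.txt'];
fid = fopen(fileName, 'w');
if fid == -1
    error('Cannot open file for writing: %s', fileName);
end
fwrite(fid, S, 'char');               % write once
fclose(fid);
fprintf('All-at-once version: %.2f s\n', toc);
```

Whether this beats the loop depends on the memory-versus-overhead balance described above; timing both versions on your own data is the reliable check.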
My tests show that regexprep() is roughly 16 times slower than your existing S(S=='E')='D', so you would probably have difficulty being more efficient on that portion.
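A comparison like this can be checked with a small timing sketch. It applies the logical-indexing substitution from the question, strrep (which Jan's answer uses), and regexprep to the same formatted column and confirms that all three give identical text. The random data and repetition count are assumptions, and the printed timings will vary by machine and MATLAB release; no particular ratio is implied.

```matlab
% Compare three ways of turning the 'E' exponent letter into 'D'.
% Assumptions: one random stand-in column and an arbitrary repeat count.
coeff = randn(2500, 1);                           % hypothetical column
S = sprintf(' %15.8E %15.8E %15.8E %15.8E %15.8E\n', coeff);
nRep = 200;                                       % repetitions for timing

tic;
for k = 1:nRep
    S1 = S;
    S1(S1 == 'E') = 'D';                          % logical indexing
end
tLogical = toc;

tic;
for k = 1:nRep
    S2 = strrep(S, 'E', 'D');                     % plain substring replace
end
tStrrep = toc;

tic;
for k = 1:nRep
    S3 = regexprep(S, 'E', 'D');                  % regular expression
end
tRegexp = toc;

fprintf('logical: %.4f s, strrep: %.4f s, regexprep: %.4f s\n', ...
        tLogical, tStrrep, tRegexp);
```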
Since you have already cut down on overheads and are stuck with the numeric formatting time and the file I/O time, I think you are already approaching the fastest you can reasonably get for that output format.
1 Comment
Tian, 11 January 2017
Thanks a lot for your detailed explanation. That's very helpful.

More Answers (1)

Jan, 10 January 2017
This could be slightly faster:
fid = fopen(FileName, 'W');  % Uppercase W for better buffering
if fid == -1
    error('Cannot open file for writing: %s', FileName);
end
for i = 1:1500
    fprintf(fid, 'This is the %d coefficients\n', i);
    S = sprintf(' %15.8E %15.8E %15.8E %15.8E %15.8E\n', coeff(:, i));
    fwrite(fid, strrep(S, 'E', 'D'), 'char');
end
fclose(fid);  % flushes any data still buffered by 'W'
But I assume the bottleneck is the slow disk transfer. The 'W' mode can reduce this; using an SSD would be better.
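To see whether the uppercase 'W' mode helps on a given system, the sketch below runs the same loop once with 'w' and once with 'W' and checks that both files are identical. The smaller random matrix and temporary file names are assumptions. fclose is inside the timed region because with 'W' the output is not flushed automatically, so part of the data may only be written when the buffer fills or the file is closed.

```matlab
% Time the same write loop with fopen modes 'w' and 'W'.
% Assumptions: a smaller random stand-in matrix and temporary files.
coeff = randn(2500, 300);             % hypothetical test matrix
modes = {'w', 'W'};
files = {[tempname '.txt'], [tempname '.txt']};
times = zeros(1, numel(modes));

for m = 1:numel(modes)
    fid = fopen(files{m}, modes{m});
    if fid == -1
        error('Cannot open file for writing: %s', files{m});
    end
    tic;
    for i = 1:size(coeff, 2)
        fprintf(fid, 'This is the %d coefficients\n', i);
        S = sprintf(' %15.8E %15.8E %15.8E %15.8E %15.8E\n', coeff(:, i));
        fwrite(fid, strrep(S, 'E', 'D'), 'char');
    end
    fclose(fid);                      % include the final flush in the timing
    times(m) = toc;
end
fprintf('''w'': %.2f s, ''W'': %.2f s\n', times(1), times(2));
```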
2 Comments
Tian, 11 January 2017
Thanks. I'd like to try your method.
Scott Campbell, 7 December 2022
My 15 MB CSV file went from 30 to 10 seconds.
