Huge number of iterations

Qammar Abbas on 20 Sep 2021
Edited: Qammar Abbas on 22 Sep 2021
Hi Community members,
I am generating chemical formulas of compounds by forming combinations of elements and storing them in a text file. By my calculation, the total number of combinations is 18,217,382,400, i.e. I need 18,217,382,400 for-loop iterations. I want to do this as quickly as possible. Please suggest an efficient method. I have tried both for and parfor, and both take too long. A snippet of my code is shown below. I am using 2 workers and the code has been running for more than 24 hours now. How can I improve the speed?
fcn = @() fopen( sprintf( 'chem_%d.txt', labindex ), 'wt' );
w = WorkerObjWrapper( fcn, {}, @fclose );
iterations=[length(a) length(b)]; % a and b are cell arrays. Length of a is 10944 length of b is 1664600
tic
parfor ix=1:prod(iterations)
ix
[d,e]=ind2sub(iterations,ix);
fprintf(w.Value, '%s\n', strcat(a{d},b{e}));
end
toc
clear w;
6 comments
Qammar Abbas on 21 Sep 2021
This is something I can't share. However, I can tell you that it is a necessary requirement.
Rik on 21 Sep 2021
Then you should probably consider buying computation time on some sort of cluster. If you don't tell us what you want to do, we can't suggest a way to avoid some of the computational work. Things take time. Sometimes the most efficient way is to reduce the number of things.


Answers (1)

Walter Roberson on 20 Sep 2021
fcn = @() fopen( sprintf( 'chem_%d.txt', labindex ), 'wt' );
w = WorkerObjWrapper( fcn, {}, @fclose );
% a and b are cell arrays. Length of a is 10944 length of b is 1664600
b = b(:);
tic
iterations = length(a);
parfor ix=1:iterations
outs = strjoin(strcat(a(ix), b, {newline}), ''); % a(ix) is deliberate in case a{ix} has whitespace; '' overrides strjoin's default space delimiter
fwrite(w.Value, outs);
end
toc
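The gain here comes from batching: one write per element of a instead of one per combination. As a minimal sketch on toy data (the element names below are made up for illustration, and the concatenation uses the [temp{:}] form rather than strjoin), the following checks that the batched text matches a plain nested loop over a and b. Note that the lines come out grouped by a, which is not the order ind2sub produced in the question's loop.

```matlab
% Toy inputs (hypothetical values, chosen only for illustration)
a = {'Na', 'K'};
b = {'Cl'; 'Br'; 'I'};          % column, as after b = b(:)

% Reference: one line per (d, e) pair, with b varying fastest
ref = '';
for d = 1:numel(a)
    for e = 1:numel(b)
        ref = [ref, a{d}, b{e}, newline]; %#ok<AGROW>
    end
end

% Batched: build every line for a{d} in one strcat call
batched = '';
for d = 1:numel(a)
    temp = strcat(a(d), b, {newline});    % cell inputs keep whitespace
    batched = [batched, temp{:}];         %#ok<AGROW>
end

assert(isequal(ref, batched));
```

In the real job each batched block would go straight to fwrite instead of being accumulated in memory.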
4 comments
Walter Roberson on 22 Sep 2021
Huh. I really expected the fprintf version would be slower!
Notice that I build the fprintf format dynamically to include the current content from a. I assumed here that a does not contain any % characters.
NA = 100;
NB = 10000;
letters = ['A':'Z', '0':'9']; nlet = length(letters);
maxword = 5;
a = arrayfun(@(L) letters(randi(nlet, 1, L)), randi([1, maxword], 1, NA), 'uniform', 0);
b = arrayfun(@(L) letters(randi(nlet, 1, L)), randi([1, maxword], 1, NB), 'uniform', 0);
tn = tempname();
cleanME = onCleanup(@() delete(tn));
t1 = timeit(@() use_fprintf(tn, a, b), 0);
% prints "use_fprintf bytes = 7240000" once per timing run
t2 = timeit(@() use_strjoin(tn, a, b), 0);
% prints "use_strjoin bytes = 7240000" once per timing run
t3 = timeit(@() use_horzcat(tn, a, b), 0);
% prints "use_horzcat bytes = 7240000" once per timing run
struct('fprintf', t1, 'strjoin', t2, 'horzcat', t3)
% ans = struct with fields:
%     fprintf: 0.5896
%     strjoin: 2.6799
%     horzcat: 2.4606
function use_fprintf(tn, a, b)
fid = fopen(tn, 'w');
for K = 1 : length(a)
fmt = sprintf('%s%%s\\n', a{K});
fprintf(fid, fmt, b{:});
end
fclose(fid);
dinfo = dir(tn);
fprintf('use_fprintf bytes = %d\n', dinfo.bytes);
end
function use_strjoin(tn, a, b)
fid = fopen(tn, 'w');
for K = 1 : length(a)
outs = strjoin(strcat(a(K), b, {newline}), '');
fwrite(fid, outs);
end
fclose(fid);
dinfo = dir(tn);
fprintf('use_strjoin bytes = %d\n', dinfo.bytes);
end
function use_horzcat(tn, a, b)
fid = fopen(tn, 'w');
for K = 1 : length(a)
temp = strcat(a(K), b, {newline});
outs = [temp{:}];
fwrite(fid, outs);
end
fclose(fid);
dinfo = dir(tn);
fprintf('use_horzcat bytes = %d\n', dinfo.bytes);
end
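The fprintf variant relies on the format being reapplied to each remaining argument, so a single call emits one line per entry of b. A small sketch of that behaviour on hypothetical toy values, using sprintf so the result can be inspected. The escaping step at the end is an assumed safeguard for prefixes containing % or \, not something the benchmark above does.

```matlab
% Hypothetical toy data, for illustration only
prefix = 'Na';
b = {'Cl', 'Br', 'I'};

% Build the format once per prefix. In the outer sprintf, %% yields a
% literal % and \\ yields a literal backslash, so fmt is the text Na%s\n
fmt = sprintf('%s%%s\\n', prefix);

% The format is reused for each argument in b{:}
txt = sprintf(fmt, b{:});        % NaCl, NaBr, NaI, each ending in a newline

% Assumed safeguard: escape \ and % before embedding a prefix in a format
raw  = 'x%y\z';
safe = strrep(strrep(raw, '\', '\\'), '%', '%%');
txt2 = sprintf([safe '%s\n'], 'Cl');
```

Escaping the backslash first matters: doing it second would double the backslashes that the % replacement never touches but would also re-escape nothing, whereas the order shown keeps each replacement independent of the other.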
Qammar Abbas on 22 Sep 2021
Edited: Qammar Abbas on 22 Sep 2021
I have tried your first code and, as @Benjamin explained, it is indeed a very good solution to my problem. However, I observed that the execution time drops further if we use 'for' instead of 'parfor' in your first code. By my calculation, I need a maximum of 2 days to generate all 18,217,382,400 combinations using the for loop. I have started running the code and will hopefully get back to you with the results in 2-3 days. Meanwhile, I am trying to understand the second code you have shared. I am thankful for your help.
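For reference, the figures in this thread can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the variable names are illustrative, and the 2-day budget is taken from the estimate above rather than measured.

```matlab
nA = 10944;                      % length of a, from the question
nB = 1664600;                    % length of b, from the question
nTotal = nA * nB;                % total combinations (exact in double)

secondsBudget = 2 * 24 * 3600;   % the 2-day estimate, in seconds
linesPerSecond = nTotal / secondsBudget;

fprintf('%d combinations, about %.0f lines per second needed\n', ...
    nTotal, linesPerSecond);
```

The product confirms the 18,217,382,400 figure, and a 2-day run corresponds to a sustained rate on the order of 10^5 lines per second.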


Release: R2020a
