Why is sum considerably slower that adding each individual element together, when using large loops?
13 ビュー (過去 30 日間)
古いコメントを表示
I have some code that includes loops that run at least 1e7 times (often up to 1e9 times) and I am trying to improve its performance.
My slowest step, somewhat surprisingly, is indexing into a variable and summing those numbers. I need to do this index and sum every time within the loop as the variable changes within the loop too.
The code below is a simplification of the problem.
Method 1
a = 1:16;
idx = [1 6 11 16];
N = 1e7;
A0 = zeros(1,N);
tic
for k = 1:N
A = sum(a(idx));
a = a + 0.001;
A0(k) = A;
end
toc
Method 2
If I instead replace the first line of the loop with a nested loop that does an iterative addition, then I get a 10x improvement in performance. However, I am not too keen on using another loop, as in my actual code, I have another couple of loops too, so my code will begin to get even messier.
a = 1:16;
B0 = zeros(1,N);
tic
for k = 1:N
B = 0;
for j = 1:numel(idx)
B = B + a(idx(j));
end
a = a + 0.001;
B0(k) = B;
end
toc
Method 3
I get a smaller further improvement if I spell out the individual components of a instead and remove the loop.
a = 1:16;
C0 = zeros(1,N);
tic
for k = 1:N
C = a(1)+a(6)+a(11)+a(16);
a = a + 0.001;
C0(k) = C;
end
toc
However, I need to keep the code as general as impossible - the index idx is dependent on the initial input, so this 3rd option is not possible.
My question is, is there a faster way than method 1, but that is also cleaner / more concise that method 2?
Curiously, the differences in time between Methods 1 and Methods 2 and 3 get progressively worse as N is increased.
0 件のコメント
採用された回答
Matt J
2023 年 8 月 31 日
編集済み: Matt J
2023 年 8 月 31 日
Yes, unfortunately, vector indexing is quite an expensive operation. If idx is constant, as in your example, than you should do as below. If idx is not constant, then the way to optimize the code will depend on how idx varies.
a = 1:16;
idx = [1 6 11 16];
N = 1e7;
A0 = zeros(1,N);
b=a(idx);
tic
for k = 1:N
A = sum(b);
b = b + 0.001;
A0(k) = A;
end
toc
a(idx)=b;
12 件のコメント
Bruno Luong
2023 年 9 月 1 日
編集済み: Bruno Luong
2023 年 9 月 1 日
In my previous code, some extreme floating rouding around realin and reamax can defeat the test.
This modified code looks more robust. It must be something related to denormalization numbers that I'm sure the correct to deal with https://www.mathworks.com/help/fixedpoint/ug/floating-point-numbers.html#f32213
I also do the matrix power differently for the sake of variation
A = rand(16,1);
idx = [1 6 11 16];
B0 = rand(16,1)*0.35;
C = (rand(16,16)-0.5)*1e-1;
D = rand(1,16);
N = 1e7;
A0 = A;
%Method 3
tic;
signal1 = zeros(N,1);
m=numel(A);
s=accumarray(idx(:),1,[m,1]);
Q=B0*s.';
Q=Q+C*(eye(m)-Q);
d = eig(Q,'vector');
d = abs(d);
check0 = all(d < 1);
checkinf = any(d > 1);
QnA = A0;
for n = 1:N
QnA=Q*QnA;
DQnA = D*QnA;
signal1(n) = DQnA;
if check0 && abs(DQnA) < realmin
break
end
if checkinf && abs(DQnA) > realmax
signal1(n+1:end) = sign(DQnA)*Inf;
break
end
end
toc
その他の回答 (0 件)
参考
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!