Why is this loop faster than a vectorised version? Could the vectorised version be made faster than the loop?
2 ビュー (過去 30 日間)
古いコメントを表示
I'm trying to improve performance in a code that uses a loop. I've written a vectorised version matching the functionality, while avoiding costly transposes. However, I've found that the loop version invariably runs ~25% more quickly. Is there any way to further improve the performance of the vectorised version so that it surpasses the loop?
Of course, this is a tiny sub-function of a much larger, more complex program, but it is called tens of thousands of times in a single run, and is a bottleneck in the run time.
I do have the parallel computing toolbox, so could look into using parfor loops, but these don't always save time, and I was surprised that the vectorised version doesn't perform better!
% Input vectors
x1 = rand(1, 960);
x2 = rand(1, 960);
%% Looped version
tic;
Y1 = 1.75:0.25:39;
Y2 = (10 .^ (Y1 / 21.366) - 1 ) / 0.004368;
EoutLoop = zeros(length(x2), length(Y1));
for i=1:length(Y1)
p1 = 4.*Y2(i)./24.673.*(0.004368.*Y2(i) + 1).*ones(1, length(x2));
p2 = 4.*Y2(i)./24.673.*(0.004368.*Y2(i) + 1) - 0.35.*(4.*Y2(i)./24.673.*(0.004368.*Y2(i) + 1)./4.*1000./24.673.*(0.004368.*1000 + 1)).*(x2 - 51);
p3 = p1.*(x1 >= Y2(i)) + p2 .* (x1 < Y2(i));
g1 = (x1 - Y2(i))./Y2(i);
g1 = abs(g1);
EiLoop = (1 + p3.*min(g1,4)).*exp(-1.*p3.*min(g1,4)).* 10.^(x2./10);
EoutLoop((1:length(x2)), i) = EiLoop(1:length(x2));
end
if (size(EoutLoop,1) > 1)
EoutLoop = sum(EoutLoop);
end
EoutLoop = 10 .* log(EoutLoop) ./ log(10);
% end timer
toc;
%% Vectorised version
% transpose input vector for vectorised version
x2 = x2.';
x1 = x1.';
% start timer
tic;
Y1 = 1.75:0.25:39;
Y2 = (10 .^ (Y1 / 21.366) - 1 ) / 0.004368;
EoutVec = zeros(length(x2), length(Y1));
p1 = 4.*Y2./24.673.*(0.004368.*Y2 + 1).*ones(length(x2), length(Y2));
p2 = 4.*Y2./24.673.*(0.004368.*Y2 + 1) - 0.35.*(4.*Y2./24.673.*(0.004368.*Y2 + 1)./4.*1000./24.673.*(0.004368.*1000 + 1)).*repmat((x2 - 51), 1, length(Y2));
p3 = p1.*(x1 >= repmat(Y2, 1, size(x1, 2))) + p2 .* (x1 < repmat(Y2, 1, size(x1, 2)));
g1 = ((x1 - repmat(Y2, 1, size(x1, 2)))./repmat(Y2, 1, size(x1, 2)));
g1 = abs(g1);
EVec = (1 + p3.*min(g1,4)).*exp(-1.*p3.*min(g1,4)).*repmat(10.^(x2./10), 1, length(Y2));
EoutVec((1:length(x2)), :) = EVec(1:length(x2), :);
if (size(EoutVec,1) > 1)
EoutVec = sum(EoutVec);
end
EoutVec = 10.*log(EoutVec)./log(10);
% end timer
toc;
3 件のコメント
Alexander
2023 年 11 月 22 日
I agree. On my old Win7 machine (R2021b) the result is
Loop: Elapsed time is 0.316945 seconds.
Vectorised: Elapsed time is 0.062135 seconds.
採用された回答
Dyuman Joshi
2023 年 11 月 22 日
編集済み: Dyuman Joshi
2023 年 11 月 22 日
Ideally, timeit should be used over tic-toc to get a more accurate idea of run times of the codes. tic-toc is generally used for portions of code.
"Use the timeit function for a rigorous measurement of function execution time. Use tic and toc to estimate time for smaller portions of code that are not complete functions." Reference - Measure the Performance of Your Code
While using tic-toc to measure the time of the code, you can either
> Run the same code multiple times via a for loop and average the data - "Sometimes programs run too fast for tic and toc to provide useful data. If your code is faster than 1/10 second, consider measuring it running in a loop, and then average to find the time for a single run." (Reference - https://in.mathworks.com/help/matlab/ref/tic.html#bswc7ww-3)
or
> Take a large(r) dataset.
I have chosen the latter option below -
% Input vectors
%% Large(r) dataset
x1 = rand(1, 100000);
x2 = rand(1, 100000);
%% Looped version
Y1 = 1.75:0.25:39;
Y2 = (10 .^ (Y1 / 21.366) - 1 ) / 0.004368;
EoutLoop = zeros(length(x2), length(Y1));
tic;
for i=1:length(Y1)
p1 = 4.*Y2(i)./24.673.*(0.004368.*Y2(i) + 1).*ones(1, length(x2));
p2 = 4.*Y2(i)./24.673.*(0.004368.*Y2(i) + 1) - 0.35.*(4.*Y2(i)./24.673.*(0.004368.*Y2(i) + 1)./4.*1000./24.673.*(0.004368.*1000 + 1)).*(x2 - 51);
p3 = p1.*(x1 >= Y2(i)) + p2 .* (x1 < Y2(i));
g1 = (x1 - Y2(i))./Y2(i);
g1 = abs(g1);
EiLoop = (1 + p3.*min(g1,4)).*exp(-1.*p3.*min(g1,4)).* 10.^(x2./10);
EoutLoop((1:length(x2)), i) = EiLoop(1:length(x2));
end
if (size(EoutLoop,1) > 1)
EoutLoop = sum(EoutLoop);
end
EoutLoop = 10 .* log(EoutLoop) ./ log(10);
% end timer
toc;
%% Vectorised version
% transpose input vector for vectorised version
x2 = x2.';
x1 = x1.';
% start timer
Y1 = 1.75:0.25:39;
Y2 = (10 .^ (Y1 / 21.366) - 1 ) / 0.004368;
EoutVec = zeros(length(x2), length(Y1));
tic;
p1 = 4.*Y2./24.673.*(0.004368.*Y2 + 1).*ones(length(x2), length(Y2));
p2 = 4.*Y2./24.673.*(0.004368.*Y2 + 1) - 0.35.*(4.*Y2./24.673.*(0.004368.*Y2 + 1)./4.*1000./24.673.*(0.004368.*1000 + 1)).*repmat((x2 - 51), 1, length(Y2));
p3 = p1.*(x1 >= repmat(Y2, 1, size(x1, 2))) + p2 .* (x1 < repmat(Y2, 1, size(x1, 2)));
g1 = ((x1 - repmat(Y2, 1, size(x1, 2)))./repmat(Y2, 1, size(x1, 2)));
g1 = abs(g1);
EVec = (1 + p3.*min(g1,4)).*exp(-1.*p3.*min(g1,4)).*repmat(10.^(x2./10), 1, length(Y2));
EoutVec((1:length(x2)), :) = EVec(1:length(x2), :);
if (size(EoutVec,1) > 1)
EoutVec = sum(EoutVec);
end
EoutVec = 10.*log(EoutVec)./log(10);
% end timer
toc;
You can see that the time taken by the vectorized approach is less than half of the time taken by the for loop approach.
4 件のコメント
Dyuman Joshi
2023 年 11 月 22 日
You are welcome!
It's good to know that you are utilizing the Profiler, it is an extremely helpful tool!
その他の回答 (0 件)
参考
カテゴリ
Help Center および File Exchange で Fourier Analysis and Filtering についてさらに検索
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!