Inlined code segment slower than internal function pass - why?
I'm trying to speed up prototype code and have found a strange speed difference when replacing a piece of inlined code that I use inside a loop. The inlined code is as follows:
s=0;
for dd=1:numel(loc)
s=s+(dynpts(:,dd)-loc(dd)).^2;
end
fidx=s<sel.rad^2;
ix=find(fidx);
Somehow, this is 2-3x slower in profiling than making it an in-script subfunction call: ix = rangesearchnest(loc,sel.rad,dynpts); with an identical body (different variable names). I don't see how this could be the case under any circumstances: my understanding is that the JIT and internal optimizations should work better on inlined code than on function calls. Moreover, dynpts is an n-by-3 array where n is in the millions to billions, so I was expecting a tremendous speed increase with the inlined version merely as a result of not needing to pass the gargantuan array as an argument (and potential memory limit issues).
Is there special-case behavior I'm not aware of happening here?
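For reference, a minimal sketch of what the subfunction version presumably looks like. Only the name rangesearchnest and the argument order come from the call quoted above; the internal names (center, rad, pts, d2) and the tiny test data are assumptions for illustration.

```matlab
% Small stand-in data; the real dynpts has millions of rows.
dynpts = [0 0 0; 1 1 1; 0.1 0.2 0.1; 3 0 0];
loc = [0 0 0];
rad = 0.5;                          % stands in for sel.rad

ix = rangesearchnest(loc, rad, dynpts);
disp(ix.')                          % row indices of points within rad of loc

function ix = rangesearchnest(center, rad, pts)
% Hypothetical body mirroring the inlined loop: accumulate the squared
% distance one dimension at a time, then return the indices of the rows
% that lie strictly inside the radius.
d2 = 0;
for dd = 1:numel(center)
    d2 = d2 + (pts(:,dd) - center(dd)).^2;
end
ix = find(d2 < rad^2);
end
```

Local functions at the end of a script file require R2016b or later; in older releases the function would have to live in its own file.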
1 Comment
Walter Roberson
11 Feb 2023
x = find(s<sel.rad^2);
is potentially better optimized than the two-statement version.
In numeric cases where the < comparison is guaranteed not to raise errors, MATLAB could potentially evaluate s(K)<sel.rad^2 in a loop, gathering indices as it goes (perhaps into a linked list), instead of first computing s<sel.rad^2 as a full logical vector and then doing a find() operation on the result.
In order to determine whether it does that kind of operation, you would probably need to use large matrices, right on the boundary, where calculating s<sel.rad^2 first would exhaust your memory.
The language's nominal semantics are to calculate the logical vector first, but in most languages, internal optimizations are permitted to vary the order of operations provided that the result is the same when no exceptions occur.
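One way to check whether the single-statement form makes a measurable difference is simply to time both. The sketch below is an assumption-laden stand-in: the array size, the value of r (standing in for sel.rad), and the helper names twoStep and oneStep are all hypothetical, and it only measures run time. It cannot show whether MATLAB actually fuses the comparison with find().

```matlab
n = 1e6;                 % assumed size, small enough to run quickly
s = rand(n,1);           % stands in for the accumulated squared distances
r = 0.5;                 % stands in for sel.rad

t_two = timeit(@() twoStep(s, r));
t_one = timeit(@() oneStep(s, r));
fprintf('two statements: %.4g s, one statement: %.4g s\n', t_two, t_one);

% Both forms must return identical indices.
assert(isequal(twoStep(s, r), oneStep(s, r)));

function ix = twoStep(s, r)
fidx = s < r^2;          % logical vector stored in a named variable
ix = find(fidx);
end

function ix = oneStep(s, r)
ix = find(s < r^2);      % comparison result is an unnamed temporary
end
```

To probe the memory behavior described above, n would have to be raised until the logical vector alone is close to exhausting available memory.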
Answers (1)
Matt J
10 Feb 2023
Edited: Matt J on 10 Feb 2023
"...with the inlined version merely as a result of not needing to pass the gargantuan array as an argument (and potential memory limit issues)."
Passing a variable to a function does not result in any memory copying unless the function makes changes to the variable, which you are not doing. Also, my recollection of how the JIT works is that it optimizes the execution of functions, but not scripts. So, if your top-level code is not enclosed in a function, that might be part of it as well.
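The no-copy behavior can be checked with a small timing experiment. This is only a sketch: the array size and the helper names readOnly and writeOne are assumptions, and the exact timings depend on the machine and MATLAB release.

```matlab
n = 5e6;                         % assumed size; scale to available memory
A = rand(n,3);

t_read  = timeit(@() readOnly(A));
t_write = timeit(@() writeOne(A));
fprintf('read-only: %.4g s, modifies input: %.4g s\n', t_read, t_write);

function v = readOnly(X)
v = X(1,1);                      % input is only read, so no copy is needed
end

function v = writeOne(X)
X(1,1) = 0;                      % first write forces a private copy of X
v = X(1,1);
end
```

If passing A by itself caused a copy, both timings would scale with n. The expectation is that only writeOne does, and that A in the caller's workspace is left unchanged afterwards.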
3 Comments
Matt J
12 Feb 2023
Edited: Matt J on 12 Feb 2023
I don't know what you mean by the "external loop", but the tests below seem consistent with the rest of your comment. None of it is too surprising, IMHO. The vectorized version allocates the most memory, so it makes sense to me that the loop is fastest when full optimizations are applied.
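n=1e7;
[dynpts,loc]=deal(rand(n,3),rand(1,3));
% Loop and vectorized versions wrapped in functions, timed with timeit
timeit(@()implem1(dynpts,loc))
timeit(@()implem2(dynpts,loc))
% The same loop run at script level, timed with tic/toc
tic;
s=0;
for dd=1:numel(loc)
s=s+(dynpts(:,dd)-loc(dd)).^2;
end
toc
% The vectorized version run at script level
tic
s = sum((dynpts-loc).^2,2);
toc
% Loop version: accumulates squared distance one column at a time
function implem1(dynpts,loc)
s=0;
for dd=1:numel(loc)
s=s+(dynpts(:,dd)-loc(dd)).^2;
end
end
% Vectorized version: builds the full n-by-3 difference array first
function implem2(dynpts,loc)
s = sum((dynpts-loc).^2,2);
end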
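The timing comparison assumes that the loop and the vectorized expression compute the same quantity. A quick sanity check, sketched here with a smaller n and an assumed radius rad (neither taken from the thread), could look like this:

```matlab
n = 1e5;                          % assumed size, small enough to run quickly
dynpts = rand(n,3);
loc = rand(1,3);
rad = 0.25;                       % hypothetical search radius

% Loop version: accumulate squared distance one column at a time.
sLoop = 0;
for dd = 1:numel(loc)
    sLoop = sLoop + (dynpts(:,dd) - loc(dd)).^2;
end

% Vectorized version: relies on implicit expansion (R2016b or later).
sVec = sum((dynpts - loc).^2, 2);

maxDiff = max(abs(sLoop - sVec));
fprintf('max abs difference: %g\n', maxDiff);

% Index lists produced by each version for the same radius.
ixLoop = find(sLoop < rad^2);
ixVec  = find(sVec  < rad^2);
fprintf('points found: loop %d, vectorized %d\n', numel(ixLoop), numel(ixVec));
```

Any difference should be at the level of floating-point rounding, so the two index lists could in principle disagree only for points lying essentially on the boundary of the search sphere.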