Is it possible to use arrayfun across rows?
Hi,
I currently have a for loop that works its way through a table with almost 20 million records. As expected, it is pretty slow. I want to look into alternatives, and I wondered whether there is a way to use arrayfun, or another MATLAB function, across rows with high performance. The example below captures the issue of working across rows:
A = table([1;1;1;2;2;2;],[1;2;3;4;5;6]);
A.Var3 = zeros(height(A),1);
A.Var3(1) = A.Var1(1);
for i = 2:height(A)
    if A.Var1(i) == A.Var1(i-1)
        A.Var3(i) = A.Var2(i) .* A.Var2(i-1);
    else
        A.Var3(i) = A.Var2(i);
    end
end
Any suggestions will be appreciated.
Kind regards,
William
11 Comments
Rik
6 Oct 2020
arrayfun (and cellfun and structfun) simply hide the loop. They will not speed up your code; they will actually slow it down because of the extra overhead. To speed this up, you need to either go multi-threaded with parfor or find vectorized operations. In your example you can use logical indexing to perform the multiplication all at once.
William Ambrose
6 Oct 2020
Walter Roberson
6 Oct 2020
Michael Croucher
6 Oct 2020
Is it possible to share your real example somehow please?
William Ambrose
6 Oct 2020
Rik
6 Oct 2020
For this example it isn't too difficult:
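A = table([1;1;1;2;2;2;],[1;2;3;4;5;6]);
A.Var3 = zeros(height(A),1);
A.Var3(1) = A.Var1(1);
B = A; % make a copy to compare
% Loop version from the question
for n = 2:height(A)
    if A.Var1(n) == A.Var1(n-1)
        A.Var3(n) = A.Var2(n) .* A.Var2(n-1);
    else
        A.Var3(n) = A.Var2(n);
    end
end
% Vectorized version: L marks rows whose Var1 equals the previous row's Var1
L = [false;B.Var1(2:end)==B.Var1(1:(end-1))];
ind = find(L);
B.Var3(ind) = B.Var2(ind) .* B.Var2(ind-1);
% Note: ~L includes row 1, so this overwrites B.Var3(1) with B.Var2(1);
% that matches the loop here only because Var1(1) == Var2(1).
B.Var3(~L) = B.Var2(~L);
clc,isequal(A,B)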
William Ambrose
6 Oct 2020
Edited: William Ambrose
6 Oct 2020
Please use the editing tools to format your code as code.
I don't see how you could calculate the branches separately here. You might get a performance increase by calculating the runs of true and false in the comparison of each A.Var1 value with the previous one, but the extra overhead might not be worth it.
William Ambrose
6 Oct 2020
Rik
6 Oct 2020
The longer the runs are, the more efficient calculating the runs will be. So if you have long stretches of true and/or long stretches of false it might be worth looking into. I think the first branch can also be vectorized (e.g. with cumprod), although I haven't tried yet.
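A minimal sketch of the run idea, using the sample data from the question: it finds where each run of equal Var1 values starts, then loops once per run instead of once per row. The variable names (isStart, runStart, runLen, out) are illustrative, and applying cumprod within each run is an assumption based on the suggestion above, not the formula from the original loop.

```matlab
% Sample columns from the question, as plain vectors.
v1 = [1;1;1;2;2;2];
v2 = [1;2;3;4;5;6];

% A run starts at row 1 and wherever the value differs from the previous row.
isStart  = [true; v1(2:end) ~= v1(1:end-1)];
runStart = find(isStart);                     % first row of each run
runLen   = diff([runStart; numel(v1) + 1]);   % number of rows in each run

% One iteration per run (assumed operation: cumulative product within the run).
out = zeros(size(v2));
for r = 1:numel(runStart)
    rows = runStart(r):(runStart(r) + runLen(r) - 1);
    out(rows) = cumprod(v2(rows));
end
```

With long runs the loop body executes far fewer times than there are rows, which is where the saving would come from; with many short runs the bookkeeping may cost more than it saves.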
William Ambrose
6 Oct 2020
Answers (1)
Mohammad Sami
6 Oct 2020
Something like this will work.
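% Continues from the question's setup, where A.Var3 is preallocated.
i = [false; A.Var1(1:end-1) == A.Var1(2:end)]; % true where Var1 repeats the previous row
j = find(i);
A.Var3(i) = A.Var2(j) .* A.Var2(j-1);
A.Var3(~i) = A.Var2(~i);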
5 Comments
William Ambrose
6 Oct 2020
Rik
6 Oct 2020
Mohammad Sami
6 Oct 2020
Edited: Mohammad Sami
6 Oct 2020
In that case you can use this
A = table([1;1;1;1;1;2;2;2;3],[1;2;3;4;5;6;7;8;500]);
i = [true; A.Var1(1:end-1) ~= A.Var1(2:end)];
id = cumsum(i);
A.Var3 = grouptransform(A.Var2,id,@cumprod);
The above assumes that Var1 may not be in sequence, e.g. [1 1 1 2 2 2 4 4 4], etc.
If it is always in sequence, you can shorten it as follows.
A = table([1;1;1;1;1;2;2;2;3],[1;2;3;4;5;6;7;8;500]);
A = grouptransform(A,'Var1',@cumprod,"ReplaceValues",false);
% or explicitly specify which variable to transform if you have other variables
% A = grouptransform(A,'Var1',@cumprod,"Var2","ReplaceValues",false);
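% A small sketch of how the two approaches above can differ. The data is
% hypothetical: the value 1 appears in two separate runs. Grouping by a
% run id restarts the product at every change in Var1, whereas grouping
% by Var1 itself treats all rows with the same value as one group.

```matlab
% Hypothetical data in which Var1 returns to an earlier value.
T = table([1;1;2;2;1;1],[2;3;4;5;6;7]);

% Run-based grouping: a new group starts whenever Var1 changes.
isStart = [true; T.Var1(2:end) ~= T.Var1(1:end-1)];
runId   = cumsum(isStart);                        % [1;1;2;2;3;3]
byRun   = grouptransform(T.Var2, runId, @cumprod);

% Value-based grouping: rows 1, 2, 5 and 6 all belong to group 1.
byValue = grouptransform(T.Var2, T.Var1, @cumprod);

disp([byRun byValue])   % the last two rows differ between the columns
```

% Which grouping is appropriate depends on whether a repeated Var1 value
% should continue the earlier product or start a new one.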
William Ambrose
8 Oct 2020
Mohammad Sami
8 Oct 2020
Hi William,
For the updated problem as stated, grouptransform with cumprod will work just as well.
My testing shows the result is identical to the expected result.
A =
  9×3 table
    Var1    Var2    fun_Var2
    ____    ____    ________
     1        1          1
     1        2          2
     1        3          6
     1        4         24
     1        5        120
     2        6          6
     2        7         42
     2        8        336
     3      500        500
Of course, if the formula changes, a for loop may be more generalizable.
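If the loop has to stay, one option is to run it on plain vectors rather than on the table. The sketch below assumes the running-product rule that produces the table shown above; the names v1, v2 and v3 are illustrative. Indexing plain vectors inside a loop typically carries less overhead than indexing a table on every iteration, though the gain should be measured on the real data.

```matlab
A = table([1;1;1;1;1;2;2;2;3],[1;2;3;4;5;6;7;8;500]);

% Copy the columns into plain vectors once, so the loop does not index
% into the table on every iteration.
v1 = A.Var1;
v2 = A.Var2;
v3 = zeros(height(A),1);

v3(1) = v2(1);
for k = 2:numel(v1)
    if v1(k) == v1(k-1)
        v3(k) = v2(k) * v3(k-1);   % same group: extend the running product
    else
        v3(k) = v2(k);             % new group: restart from this row's value
    end
end

A.Var3 = v3;   % write the result back to the table in one assignment
```

The body of the if/else is the only part that needs to change if the formula changes.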