Is it possible to use Arrayfun across rows

Hi,
I currently have a FOR LOOP which works its way through a table with almost 20 million records. It is as expected pretty slow, I want to look into alternatives and I wondered if there is a way to use for arrayfun - or another MATLAB function - across rows which will work with high performance. The example below captures the issue of working across rows:
A = table([1;1;1;2;2;2;],[1;2;3;4;5;6]);
A.Var3 = zeros(height(A),1)
A.Var3(1) = A.Var1(1)
for i = 2:height(A)
if A.Var1(i) == A.Var1(i-1)
A.Var3(i) = A.Var2(i) .* A.Var2(i-1);
else A.Var3(i) = A.Var2(i);
end
end
Any suggestions will be appreciated.
Kind regards,
William

11 件のコメント

Rik
Rik 2020 年 10 月 6 日
arrayfun (and cellfun and structfun) will simply hide the loop. They will not speed up your code, but they will actually cause a slowdown due to the extra overhead. If you want to speed this up, you need to go multi-threaded with parfor or find vectorized operations. In your example you can use logical indexing to perform the multiplication all at once.
William Ambrose
William Ambrose 2020 年 10 月 6 日
Hi Rik,
thank you for the input. I am not familiar with the parfor function, or but after a quick check I see one needs to the Parallel Computing Toolbox and if I have understood it correctly some setup.
Maybe the vectorized approach makes more sense to pursue. Given that Var1 in effect is the index, but in my real example I have about 10,000 of them I wondered if you could provide some more guidance to how to proceed since I implemented the FOR LOOP as I didn't know of alternative approaches, and still cannot see it.
thanks and kind regards
william
Walter Roberson
Walter Roberson 2020 年 10 月 6 日
Michael Croucher
Michael Croucher 2020 年 10 月 6 日
Is it possible to share your real example somehow please?
William Ambrose
William Ambrose 2020 年 10 月 6 日
Thank you Walter I will take look :-)
To Michael, this is the code:
  • vcode is a table
  • vcode in the vcode table is the index
  • cumExcReturn is the result of multiple operations which is called the Frongello adjustment
for i = 2:height(vcode)
if vcode.vcode(i) == vcode.vcode(i-1)
vcode.cumExcReturn(i) = vcode.cumExcReturn(i-1) .* (2 + vcode.dtrusd_pre(i) + vcode.dtrusd_post(i)) /2 ...
+ vcode.excReturn(i) .* (0 + vcode.indx_pre(i-1) + vcode.indx_post(i-1))./2 ;
else vcode.cumExcReturn(i) = vcode.excReturn(i) ;
end
end
Hope it helps.
Rik
Rik 2020 年 10 月 6 日
For this example it isn't too difficult:
A = table([1;1;1;2;2;2;],[1;2;3;4;5;6]);
A.Var3 = zeros(height(A),1);
A.Var3(1) = A.Var1(1);
B=A;%make a copy to compare
for n = 2:height(A)
if A.Var1(n) == A.Var1(n-1)
A.Var3(n) = A.Var2(n) .* A.Var2(n-1);
else
A.Var3(n) = A.Var2(n);
end
end
L = [false;B.Var1(2:end)==B.Var1(1:(end-1))];
ind = find(L);
B.Var3(ind) = B.Var2(ind) .* B.Var2(ind-1);
B.Var3(~L) = B.Var2(~L);
clc,isequal(A,B)
William Ambrose
William Ambrose 2020 年 10 月 6 日
編集済み: William Ambrose 2020 年 10 月 6 日
I set the original example up incorrectly. Please see the corrected version.
A = table([1;1;1;1;1;2;2;2;3],[1;2;3;4;5;6;7;8;500]);
A.Var3 = zeros(height(A),1);
A.Var3(1) = A.Var1(1);
for n = 2:height(A)
if A.Var1(n) == A.Var1(n-1)
A.Var3(n) = A.Var3(n-1) .* A.Var2(n);
else
A.Var3(n) = A.Var2(n);
end
end
Rik
Rik 2020 年 10 月 6 日
編集済み: Rik 2020 年 10 月 6 日
Please use the editing tools to format your code as code.
I don't see a way here how you could calculate the branches separately. You might have a performance increase by calculating the runs of true and false in A.Var1 == A.Var1, but the extra overhead might not be worth it.
William Ambrose
William Ambrose 2020 年 10 月 6 日
thanks Rik, I just worked out how to paste the code and format. Pitty on the code part, looping through the data takes such a long time. The challenge is the dependency on the value in the previous row.
Rik
Rik 2020 年 10 月 6 日
The longer the runs are, the more efficient calculating the runs will be. So if you have long stretches of true and/or long stretches of false it might be worth looking into. I think the first branch can also be vectorized (e.g. with cumprod), although I haven't tried yet.
William Ambrose
William Ambrose 2020 年 10 月 6 日
You are right with respect to the true/false, I will definitly use
i = [false; A.Var1(1:end-1) == A.Var1(2:end)]
to get the logical array, I suspect it will make a big impact.

サインインしてコメントする。

回答 (1 件)

Mohammad Sami
Mohammad Sami 2020 年 10 月 6 日

0 投票

Something like this will work.
i = [false; A.Var1(1:end-1) == A.Var1(2:end)];
j = find(i);
A.Var3(i) = A.Var2(j) .* A.Var2(j-1);
A.Var3(~i) = A.Var2(~i);

5 件のコメント

William Ambrose
William Ambrose 2020 年 10 月 6 日
Thank you very much Mohammed, this is very neat. I see though that you solved the original question when I had misstated the problem. I do however really appreciate this, and it was enligtning.
The challenge is that Var3(n) = Var3(n-1) * Var2(n) .
Rik
Rik 2020 年 10 月 6 日
This will only work for the originally posted problem, not the corrected problem.
Mohammad Sami
Mohammad Sami 2020 年 10 月 6 日
編集済み: Mohammad Sami 2020 年 10 月 6 日
In that case you can use this
A = table([1;1;1;1;1;2;2;2;3],[1;2;3;4;5;6;7;8;500]);
i = [true; A.Var1(1:end-1) ~= A.Var1(2:end)];
id = cumsum(i);
A.Var3 = grouptransform(A.Var2,id,@cumprod);
The above is assuming that Var1 maynot be in sequence e.g. [1 1 1 2 2 2 4 4 4] e.t.c
If it is always in sequence you can shorten it as follows.
A = table([1;1;1;1;1;2;2;2;3],[1;2;3;4;5;6;7;8;500]);
A = grouptransform(A,'Var1',@cumprod,"ReplaceValues",false);
% or explicitly specify which variable to transform if you have other variables
% A = grouptransform(A,'Var1',@cumprod,"Var2","ReplaceValues",false);
William Ambrose
William Ambrose 2020 年 10 月 8 日
Hi Mohammed,
I like the use of grouptransform but unfortunately the use of @cumprod as the function isn't correct. What I need is an iterative cascade through the rows where the result in the previous row - in another column - is input into the calculation. Almost like a FOR LOOP solution for the Fibonnaci sequence.
To be sure I will include the corrected version of the code, with some of your suggestions embedded in, and expected outcome:
B = table([1;1;1;1;1;2;2;2;3],[1;2;3;4;5;6;7;8;500]);
B.Var3 = zeros(height(B),1);
i = [false; B.Var1(1:end-1) == B.Var1(2:end)];
j = find(~i);
B.Var3(j) = B.Var2(j);
tic;
for n = 1:height(B)
if i(n) == 1
B.Var3(n) = B.Var3(n-1) .* B.Var2(n);
end
end ; toc
and the result
B =
9×3 table
Var1 Var2 Var3
____ ____ ____
1 1 1
1 2 2
1 3 6
1 4 24
1 5 120
2 6 6
2 7 42
2 8 336
3 500 500
I have appreciate all your suggestions so far so if you have input which could solve the above but without the FOR LOOP I would be very greatful for any input.
Mohammad Sami
Mohammad Sami 2020 年 10 月 8 日
Hi William,
For the updated problem as stated, grouptransform with cumprod will work just as well.
My testing shows the result is identical to the expected result.
A =
9×3 table
Var1 Var2 fun_Var2
____ ____ ________
1 1 1
1 2 2
1 3 6
1 4 24
1 5 120
2 6 6
2 7 42
2 8 336
3 500 500
Ofcourse if the formula changes, for loop may be more generalizable.

サインインしてコメントする。

カテゴリ

ヘルプ センター および File ExchangePerformance and Memory についてさらに検索

製品

リリース

R2019b

質問済み:

2020 年 10 月 6 日

コメント済み:

2020 年 10 月 8 日

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by