How can I vectorize this code ?

Input:
  • A: (m x n)
  • B: (k x l) with k,l > m,n respectively
Output:
  • C: (p x q) with p=k-m +1 and q=l-n+1
Each element of C is the sum of the "element by element" product of A and a (m x n) submatrix of B.
  • For C(1,1) the (m x n) submatrix is located in the bottom right corner of B.
  • For C(u<p,v<q) the (m x n) submatrix is shifted by u upwards and v leftwards.
  • For C(p,q) the (m x n) submatrix is located in the top left corner of B.
My code:
C = zeros(p,q);
for u = 1:1:p
for v = 1:1:q
C(u,v) = sum(sum(A .* B( p-u+1:k-u+1 , q-v+1:l-v+1 )));
end
end
It works fine but is way too slow (A and B are very large and contain complex values).
Question:
How can I vectorize this code to increase the speed ?

2 件のコメント

Jan
Jan 2017 年 5 月 10 日
編集済み: Jan 2017 年 5 月 10 日
Please post some typical sizes. Does "very large" mean thousands or billions of elements? It matters if A is [10 x 1e6] or [100 x 10]. What exactly is "too slow"? Are you talking about seconds or weeks?
Daniel
Daniel 2017 年 5 月 10 日
For my application, both A and C are in the 2k x 1k range, and extrapolating from a few loop iterations, it would take about 24 hours to compute.

サインインしてコメントする。

 採用された回答

Andrei Bobrov
Andrei Bobrov 2017 年 5 月 10 日
編集済み: Andrei Bobrov 2017 年 5 月 10 日

1 投票

C = rot90(filter2(A,B,'valid'),2);
or
C = conv2(rot90(B,2),A,'valid');

1 件のコメント

Daniel
Daniel 2017 年 5 月 10 日
Thank you. This is exactly what I was looking for. The gain in speed is huge in my case. For both A and C being 400 x 400, it takes almost 200 sec with the loops, 0.1 sec with filter2 and 4 sec with conv2.

サインインしてコメントする。

その他の回答 (1 件)

Jan
Jan 2017 年 5 月 10 日
編集済み: Jan 2017 年 5 月 10 日

0 投票

For experiments:
function test
m = 100;
n = 100;
k = 1200;
l = 1200;
p = k - m + 1;
q = l - n + 1;
A = rand(m, n);
B = rand(k, l);
tic;
C = zeros(p, q);
for u = 1:p
for v = 1:q
C(u,v) = sum(sum(A .* B(p-u+1:k-u+1, q-v+1:l-v+1)));
end
end
toc
tic;
C = zeros(p, q);
Av = A(:).';
Bv = zeros(numel(Av), 1);
for v = 1:q
for u = 1:p
Bv(:) = B(p-u+1:k-u+1, q-v+1:l-v+1);
C(u, v) = Av * Bv(:);
end
end
toc
The 2nd version uses the summation performed in the DOT product. For the given values it runs in 19 seconds compared to 38 of the original code. But the dimensions are guessed only.
I assume a C-Mex to be more efficient, because it will avoid the explicite creation of B(p-u+1:k-u+1, q-v+1:l-v+1). Do you have a C-compiler installed?
[EDITED] Andrei's filter2 and conv2 approaches need about 5 seconds. I leave the modified loop method also in the forum, because it demonstrates how to increase the speed by a factor 2 with trivial methods.
[EDITED 2] With complex input the timings look different: 48 sec for the modified loop, 21 seconds for the conv2 approach. Interesting! The loop is 2.5 times slower, conv2 4 times.

1 件のコメント

Daniel
Daniel 2017 年 5 月 10 日
Thank you for your time. The filter2 approach seems to be the best one so far.

サインインしてコメントする。

カテゴリ

ヘルプ センター および File ExchangeLoops and Conditional Statements についてさらに検索

タグ

質問済み:

2017 年 5 月 10 日

コメント済み:

2017 年 5 月 10 日

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by