Computing the outer product (BLAS 2 operation) raised to the power k using the least number of flops

Explain how (xy^T)^k can be computed using the least number of flops, where x, y in R^n are vectors. Then write the corresponding MATLAB algorithm.
Here is my algorithm:
function M = outer(x, y, k)
n = length(x);           % assumes length(y) == n as well
A = zeros(n, n);
M = zeros(n, n);         % preallocate the result
for j = 1 : n            % loop over columns first (MATLAB is column-major)
    for i = 1 : n
        A(i, j) = x(i) * y(j);
        M(i, j) = A(i, j)^k;
    end
end
So, for example, if I have x = [1 2] and y = [2 4] with k = 2, then xy^T = [2 4; 4 8] and (xy^T)^2 should equal [20 40; 40 80], but my algorithm gives (xy^T)^2 = [4 16; 16 64]. What is wrong with my algorithm? (And I looped over the columns first and computed M(i,j) in the same pass, rather than in a separate loop, to use the least number of flops as the question wants. Is that correct?)
  3 Comments
Pascale Bou Chahine on 19 Sep 2020
Yeah, I should end up with a matrix, i.e. if k = 2, I should have (xy^T)^k = (xy^T)^2 = [20 40; 40 80], but with my algorithm I'm getting the element-wise power, i.e. (xy^T).^2 = [4 16; 16 64].
Bjorn Gustavsson on 19 Sep 2020
If you want the [20 40; 40 80] result, then you should take the matrix power, not the element-wise power. The matrix power is (for exponents of 2 and 3):
M2 = M*M; % k = 2
M3 = M*M*M; % k = 3
You have to modify your algorithm and read up on efficient calculation of matrix powers...
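A minimal repair along those lines, keeping the column-first loops from the question and swapping the elementwise power for the matrix power (the name outer_k and the final A^k step are my additions, sketched here rather than taken from the thread):
function M = outer_k(x, y, k)
% Build the outer product A = x*y', then take the matrix power A^k
% (note: A^k, not the elementwise A.^k).
n = length(x);
A = zeros(n, n);
for j = 1 : n
    for i = 1 : n
        A(i, j) = x(i) * y(j);
    end
end
M = A^k;   % matrix power: A multiplied by itself k-1 times
With x = [1 2], y = [2 4], k = 2 this returns [20 40; 40 80], as expected.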


Answers (2)

Gaurav Garg on 22 Sep 2020
Hi Pascale,
MATLAB supports LAPACK and BLAS functions/operations, which can help you reduce the number of FLOPs in your function(s). To learn more about LAPACK and BLAS, kindly refer to the doc here, and to learn how you can use MATLAB and BLAS operations in your script/function, kindly refer here.

Walter Roberson on 22 Sep 2020
LAPACK and BLAS are designed for computational efficiency, which is not the same thing as reducing the number of FLOPs; the least number of FLOPs is not necessarily the most efficient computation. The textbook algorithm for matrix multiplication can involve a lot of cache misses on the right-hand matrix. It is a lot more efficient in processor time to do block calculations, which are still "pretty" efficient but might have more FLOPs: a block multiplication might require (say) 2*n+m additional additions compared to the theoretical minimum, and yet be 100 times faster due to not having as many cache misses.
A matrix raised to an integer power can have the exponent decomposed into binary. For example, if k = 9, then instead of the 8 matrix multiplications in M*M*M*M*M*M*M*M*M you can take M2 = M*M; M4 = M2*M2; M8 = M4*M4; M9 = M8*M, which is only 4 matrix multiplies. Or M2 = M*M; M3 = M2*M; M6 = M3*M3; M9 = M6*M3, which is also only 4 multiplies.
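As a sketch, here is that square-and-multiply idea written out in MATLAB (the helper name matpow_binary is mine, not from the thread; MATLAB's built-in M^k already computes the matrix power for integer k):
function P = matpow_binary(M, k)
% Matrix power via the binary decomposition of k:
% roughly 2*log2(k) matrix multiplies instead of k-1.
P = eye(size(M, 1));       % accumulator, starts as the identity
while k > 0
    if mod(k, 2) == 1      % current binary digit of k is 1
        P = P * M;         % fold the current power of M into the result
    end
    M = M * M;             % square for the next binary digit
    k = floor(k / 2);
end
For k = 9 this walks through the same M^2, M^4, M^8 powers as the chain above.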
However, for large enough k even this is not necessarily the most efficient. Instead you can do [U,S,V] = svd(M); U * S^k * V', and S is guaranteed to be diagonal, so S^k is just S.^k. (Caution: U*S^k*V' equals M^k when M is symmetric positive semidefinite, which happens to be the case in this question because y is a multiple of x; for a general diagonalizable matrix you would use the eigendecomposition instead: [X,D] = eig(M); X*D^k/X.)
But to know whether that is going to be more efficient than the binary decomposition, you have to know the cost of the SVD.
I have a suspicion that the problem might have been structured in a way that makes the SVD easy. Experimentally, in the tests I did, S was always non-zero only in its (1,1) entry (as you would expect, since xy^T has rank 1)... though the values of the matrices are not at all obvious to me at the moment, so I do not know whether there would be an "inexpensive" SVD that could be done.
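A sketch of the SVD route on the example from the question; this works here only because y = 2*x makes xy^T symmetric positive semidefinite, so U*S^k*V' really is M^k:
x = [1; 2];  y = [2; 4];  k = 2;
M = x * y';                 % [2 4; 4 8], rank 1 and symmetric since y = 2*x
[U, S, V] = svd(M);         % S = diag(10, 0): only the (1,1) entry is nonzero
Mk = U * S^k * V';          % S is diagonal, so S^k costs only n scalar powers
% Mk is [20 40; 40 80], the same as M*M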
  1 Comment
Bjorn Gustavsson on 22 Sep 2020
Well, since y is simply 2*x, I think we can cook up the eigenvectors in 2-D space rather easily (x/norm(x) and one perpendicular to that), and the nonzero eigenvalue I get to be the product norm(x)*norm(y). But since the problem is clearly designed for someone to work out, surely it is a homework task?
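Building on that observation, the rank-1 structure gives an O(n^2) shortcut that avoids matrix-matrix multiplication entirely (this formulation is mine, not part of the original thread): since (xy^T)(xy^T) = x(y^T x)y^T, induction gives (xy^T)^k = (y^T x)^(k-1) xy^T, so the whole power costs one dot product, one scalar power, and one scaled outer product.
function M = outer_power(x, y, k)
% (x*y')^k = (y'*x)^(k-1) * (x*y') for column vectors x, y and integer k >= 1,
% because (x*y')*(x*y') collapses to the scalar y'*x times x*y'.
x = x(:);  y = y(:);           % force column vectors
c = (y' * x)^(k - 1);          % scalar (y^T x)^(k-1): about 2n flops plus one power
M = c * (x * y');              % one scaled outer product: about 2*n^2 flops
For x = [1 2]', y = [2 4]', k = 2 this gives c = 10 and M = [20 40; 40 80], matching the matrix power.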

