Is there any precomputation step on A'*(.)*A if matrix A is known, to speed up matrix computation A'*B_i*A in matlab with B_i denpends on the previous matrix B_{i-1}.

4 ビュー (過去 30 日間)
zetao
zetao 2025 年 4 月 25 日
編集済み: James Tursa 2025 年 4 月 26 日
if matrix A and B_1 is known. How can I foucus on some precomputations on A'*(.)*A if my laptop's memory is 64GB?
  2 件のコメント
John D'Errico
John D'Errico 2025 年 4 月 25 日
If you think the comment by @Stephen23 was insufficient, then you need to explain why, in a COMMENT. Better yet, edit your question to make it sufficiently descriptive that we might be able to help you. Flags are there for the purpose of identifying spam, etc. I removed your flag on his response.

サインインしてコメントする。

回答 (1 件)

John D'Errico
John D'Errico 2025 年 4 月 25 日
編集済み: John D'Errico 2025 年 4 月 25 日
A is known and fixed, and apparently large, else you would not care about speed.
The amount of memory on your laptop is irrelevant, as long as it is sufficient.
The array B_1 is known. and you tell us that B_n "depends" on B_(n-1). But you don't say how it depends. So all we are told is that B_n is DIFFERENT from B_(n-1). And that means any dependency is useless. It is different. Different is just DIFFERENT.
I suppose if we were told that B_n and B_(n-1) are different in only a few elements, then we could write
B_n = B_(n-1) + B_delta
where B_delta is VERY sparse. Then you could gain, because (especially if A itself is also sparse) you could write
A'*B_n*A = A'*B_(n-1)*A + A'*B_delta*A
But we are told nothing about the dependency. The crystal ball is so cloudy too, so I cannot read your mind or see inside your computer.
So really, your question reduces to wanting to speed up the computation of A'*B*A. Are the matrices A and B sparse? Are they seriously sparse? Just half zeros is not at all sparse, and you would see no gain. But are they sparse in any usable sense? I don't recall the break even point, but I spent some time once to find where that might be. Again, you need seriosuly sparse arrays before you will see much gain, because there may likely be a great deal of fill-in.
You could try writing your own compiled code to do the multiply, but even if you use tools like LAPACK, you still won't beat MATLAB's matrix multiplies, as they already use those same highly optimized tools.
You COULD create A' in advance, but as I recall, the transpose of a large matrix is really fast anyway, and I think I recall the parser is smart enough to use optimized code for this general form anyway.
A = rand(5000);B = rand(5000);
Ap = A';
timeit(@() A'*B*A)
ans =
0.6260
timeit(@() Ap*B*A)
ans =
0.6262
If you don't care so much about the precision, you could work with singles instead of the default double arithmetic. As I recall, single precision is generally faster than double.
As = single(A); Bs = single(B);
timeit(@() As'*Bs*As)
ans =
0.1669
Is there something else you could do? Buy a faster computer! Or rent one.
  5 件のコメント
James Tursa
James Tursa 2025 年 4 月 26 日
編集済み: James Tursa 2025 年 4 月 26 日
@Paul That is a good question. I suspect the code analyzer is making that particular suggestion so that the (A'*A) can call the BLAS symmetric matrix multiply routine in the background, which runs in about 1/2 the time of the generic matrix multiply routine and guarantees an exact symmetric/Hermitian result. But it misses the overall picture as you have surmised. Matrix sizes can, of course, alter choices for how to go about this. But my comments in the general case are as follows (SMM = Symmetric Matrix Multiply, GMM = Generic Matrix Multiply):
A = rand(6000); B = rand(6000);
B'*A'*A*B; % 1
3 GMM (slowest)
No transposes explicitly formed
Result not guaranteed to be exactly symmetric/Hermitian (WORST)
B'*(A'*A)*B; % 2
1 SMM + 2 GMM (Faster than 1)
No transposes explicitly formed
Result not guaranteed to be exactly symmetric/Hermitian (WORST)
(A*B)'*(A*B); % 3
1 SMM + 2 GMM (Faster than 1) ... or 1 SMM + 1 GMM?, depends on parser smarts
No transposes explicitly formed
Result guaranteed to be exactly symmetric/Hermitian (BEST)
AB = A*B; AB'*AB; % 4
1 SMM + 1 GMM (FASTEST)
No transposes explicitly formed
Result guaranteed to be exactly symmetric/Hermitian (BEST)
-------------------------------------
My preference is method 4 above. It is guaranteed to be the fastest and also guaranteed to give you an exact symmetric/Hermitian result. Method 3 might give you the same background calculations as method 4, but that would depend on the parser and version of MATLAB involved, and I have seen no evidence of these parser smarts to date. E.g.,
A = rand(6000); B = rand(6000);
tic; C = B'*A'*A*B; toc
Elapsed time is 2.085373 seconds.
isequal(C,C')
ans = logical
0
clear C
tic; C = B'*(A'*A)*B; toc
Elapsed time is 1.847934 seconds.
isequal(C,C')
ans = logical
0
clear C
tic; C = (A*B)'*(A*B); toc
Elapsed time is 1.861296 seconds.
isequal(C,C')
ans = logical
1
clear C
tic; AB = A*B; C = AB'*AB; toc
Elapsed time is 1.162486 seconds.
isequal(C,C')
ans = logical
1
clear C
---------------------------
Method 4 is the clear winner here, and is what I would always do personally. Maybe this could be an enhancement request to make the parser smarter in this particular case.
Paul
Paul 2025 年 4 月 26 日
Thanks for the insight.
"Matrix sizes can, of course, alter choices for how to go about this"
Here is an extreme(?) case that illustrates wrt to execution time with case 1 being much faster than case 2. Note that C2 results in a large square matrix as in intermediate variable and memory constraints can be an issue if numel of A and B get larger.
A = rand(2,6000); B = rand(6000,2);
tic; C1 = B'*A'*A*B; toc
Elapsed time is 0.000261 seconds.
tic; C2 = B'*(A'*A)*B; toc
Elapsed time is 0.125257 seconds.
tic; C3 = (A*B)'*(A*B); toc
Elapsed time is 0.000175 seconds.
tic; AB = A*B; C4 = AB'*AB; toc
Elapsed time is 0.000173 seconds.
isequal(C1,C2,C3,C4)
ans = logical
0

サインインしてコメントする。

カテゴリ

Help Center および File ExchangeMatrix Indexing についてさらに検索

製品


リリース

R2024b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by