How to speed up matrix-vector multiplication

Wei HU on 25 Mar 2018
Commented: Jan on 26 Mar 2018
I found that for a large matrix-matrix multiplication, MATLAB automatically runs the computation in parallel (CPU usage reaches 100%). However, for a matrix-vector multiplication, it seems that only one core is in use.
Can this operation be done in parallel? For example, I think we could simply split the large matrix into blocks of rows. I tried using parfor from the Parallel Computing Toolbox, which resulted in very low performance.
I also tried writing a C function using CBLAS; however, as described in https://www.mathworks.com/matlabcentral/answers/390546-wrong-result-when-calling-cblas-dgemv-function-in-a-mex-file, there are some problems with that implementation.
Are there other ways to accelerate this? Thanks!
I tried the following parfor code:
N = % num of rows
M = % num of cols
A = rand(N,M);
b = rand(M,1);
c = zeros(N,1);
p = 6;                  % number of row blocks
batch = floor(N / p);   % rows per block; the last block takes the remainder
parfor i = 1:p
    istart = (i-1)*batch + 1;
    iend = i*batch;
    if (i == p)
        iend = N;
    end
    A_sliced = A(istart:iend, :);
    c(istart:iend) = A_sliced * b;
end
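For comparison, here is a minimal sketch of the same row-block idea written so that each iteration stores its block in its own cell, which parfor can treat as a sliced output. The sizes, the `edges` and `parts` names, and the `linspace` partition are illustrative assumptions, not part of the original post. Note that `A` is indexed by a computed row range, so the whole matrix is a broadcast variable that is sent to every worker; that copy is a likely reason such a loop ends up slower than a plain `A*b`.

```matlab
N = 1000;                       % illustrative sizes, not from the post
M = 800;
p = 6;                          % number of row blocks
A = rand(N, M);
b = rand(M, 1);

% Block boundaries: block k covers rows edges(k)+1 .. edges(k+1).
edges = round(linspace(0, N, p + 1));

parts = cell(p, 1);             % one result block per iteration
parfor k = 1:p
    rows = edges(k) + 1 : edges(k + 1);
    parts{k} = A(rows, :) * b;  % A is broadcast in full to each worker
end

c = vertcat(parts{:});          % reassemble the N-by-1 result
fprintf('max abs difference vs A*b: %g\n', max(abs(c - A * b)));
```

Without the Parallel Computing Toolbox, parfor simply runs as an ordinary serial loop, so the sketch can still be used to check the partitioning.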
3 Comments
Wei HU on 25 Mar 2018
Really? I tried this:
N = 20000;
M = 20000;
A = rand(N, M);
b = rand(M,1);
tic
A*b;
toc
It runs very fast:
Elapsed time is 0.109193 seconds.
But MATLAB still uses only about 10% of the CPU. I ran this on an Intel Core i7-6800K, which has 6 cores and 12 threads in total, so 10% is about one core.
John D'Errico on 25 Mar 2018
Edited: John D'Errico on 25 Mar 2018
First, don't bother with tic and toc. They are poor ways to time anything.
Next, you need to put it in a loop. If the multiply goes so fast that you cannot see the CPUs all coming alive, then you will never know it used them!
I could not even see the CPUs becoming active until I wrapped a loop around it.
And as I said in my answer, hyper-threading does not count. You gain nothing of significance from splitting one CPU into two. If a CPU is fully active, splitting it so that you have two CPUs that are both fully active, but only half as capable, is a waste of time. So you have 6 cores.
Hyper-threading is great for some applications. But not here.
Put it in a loop. If one multiply took 0.1 seconds, then do 100 multiplies. Then carefully watch a CPU monitor, as I did.
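The advice above can be sketched as follows. This is only an illustration under assumed values: the matrix size and repetition count are arbitrary, and `timeit` is used here as one alternative to tic/toc that calls the function several times and reports a typical time; the comment itself does not name a replacement.

```matlab
n = 4000;                 % illustrative size, not from the thread
A = rand(n);
b = rand(n, 1);

% timeit runs the handle repeatedly and returns a typical time in seconds.
f = @() A * b;
t_single = timeit(f);
fprintf('one multiply takes about %.6f s\n', t_single);

% Repeat the multiply so the work lasts long enough to watch in a CPU monitor.
n_reps = 100;
for k = 1:n_reps
    c = A * b;
end
```

If the loop still finishes too quickly to observe, increasing `n_reps` is the simplest adjustment.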


Accepted Answer

John D'Errico on 25 Mar 2018
Edited: John D'Errico on 25 Mar 2018
MATLAB automatically multi-threads computations where there will be a clear gain. For a matrix-vector multiply, it apparently does not see a gain unless the problem is large enough. Remember that parallel computation is not always a speedup, because there is extra overhead. If you can farm it out to the GPU directly, you might get a gain though. Since I do not have that toolbox, I cannot help you there.
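A minimal sketch of the GPU route mentioned above, assuming the Parallel Computing Toolbox and a supported GPU are available. `gpuArray` and `gather` are that toolbox's functions; the matrix size and variable names are illustrative. The try/catch keeps the CPU result when no GPU can be used. Copying `A` to the device may well cost more than the multiply itself, so any gain is most plausible when `A` stays on the GPU across many multiplies.

```matlab
n = 2000;                      % illustrative size, not from the thread
A = rand(n);
b = rand(n, 1);
c_cpu = A * b;                 % reference result on the CPU

try
    A_gpu = gpuArray(A);       % copy the operands to GPU memory
    b_gpu = gpuArray(b);
    c = gather(A_gpu * b_gpu); % multiply on the GPU, copy the result back
    used_gpu = true;
catch
    c = c_cpu;                 % no toolbox or no usable GPU: keep the CPU result
    used_gpu = false;
end

fprintf('used GPU: %d, max abs difference: %g\n', used_gpu, max(abs(c - c_cpu)));
```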
As a test though, I checked to see if MATLAB will multi-thread a matrix*vector operation.
A = rand(15000);B = rand(15000,1);
I stopped here and waited a few seconds until MATLAB went quiescent, to ensure that I was not seeing multithreading from the call to rand.
for i = 1:100
    C = A*B;
end
The multiply was so fast that I had to add a loop to see my CPU usage go up to the full 400% possible. So MATLAB does indeed multi-thread a matrix*vector multiply, if it sees a gain.
Make sure you have maxNumCompThreads set to the correct value for your CPU. For me:
maxNumCompThreads
ans =
4
Hyper-threading does not count, however.
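One way to check what the thread setting buys on a given machine is to time the same multiply with the default thread count and with a single thread. This is a sketch under assumed values: the size is arbitrary and the variable names are illustrative. Calling `maxNumCompThreads(1)` sets the limit and returns the previous value, which is restored afterwards. A matrix-vector multiply tends to be limited by memory bandwidth, so the speedup may be well below the number of cores.

```matlab
n = 5000;                          % illustrative size, not from the thread
A = rand(n);
b = rand(n, 1);
f = @() A * b;

n_default = maxNumCompThreads;     % current thread limit
t_multi = timeit(f);               % timing with the default limit

n_prev = maxNumCompThreads(1);     % restrict to one thread, keep the old value
t_one = timeit(f);                 % timing with a single thread
maxNumCompThreads(n_prev);         % restore the previous limit

fprintf('%d thread(s): %.6f s, 1 thread: %.6f s, ratio %.2f\n', ...
        n_default, t_multi, t_one, t_one / t_multi);
```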
1 Comment
Jan on 26 Mar 2018
+1. Exactly. A*b is already multi-threaded, and you can see it if you run it for a while in a loop. The task manager is not fast enough to show this for a single call.
