Is there a way to speed up matrix calculations by Parallel Computing Toolbox?

1 回表示 (過去 30 日間)
Kyle
Kyle 2013 年 10 月 2 日
コメント済み: Kyle 2013 年 10 月 6 日
For example, I have the following matrix calculation:
for j = 1: 1000000
O(i,i,j) = M(i,i).* N(i,i).* J(i,i);
end
where i is, say, 200. To speed up the calculation, can I take the advantage of Parallel Computing Toolbox? I have three macbooks, can I connect them together to execute the above codes together in an parallel way? Many thanks
  2 件のコメント
Kyle
Kyle 2013 年 10 月 3 日
Thank you for your suggestions, Matt J and Walter Roberson. I think I need to detail my question. I was actually going to convert my codes to a parallel version running on multiple computers so that the the efficiency could be dramatically enhanced.
My codes are about solving a pair of differential equations (the Brusselator model), in other words, 2D gird-simulation in high resolution.
% Simulate 2D Brusselator model by Euler algorithm
clear;clc;close all
a = 5;
b = 19;
Dx = 5;
Dy = 40;
time_end = 150; % length of simulation, in sec
%%dimensions for the full-resolution grid
[Nx Ny] = deal(60); % no. of sampling points along each axis
[Lx Ly] = deal(60); % square cortex (cm)
[dx dy] = deal(Lx/Nx, Ly/Ny); % spatial resolution (cm)
%%time resolution and time-base
dt = 1*1e-3; % Nx = Ny = 60; D2 = 1.0 cm^2
Nsteps = round(time_end/dt)
%%3x3 Laplacian matrix (used in grid convolution calculations)
Laplacian = [0 1 0; 1 -4 1; 0 1 0];
%%set up storage vectors and grids
[U_grid V_grid] = deal(0.001*randn(Nx, Ny));
% save UV_inis U_grid V_grid
U0 = a; V0 = b/a;
% initialize the grids at steady-state values
[U_grid V_grid] = deal(U0 + U_grid, V0 + V_grid);
% diffusion multipliers (depend on step size)
Dx = Dx/dx^2;
Dy = Dy/dx^2;
%%Simulation
stride2 = 100; % iterations per screen update
time = [0:Nsteps-1]'*dt; % timebase
ii = 1;
for i = 1: Nsteps
i
U_grid = U_grid + dt*(a-(b+1)*U_grid + U_grid.^2.*V_grid + ...
Dx*convolve2(U_grid, Laplacian, 'wrap'));
V_grid = V_grid + dt*(b*U_grid - U_grid.^2.*V_grid + ...
Dy*convolve2(V_grid, Laplacian, 'wrap'));
if (mod(i, stride2) == 1 || i == Nsteps)
mesh(x, y, U_grid);
drawnow;
U_save(:,:, ii) = U_grid;
ii = ii + 1;
end
end
In the codes, "U_save" is the final results I want to have. If the "Nx" and "Ny" are set to 300, meaning high resolution, the computing speed will be very slow.
I have Parallel Computing Toolbox and Distributed Computing Toolbox. May I have your suggestions for an efficient calculation? Many thanks
Kyle
Kyle 2013 年 10 月 4 日
As all experts suggested, parallel computation may not be much helpful for my case. In the other hand, I'm thinking GPU-assisted calculation. Since my computer doesn't have a sound GPU, I need your suggestions that if it worths to get a modern graphic card to boost my simulation? Can GPU help to enhance my grid simulation? Many thanks

サインインしてコメントする。

採用された回答

Teja Muppirala
Teja Muppirala 2013 年 10 月 4 日
As others have pointed out, this will be difficult to parallelize for a single run, since you are integrating forward in time, and each successive time step depends on the previous one.
Parallelization will help if you need to run this for many different initial conditions though.
Putting parallelization aside for the moment, you will find that if you write the convolution out explicitly using matrix manipulations, your simulation will run much faster. In other words, replace these two lines:
U_grid = U_grid + dt*(a-(b+1)*U_grid + U_grid.^2.*V_grid + ...
Dx*convolve2(U_grid, Laplacian, 'wrap'));
V_grid = V_grid + dt*(b*U_grid - U_grid.^2.*V_grid + ...
Dy*convolve2(V_grid, Laplacian, 'wrap'));
by this instead:
UL = -4*U_grid+U_grid([2:60 1],:)+U_grid([60 1:59],:)+U_grid(:,[2:60 1])+U_grid(:,[60 1:59]);
U_grid = U_grid + dt*(a-(b+1)*U_grid + U_grid.^2.*V_grid + Dx*UL);
VL = -4*V_grid+V_grid([2:60 1],:)+V_grid([60 1:59],:)+V_grid(:,[2:60 1])+V_grid(:,[60 1:59]);
V_grid = V_grid + dt*(b*U_grid - U_grid.^2.*V_grid + Dy*VL);
and you will see quite a speedup.
  1 件のコメント
Walter Roberson
Walter Roberson 2013 年 10 月 4 日
I had written that into my posting and edited it out again when I went the route of iterating symbolically. Part of the reason I dropped it was that I didn't want to try to work out what the edge effects were, but it looks like they were exactly what my first thought was ;-)

サインインしてコメントする。

その他の回答 (2 件)

Matt J
Matt J 2013 年 10 月 3 日
編集済み: Matt J 2013 年 10 月 3 日
It is wasteful to repeatedly compute the same fixed M(i,i).* N(i,i).* J(i,i) for every j. Also, matrix operations are already coded to take advantage of your system's multi-threading capabilities. I would just skip the for-loop and do
O(i,i,1:1000000)=M(i,i).* N(i,i).* J(i,i);
  1 件のコメント
Kyle
Kyle 2013 年 10 月 3 日
Thank you, Matt J. Matrices M, N and J are dynamical actually. I have updated my question for your reference. Thank you

サインインしてコメントする。


Walter Roberson
Walter Roberson 2013 年 10 月 3 日
for j = 1: 1000000
O(i,i,j) = M(i,i).* N(i,i).* J(i,i);
end
can be rewritten as
MNO = M(i,i) .* N(i,i) .* J(i,i);
for j = 1 : 1000000
O(i,i,j) = MNO;
end
which in turn can be rewritten as
joff = sub2ind( size(O), [i, i, 1] );
jskip = size(O,2) .* size(O,3);
O(joff : jskip : end) = M(i,i) .* N(i,i) .* J(i,i);
It probably is not worthwhile to convert this for use with the parallel toolbox, but if you did then you would calculate M(i,i) .* N(i,i) .* J(i,i) ahead of time, and then use a parfor or distributed array to set every jskip'th element of the O array to be the precalculated value.
In order to hook multiple computers up for use in the parallel computing toolbox, you would also need to use the distributed computing toolbox, as by itself the parallel computing can only be local.
If the above is not your pattern, then we will need to see your real pattern to advise you as to whether the parallel computing will help.
  5 件のコメント
Walter Roberson
Walter Roberson 2013 年 10 月 3 日
That is applicable if the order of the partial factors does not make any difference -- if the reduction operation is commutative. It is not immediately clear that it would be the case for the above code.
A 1D convolution can be written as a matrix multiplication involving a topelitz matrix, and I have read that a 2D convolution can be done as two 1D convolutions. That would seem to suggest that a 2D convolution could be done as a single matrix multiplication by the product of the two topelitz matrices, but I have not yet found anything that says that explicitly, and I have not found anything that talks about the numeric stability if that were the case.
If X and Y represent the initial U_grid and V_grid respectively, it appears that each iteration goes something like
X = X + dt * (a - (b+1) * X + X^2 * Y + X*Conv)
Y = Y - (X^2 * Y + (Conv - b - 1) * X + a)^2 * Y * dt^3 - (2 * (X^2 * Y + (Conv - b - 1) * X + a)) * (-(1/2) * b + X * Y) * dt^2 + (X * b - X^2 * Y + Dy * Conv) * dt
where Conv is the hypothetical multiplication of the two topelitz matrices as needed to represent the 2D convolution.
It does not at all appear to me that the reduction over addition would involve terms that are independent of the order of calculations, so I doubt the particular code here can be done as a parallel reduction.
Kyle
Kyle 2013 年 10 月 6 日
Thank you Walter for your suggestions. I'm testing your algorithms.

サインインしてコメントする。

カテゴリ

Help Center および File ExchangeLogical についてさらに検索

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by