How can the following code be optimized / vectorized?

How can you optimize the following? It is painfully slow and takes around 5 minutes to run on my computer. Here I am calcuatling the distances between atoms in molecules while taking account periodic boundary conditions (which mean an atom on the edge of the box might actually belong to a molecule that wraps around, so to speak). I imagine this could be done in a far more efficient and speedy way than my Fortran-esque implementation.
A % an array where column 1 is the atom ID, column 2 is the molecule ID, and columns 3-5 are the positions in x, y, and z
N = 200; % molecules in system
Sites = 150; % sites in each molecule
length = 120; % length of box on each side
SitesInSystem = N*Sites;
B = zeros(SitesInSystem,SitesInSystem);
% center box at origin
A(:,3) = A(:,3) - length/2;
A(:,4) = A(:,4) - length/2;
A(:,5) = A(:,5) - length/2;
for ii = 1:SitesInSystem
for jj = 1:SitesInSystem
if ii ~= jj % if not the same site
dx = A(jj,3) - A(ii,3);
dy = A(jj,4) - A(ii,4);
dz = A(jj,5) - A(ii,5);
% taking into account periodic boundaries
if (dx > length * 0.5)
dx = dx - length;
elseif (dx <= -length * 0.5)
dx = dx + length;
end
if (dy > length * 0.5)
dy = dy - length;
elseif (dy <= -length * 0.5)
dy = dy + length;
end
if (dz > length * 0.5)
dz = dz - length;
elseif (dz <= -length * 0.5)
dz = dz + length;
end
% calculate distances
B(ii,jj) = sqrt(dx^2+dy^2+dz^2);
end
end
end
B = triu(B); % to avoid double counting
Perhaps this also can be combined with the following snippet, which calculates something based on the values of the matrix B:
for ii = 1:N % molecule
for jj = 1:Sites % site
for kk = 1:N
for mm = 1:Sites
if ii ~= kk % if not the same molecule
if (B(ii*jj,kk*mm) < 1.75) && (B(ii*jj,kk*mm) > 0)
% do a calculation, i.e., add edge ii-kk to a graph
% with N nodes
end
end
end
end
end
end

4 件のコメント

KSSV
KSSV 2022 年 3 月 16 日
It looks like this can be achieved with ease. Why don't you attach the data for A. What is its size? I feel it is a column major array. I am surprised with the loop indexing looking at A.
L'O.G.
L'O.G. 2022 年 3 月 16 日
編集済み: L'O.G. 2022 年 3 月 16 日
A is quite large: 30000 (rows) x 5 (columns), since N (molecules) is 200 here and Sites (atoms) are 150. As mentioned in my attempt, column 1 is the atom ID (integer from 0 to 29999), column 2 is the molecule ID (integer from 0 to 199), and columns 3-5 are the positions in x, y, and z, respectively (integer for now, but could be a floating point number). The attached is a very pared down example (again, not the whole data, which is quite large). The example just has two molecules consisting of 137 atoms each (rather than 150 atoms per molecule as in my code above). Here you can read the data into MATLAB and use readtable and table2array to turn the data into a MATLAB array.
Stephen23
Stephen23 2022 年 3 月 16 日
"Here you can read the data into MATLAB and use readtable and table2array to turn the data into a MATLAB array."
READMATRIX would be simpler and more efficient: https://www.mathworks.com/help/matlab/ref/readmatrix.html
L'O.G.
L'O.G. 2022 年 3 月 16 日
@Stephen Thanks. I didn't know about that function!

サインインしてコメントする。

 採用された回答

Bruno Luong
Bruno Luong 2022 年 3 月 16 日
編集済み: Bruno Luong 2022 年 3 月 16 日

0 投票

It takes about 25 seconds only PC
Ac35 = A(:,3:5);
Ai = reshape(Ac35,[],1,3);
B = zeros(SitesInSystem,SitesInSystem);
for j=1:SitesInSystem
Aj = reshape(Ac35(j,:),1,[],3);
dxyz = Ai-Aj;
dxyz = mod(dxyz+length/2,length)-length/2;
B(:,j) = sqrt(sum(dxyz.^2,3));
end

その他の回答 (2 件)

KSSV
KSSV 2022 年 3 月 16 日

0 投票

I think pdist2, this inbuilt function will work for you.
T = readtable('Sample.xlsx') ;
A = table2array(T) ;
A = A(:,3:5) ; % pick coordinates (x,y,z)
d = pdist2(A,A) ;
B = triu(d) ; % you may check this with you answer. Use isequal to check.

3 件のコメント

L'O.G.
L'O.G. 2022 年 3 月 16 日
編集済み: L'O.G. 2022 年 3 月 16 日
@KSSV Thanks, I really appreciate it. Two things: first, due to the calculation of the differences dx, dy, and dz that take into account periodic boundary calculations, is there an easy way of incorporating that without all of the for loops? In your proposed solution, that's not taken into account. Second, is the second set of for loops necessary to do the calculations I mention using B?
KSSV
KSSV 2022 年 3 月 16 日
I have checked your code and pdist2 for the provided 274*5 data. Both the methods, gave same solution.
L'O.G.
L'O.G. 2022 年 3 月 16 日
編集済み: L'O.G. 2022 年 3 月 16 日
@KSSV Right, but for the full case (not the sample file I gave) I would need to consider that some atoms might be at the edges of the box, so that periodic boundaries as applied to that molecule must be taken into account. Sorry I didn't make that clearer. I mentioned it in my initial query and coded it using the if statements nested in the orignal set of for loops, but I guess not very clearly.

サインインしてコメントする。

Walter Roberson
Walter Roberson 2022 年 3 月 16 日

0 投票

You can see that the rem() based expression I use here can be used to vectorize the calculation.
format bank
Length = 5;
x = -3.5 : .5 : 3.5
x = 1×15
-3.50 -3.00 -2.50 -2.00 -1.50 -1.00 -0.50 0 0.50 1.00 1.50 2.00 2.50 3.00 3.50
for K = 1 : length(x)
dx = x(K);
if (dx > Length * 0.5)
dx = dx - Length;
elseif (dx <= -Length * 0.5)
dx = dx + Length;
end
newx_if(K) = dx;
end
newx_mod = rem((x - Length/2), Length) + Length/2;
newx_if
newx_if = 1×15
1.50 2.00 2.50 -2.00 -1.50 -1.00 -0.50 0 0.50 1.00 1.50 2.00 2.50 -2.00 -1.50
newx_mod
newx_mod = 1×15
1.50 2.00 2.50 -2.00 -1.50 -1.00 -0.50 0 0.50 1.00 1.50 2.00 2.50 3.00 3.50
newx_if - newx_mod
ans = 1×15
0 0 0 0 0 0 0 0 0 0 0 0 0 -5.00 -5.00

3 件のコメント

L'O.G.
L'O.G. 2022 年 3 月 16 日
編集済み: L'O.G. 2022 年 3 月 16 日
@Walter Roberson Thanks very much. I am trying to understand what you did. I guess I should actually take the differences dx (and dy and dz) before using rem() to take into account the boundaries. How would I do that efficiently? Generating all distances (for x, for example) by something like pdist(A(:,1),@(x,y) x-y); generates a huge vector and is slow.
Walter Roberson
Walter Roberson 2022 年 3 月 16 日
newxyz_mod = rem((A - Length/2), Length) + Length/2; % for x, y, z
looks fine, as long as Length is the same in all three directions.
You would follow it with
B = sqrt( sum(newxyz_mod.^2, 2) );
L'O.G.
L'O.G. 2022 年 3 月 16 日
編集済み: L'O.G. 2022 年 3 月 16 日
@Walter Roberson Got it. Thank you! And about my other question, the differences -- rem() should use the differences, correct? Not the positions themselves? That's my understanding, but I don't know how to calculate that efficiently. As I mentioned in my previous comment, something like pdist(A(:,1),@(x,y) x-y); generates a huge vector and is slow, and that's just for x! I think there will be (N*Sites)^2 / 2 possible distances which will fill an upper triangular matrix. I'm not sure how to code that efficiently here, though.

サインインしてコメントする。

カテゴリ

製品

リリース

R2021b

質問済み:

2022 年 3 月 16 日

コメント済み:

2022 年 3 月 16 日

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by