My mex file is slower than my original matlab equivalent

Mohammad Shojaei Arani, 18 Jul 2022
Hello friends,
I need to calculate some quantities of linear-algebra type; they are merely elementwise products and quotients of vectors. The following is an example:
EZ=[(1.0./Ds0.^2.*(Ds0.*(Dm0.*4.0+Dm0.*Ds1.^2.*2.0-Ds0.*(Ds1.*2.0+Dm0.*Ds2.*2.0+Dm1.*Ds1.*2.0-Dm2.*Ds0)+Dm0.*Dm1.*2.0)-Dm0.^2.*Ds1.*2.0))./4.0;
(1.0./Ds0.^2.*(Ds0.*(Ds0.*(Dm1.*4.0+Ds1.^2-Ds0.*Ds2.*2.0+4.0)-Dm0.*Ds1.*8.0)+Dm0.^2.*4.0))./4.0;
(Dm0.*6.0-Ds0.*Ds1.*3.0)./(Ds0.*2.0)];
where Ds0, Ds1, Ds2, Dm0, Dm1, Dm2 are 1-by-n vectors. When I do the calculations using matlabFunction (attached), it is fast. However, I am not satisfied, since I need to do such calculations thousands of times (if not millions of times). To overcome this, I decided to give MEX a try. Unfortunately, the equivalent MEX file (which I generated with MATLAB Coder) is 2-3 times slower (I could not upload it here, unfortunately).
Is there any hope of creating a MEX file from this function that is much faster? I hope so!
Thanks for your help in advance,
Babak
7 Comments
Bruno Luong, 18 Jul 2022 (edited 18 Jul 2022)
So you don't think simplifying the expression matters? I think the contrary. Every time a gpuArray is involved in the expression, there is a whole data transfer from CPU to GPU, and you might have 200 such terms in your expression. I won't even try to count them or understand your code, as it is such an unreadable and messy expression.
If you take a raw, unsimplified expression like yours, throw it at the computer and ask why it doesn't speed up, you need to think at a much lower level about how it works.
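As an illustration of what manual simplification can do for the posted expression (a hand derivation, not taken from the thread), use the following shorthand: $s_0, s_1, s_2$ for Ds0, Ds1, Ds2; $m_0, m_1, m_2$ for Dm0, Dm1, Dm2 (all elementwise); and $r = m_0 / s_0$. Expanding the three rows then gives:

$$
\begin{aligned}
\mathrm{EZ}_1 &= r\left(1 + \tfrac{1}{2}s_1^2 + \tfrac{1}{2}m_1\right) - \tfrac{1}{2}r^2 s_1 - \tfrac{1}{2}\left(s_1 + m_0 s_2 + m_1 s_1\right) + \tfrac{1}{4}m_2 s_0,\\
\mathrm{EZ}_2 &= r^2 - 2 r s_1 + \tfrac{1}{4}s_1^2 + m_1 + 1 - \tfrac{1}{2}s_0 s_2,\\
\mathrm{EZ}_3 &= 3r - \tfrac{3}{2}s_1.
\end{aligned}
$$

In this form each element needs a single division by data ($r$) instead of three, and every power becomes a plain product. The result can still differ from the original in the last bits because the terms are evaluated in a different order.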
Mohammad Shojaei Arani, 18 Jul 2022
Bruno,
Of course, simplification matters a lot. My actual expressions are far longer than this one. MATLAB is not able to simplify them efficiently (in many cases it simplifies them only a little), and I have spent a lot of time on simplifying them. Unfortunately, I have no hope of simplifying my expressions further using MATLAB. (Yes, you could perhaps simplify this expression further because it is not extremely long, but could you do it for an expression that is 1 km long?) My expressions are in rational form, so I typically perform two operations to simplify them: 1) first I apply [n,d] = numden(EZ), and then 2) EZ = horner(n,Ds0)./horner(d,Ds0). Unfortunately, MATLAB does not support a multivariate Horner scheme, so I could only benefit from the univariate Horner scheme here. I apply it with respect to Ds0 because it is the most frequently repeated variable; typically, you should apply the Horner scheme with respect to such variables. So at this point I am convinced that I cannot simplify my expressions further using MATLAB, and therefore I should find ways to have C or C++ perform the calculations.
So, my question is not about how to come up with a better simplification (that does not work with the current capabilities of MATLAB). My question is: "How can I use C/C++, or perhaps tools like gpuArray, to reduce the computational burden?"
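Horner's rule is easy to apply by hand in C once an expression is written as a polynomial in one variable. Take the second row of EZ as an example. Multiplying it by 4·Ds0² leaves the cubic −2·Ds2·Ds0³ + (4·Dm1 + Ds1² + 4)·Ds0² − 8·Dm0·Ds1·Ds0 + 4·Dm0², and the posted MATLAB line already has this nested shape. The sketch below expands the row by hand. The function names `horner` and `ez_row2` are hypothetical and are not part of MATLAB or any library.

```c
#include <assert.h>
#include <stddef.h>

/* Evaluate c[0]*x^deg + c[1]*x^(deg-1) + ... + c[deg] by Horner's rule:
   deg multiplications and deg additions, no calls to pow(). */
double horner(const double *c, size_t deg, double x)
{
    assert(c != NULL);
    double acc = c[0];
    for (size_t i = 1; i <= deg; ++i)
        acc = acc * x + c[i];
    return acc;
}

/* Hypothetical helper: second row of EZ for one element, written as a cubic
   in Ds0 (coefficients expanded by hand, highest degree first) over 4*Ds0^2.
   Dm2 does not appear in this row. */
double ez_row2(double Ds0, double Ds1, double Ds2, double Dm0, double Dm1)
{
    const double c[4] = {
        -2.0 * Ds2,
        4.0 * Dm1 + Ds1 * Ds1 + 4.0,
        -8.0 * Dm0 * Ds1,
        4.0 * Dm0 * Dm0
    };
    return horner(c, 3, Ds0) / (4.0 * Ds0 * Ds0);
}
```

A multivariate scheme would nest the inner coefficients in the same way with respect to a second variable. Doing that by hand is only practical for short expressions.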

Accepted Answer

Jan, 18 Jul 2022
Just some experiments. These simplifications gain some clarity but hardly any speed. I've also tried a loop version.
n = 1e4;
Ds0 = rand(1, n);
Ds1 = rand(1, n);
Ds2 = rand(1, n);
Dm0 = rand(1, n);
Dm1 = rand(1, n);
Dm2 = rand(1, n);
tic;
for rep = 1:1e4
EZ = [(1.0./Ds0.^2.*(Ds0.*(Dm0.*4.0+Dm0.*Ds1.^2.*2.0-Ds0.*(Ds1.*2.0+Dm0.*Ds2.*2.0+Dm1.*Ds1.*2.0-Dm2.*Ds0)+Dm0.*Dm1.*2.0)-Dm0.^2.*Ds1.*2.0))./4.0;
(1.0./Ds0.^2.*(Ds0.*(Ds0.*(Dm1.*4.0+Ds1.^2-Ds0.*Ds2.*2.0+4.0)-Dm0.*Ds1.*8.0)+Dm0.^2.*4.0))./4.0;
(Dm0.*6.0-Ds0.*Ds1.*3.0)./(Ds0.*2.0)];
end
toc
Elapsed time is 0.794406 seconds.
tic;
for rep = 1:1e4
Ds0_2 = Ds0 .* Ds0;
Dm0_2 = Dm0 .* Dm0;
EZ2 = [(1 ./ Ds0_2 .* (Ds0 .* (Dm0 * 2 + Dm0 .* Ds1 .^ 2 - ...
Ds0 .* (Ds1 + Dm0 .* Ds2 + Dm1 .* Ds1 - Dm2 .* Ds0 ./ 2) + ...
Dm0 .* Dm1) - Dm0_2 .* Ds1)) / 2; ...
1 ./ Ds0_2 .* (Ds0 .* (Ds0 .* (Dm1 + Ds1 .^ 2 / 4 - Ds0 .* Ds2 / 2 + 1) - ...
Dm0 .* Ds1 * 2) + Dm0_2);
(Dm0 * 3 - Ds0 .* Ds1 * 1.5) ./ Ds0];
end
toc
Elapsed time is 0.775081 seconds.
tic;
for rep = 1:1e4
EZ3 = zeros(3, n);
for k = 1:n
a = Ds0(k);
b = Dm0(k);
c = Ds1(k);
d = Dm1(k);
e = Ds2(k);
EZ3(1, k) = (1 / a^2 * (a * (b * 2 + b * c ^ 2 - ...
a * (c + b * e + d * c - Dm2(k) * a / 2) + b * d) - b^2 * c)) / 2;
EZ3(2, k) = (a * (a * (d + c ^ 2 / 4 - a * e / 2 + 1) - b * c * 2) + b^2) / a^2;
EZ3(3, k) = b * 3 / a - c * 1.5;
end
end
toc
Elapsed time is 1.140882 seconds.
max(abs(EZ(:) - EZ2(:)))
ans = 0
max(abs(EZ(:) - EZ3(:)))
ans = 2.3283e-10
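Jan's loop version is close to what a compiled MEX file actually executes. Below is a minimal C sketch of such a kernel, with some assumptions. The names `ez_kernel` and `ez_reference` are hypothetical. The output uses MATLAB's column-major 3-by-n layout, so element (row, k) sits at `out[row + 3*k]` and a mexFunction wrapper (not shown) could write directly into the output array. The kernel also swaps in the substitution r = Dm0/Ds0, which is not from the thread, so that each element needs only one division. `ez_reference` transliterates the original expression for comparison.

```c
#include <assert.h>
#include <stddef.h>

/* Transliteration of the posted MATLAB expression for one element,
   kept only as a reference for checking the faster kernel. */
void ez_reference(double s0, double s1, double s2,
                  double m0, double m1, double m2, double ez[3])
{
    ez[0] = (1.0 / (s0 * s0) * (s0 * (m0 * 4.0 + m0 * s1 * s1 * 2.0
             - s0 * (s1 * 2.0 + m0 * s2 * 2.0 + m1 * s1 * 2.0 - m2 * s0)
             + m0 * m1 * 2.0) - m0 * m0 * s1 * 2.0)) / 4.0;
    ez[1] = (1.0 / (s0 * s0) * (s0 * (s0 * (m1 * 4.0 + s1 * s1
             - s0 * s2 * 2.0 + 4.0) - m0 * s1 * 8.0) + m0 * m0 * 4.0)) / 4.0;
    ez[2] = (m0 * 6.0 - s0 * s1 * 3.0) / (s0 * 2.0);
}

/* Fused single-pass kernel: one loop over k, every input read once,
   common subexpressions kept in locals, one division per element. */
void ez_kernel(const double *Ds0, const double *Ds1, const double *Ds2,
               const double *Dm0, const double *Dm1, const double *Dm2,
               double *out, size_t n)
{
    assert(n == 0 || out != NULL);
    for (size_t k = 0; k < n; ++k) {
        const double s0 = Ds0[k], s1 = Ds1[k], s2 = Ds2[k];
        const double m0 = Dm0[k], m1 = Dm1[k], m2 = Dm2[k];
        const double r = m0 / s0;  /* the only division in the loop body */

        out[3 * k + 0] = r * (1.0 + 0.5 * s1 * s1 + 0.5 * m1)
                       - 0.5 * r * r * s1
                       - 0.5 * (s1 + m0 * s2 + m1 * s1)
                       + 0.25 * m2 * s0;
        out[3 * k + 1] = r * r - 2.0 * r * s1 + 0.25 * s1 * s1
                       + m1 + 1.0 - 0.5 * s0 * s2;
        out[3 * k + 2] = 3.0 * r - 1.5 * s1;
    }
}
```

This does not guarantee a speedup over the vectorized MATLAB code. MATLAB's built-in elementwise operations are already compiled and may be multithreaded, so timing both versions on the real data is the only reliable comparison.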
5 Comments
Jan, 18 Jul 2022
@Mohammad Shojaei Arani: The rules are straightforward:
  1. Avoid repeated work. If a calculation appears repeatedly, compute it once and store it in a temporary variable.
  2. Reduce calls to expensive functions: exp, power, trigonometric functions, factorial, ...
  3. Combine operations, but keep in mind that the result can be influenced by rounding effects. E.g. 1/a*b takes more time than b/a, but the result can be slightly different.
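Rules 1 and 3 can be sketched in C, using the hypothetical function names `scale_divide` and `scale_reciprocal`. Here a reciprocal is computed once and reused as a multiplication. With a = 7 and b = 49, `scale_divide` returns exactly 1 because 49/49 is exact. `scale_reciprocal` instead computes 49 * (1/49), and since 1/49 is not representable in binary, the product can land one unit in the last place away from 1. An optimizing C compiler will typically hoist `a * a` out of the loop on its own. It will not replace the division with a multiplication unless value-changing options such as GCC's `-ffast-math` are enabled, for exactly the reason given in rule 3.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward form: one division per element, like b ./ a^2 in MATLAB. */
void scale_divide(double a, const double *b, double *out, size_t n)
{
    assert(n == 0 || (b != NULL && out != NULL));
    for (size_t i = 0; i < n; ++i)
        out[i] = b[i] / (a * a);
}

/* Rules 1 and 3: a*a and its reciprocal are computed once outside the loop,
   so each iteration is a single multiplication. Usually faster, but a result
   may differ from scale_divide in the last bit. */
void scale_reciprocal(double a, const double *b, double *out, size_t n)
{
    assert(n == 0 || (b != NULL && out != NULL));
    const double inv_a2 = 1.0 / (a * a);
    for (size_t i = 0; i < n; ++i)
        out[i] = b[i] * inv_a2;
}
```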
Clear code also reduces the time needed for debugging:
  1. Spaces around operators.
  2. Compact variable names.
  3. Be sparing with parentheses that are not required.
  4. Avoid elementwise operators if the calculation does not need them: 3.0.*2.0 is harder to read than 3 * 2.
Bruno's point is important: the result of numerically unstable functions can be changed massively by simplifications. A basic example:
1e17 + 1 - 1e17
ans = 0
1e17 - 1e17 + 1
ans = 1
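The same effect can be reproduced in C with a minimal sketch (the function name `sum_in_order` is hypothetical). Near 1e17, adjacent doubles are 16 apart, so when 1 is added to 1e17 first, it is rounded away entirely. The accumulator is declared `volatile` so that every partial sum is stored, and therefore rounded, as a double. This holds even on platforms that evaluate intermediates in extended precision.

```c
#include <assert.h>
#include <stddef.h>

/* Sum x[0] + x[1] + ... strictly left to right. The volatile accumulator
   forces each partial sum to be stored, and thus rounded, as a double. */
double sum_in_order(const double *x, size_t n)
{
    assert(n == 0 || x != NULL);
    volatile double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        acc = acc + x[i];
    return acc;
}
```

Summing {1e17, 1, -1e17} this way gives 0, while {1e17, -1e17, 1} gives 1. Reordering terms, as a symbolic simplification or a hand rewrite may do, can therefore change the result and not just the speed.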
Mohammad Shojaei Arani, 19 Jul 2022
Thanks a lot Jan and Bruno!
