Performance of log() is wildly different in two different contexts and machines for same data?

2 views (last 30 days)
Lakshay Sood
Lakshay Sood on 24 Dec 2021
Commented: Lakshay Sood on 31 Dec 2021
I am writing code to fit 2D Gaussian functions to white spots on a black image background for a research project. I recently got a new 16 GB RAM 2020 MacBook Pro and was comparing its performance to my old 8 GB RAM 2020 MacBook Pro. When profiling my code on the 16 GB machine, I saw that calling the log() function 10,000 times as part of the fminsearch optimization below took approximately 1.5 seconds per data point. This came as a surprise, as running the same code on the 8 GB machine takes approximately 0.01 seconds per data point. The net result is that the overall code takes approximately 6 times longer to run on the 16 GB machine, which is unacceptable for my purposes.
Next I checked the CPU load on both machines, reasoning that log() might not be parallelized for some reason on the 16 GB machine, but both machines have very similar CPU load while running the code. I also reinstalled MATLAB on the 16 GB machine with the same toolboxes to see if there was some issue in how I had transferred data from the 8 GB machine to the 16 GB machine, but that did not help.
Each data point is a 9 x 9 (edit: 7 x 7) array of doubles. To investigate further I moved the offending code into a new script on the 16 GB machine and saved a single data point in the workspace. Beneath it I added a for loop that takes the log of the matrix 10,000 times. In that loop, log() takes the expected 0.01 seconds; it is only inside the optimization routine on the 16 GB machine that I see 1.5 seconds per log() call. So the performance of log() varies reproducibly even on the same machine.
The code in question is below. The installed toolboxes on both machines are identical, except that on the 8 GB machine I have installed the Symbolic Math and Computer Vision toolboxes, while on the 16 GB machine I have installed the Wavelet and Optimization toolboxes. Otherwise both machines have the Bioinformatics, Curve Fitting, Image Processing, and Stats/Machine Learning toolboxes. Both machines are running R2019a (I needed my code to be compatible with older versions of MATLAB and I haven't experienced any other issues so far). Might anyone know what the issue is, and how I could fix it?
Offending code:
x = 491;
y = 16;
circle_rad = 3;
peakdiam = 7;
mask = [15 12 13 13 14 13 11;
        12 13 12 12 13 16 15;
        15 16 15 14 16 17 13;
        21 19 27 40 30 19 11;
        15 16 26 31 19 19 14;
        12 11 18 30 19 12 14;
        13 14 15 14 13 13 13]
mask = 7×7
    15    12    13    13    14    13    11
    12    13    12    12    13    16    15
    15    16    15    14    16    17    13
    21    19    27    40    30    19    11
    15    16    26    31    19    19    14
    12    11    18    30    19    12    14
    13    14    15    14    13    13    13
MLE_initial_guess = [circle_rad + 1, circle_rad + 1, peakdiam, sum(mask(:)), min(mask(:))];
neg_log_likelihood_fxn = @(params) NegLogFxn(params, mask, peakdiam) ;
options = optimset ('MaxFunEvals', 10000, 'MaxIter', 10000, 'TolFun', 1e-5);
MLE_output = fminsearch(neg_log_likelihood_fxn, MLE_initial_guess, options);
pixel_offset = [y - circle_rad - 1, x - circle_rad - 1]; % -1 corrects for the +1 offset in the initial guess
MLE_output(1:2) = MLE_output(1:2) + pixel_offset;
fprintf('Y-COORDINATE:\nMLE: %f \nX-COORDINATE:\nMLE: %f \n\n', MLE_output(1), MLE_output(2))
Y-COORDINATE:
MLE: 16.610009
X-COORDINATE:
MLE: 490.979523
function neglog = NegLogFxn(params, data, array_size)
% Negative Poisson log-likelihood of the data given the 2D Gaussian model
tdg = twoD_Gaussian(params, array_size);
field_of_view = data;
tdg_log = log(tdg);
field_of_view = field_of_view .* tdg_log;
field_of_view = field_of_view - tdg;
neglog = -1 * sum(field_of_view(:));
end
function psf = twoD_Gaussian(params, array_size)
[x, y] = meshgrid(1:array_size, 1:array_size);
psf = params(4)/(2 * pi * params(3)^2);
psf = psf * exp(-((x - params(2)).^2 + (y - params(1)).^2)/(2 * params(3)^2));
psf = psf + params(5);
end
Thanks in advance!
  17 comments
Lakshay Sood
Lakshay Sood on 30 Dec 2021
After running my code on my datasets, I'm noticing that mask arrays with lots of 0s or negative numbers tend to be giving the 16 GB machine trouble. As an example, replacing the entire top row of the mask array in this question with 0 reproduced the problem I've been describing, but if I use another data point with all positive numbers, or even with some negative numbers, the problem disappears on the 16 GB machine. @Jan, in the second for loop of CuteTest, I changed this to
mask = ones(16) * 16;
for i = 1:10000
    mask(1, :) = mask(1, :) - 1;
    tic;
    q = log(mask);
    elapsedTime = toc;
    if elapsedTime > 0.0001
        disp(i)
    end
end
As expected, the first number I reliably saw was i = 17, and I saw every i from 17 to 10,000. I hope this helps.
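The i = 17 threshold lines up with the first iteration at which mask(1, :) goes negative: in MATLAB, log() of a matrix containing any negative entry returns a complex result, which takes a slower code path than the all-real case. A minimal sketch of this effect (timings are machine-dependent):

```matlab
A = rand(7) + 1;    % strictly positive entries -> real result
B = A;
B(1, 1) = -1;       % a single negative entry -> complex result

tic; for k = 1:10000, q = log(A); end; tReal = toc
tic; for k = 1:10000, q = log(B); end; tComplex = toc

isreal(log(A))      % true
isreal(log(B))      % false: log(-1) = 0 + pi*1i
```

On an affected machine, tComplex should be dramatically larger than tReal, matching the behavior seen in the optimization routine whenever the model takes non-positive values.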



Matt J
Matt J on 30 Dec 2021
Edited: Matt J on 30 Dec 2021
I'm noticing that mask arrays with lots of 0s or negative numbers tend to be giving the 16 GB machine trouble.
Because your model is Poisson, you should require that your psf model take only positive values, which means that the additive term params(5) must be non-negative. If you were using fmincon, you could specify this constraint explicitly. With fminsearch you cannot apply constraints directly, but a workaround is a change of variables such that params(5) becomes params(5)^2:
psf = psf + params(5)^2;
Naturally, you must modify MLE_output accordingly after the fit:
MLE_output = fminsearch(neg_log_likelihood_fxn, MLE_initial_guess, options);
MLE_output(5) = MLE_output(5)^2;
  3 comments
Lakshay Sood
Lakshay Sood on 31 Dec 2021
params(4) is the sum of all of the values in the array, and those are strictly greater than or equal to 0, so I've never had to deal with an issue like that, to be honest. But I have noticed that for very poor fits params(4) comes out negative. Usually if the fit is that bad I just choose another start point and try to fit again.
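The same change of variables could also be applied to params(4) to rule out the negative amplitudes seen in poor fits. A hypothetical sketch (not from the thread; the function name is made up):

```matlab
function psf = twoD_Gaussian_pos(params, array_size)
% Hypothetical variant: square params(4) and params(5) so the model
% stays non-negative no matter what points fminsearch explores.
[x, y] = meshgrid(1:array_size, 1:array_size);
psf = params(4)^2 / (2 * pi * params(3)^2);
psf = psf * exp(-((x - params(2)).^2 + (y - params(1)).^2) / (2 * params(3)^2));
psf = psf + params(5)^2;
end
```

With this parameterization the initial guesses for the amplitude and offset would be passed as square roots, and MLE_output(4) and MLE_output(5) would be squared after the fit to recover the physical values.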


More Answers (1)

Jan on 30 Dec 2021
On the 16 GB M1 MacBook, the profiler shows that the line
tdg_log = log(tdg);
needs 1.5 seconds. Using the same data, a loop with 10,000 iterations takes 0.008 seconds:
tic
for i = 1:10000
    tdg_log = log(tdg);
end
toc
This means that the 8 GB Intel MacBook is 1,875,000 times faster for this operation.
There are some reasonable ideas to explain this:
  • The Matlab version you use has a massive problem with the M1 CPU.
  • The profiler is completely confused. (But the total run-time seems to exclude this option.)
  • You do not run the same code. Matt J found that the code posted here in the forum does not run at all, but stops with an error.
A speed difference by a factor of almost 2 million is extremely unlikely and cannot be caused by a problem with multithreading. For tiny 9x9 matrices, starting multiple threads would be a waste of time in all cases, so maybe the log() function applies multithreading on the M1 chip, but this would be strange. The Matlab functions of 2019 do not include special adjustments for the M1 chip, which was released later.
By the way, the function twoD_Gaussian can be improved:
function psf = twoD_Gaussian2(params, array_size)
x = 1:array_size;
k = 2 * params(3)^2;
psf = params(4) / (k * pi) * ...
      exp(-(x - params(2)).^2 / k) .* exp(-(x.' - params(1)).^2 / k) + ...
      params(5);
end
Using meshgrid to create an NxN matrix requires calculating N*N expensive exp() values. Using the vectors instead and the relation exp(a+b) = exp(a)*exp(b) is cheaper: the row vector times the column vector expands to the full matrix with only 2*N calls to exp(). For a 9x9 matrix this is 4 times faster, and the benefit grows quadratically with the array size.
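A quick sanity check (with made-up parameter values) that the separable version matches the original meshgrid implementation:

```matlab
params = [4, 4, 2, 100, 1];   % [y0, x0, sigma, amplitude, offset] - made up
n = 7;

% Original meshgrid form
[x, y] = meshgrid(1:n, 1:n);
psf1 = params(4) / (2 * pi * params(3)^2) ...
       * exp(-((x - params(2)).^2 + (y - params(1)).^2) / (2 * params(3)^2)) ...
       + params(5);

% Separable form using exp(a+b) = exp(a)*exp(b)
k = 2 * params(3)^2;
xv = 1:n;
psf2 = params(4) / (k * pi) * ...
       exp(-(xv - params(2)).^2 / k) .* exp(-(xv.' - params(1)).^2 / k) + ...
       params(5);

max(abs(psf1(:) - psf2(:)))   % should be on the order of eps
```

The row-times-column product relies on implicit expansion, available since R2016b, so it works in the R2019a installs mentioned in the question.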
  2 comments
Jan on 30 Dec 2021
I was confused by the names "8GB MacBook Pro 2020" and "16GB MacBook Pro M1".