Ezyfit: Single vs. Two-Step Fitting Problem

2 ビュー (過去 30 日間)
Sim
Sim 2025 年 3 月 13 日
編集済み: Sim 2025 年 3 月 17 日
Ezfit (actually I am using the updated version from Walter Roberson) fails to accurately determine all 4 parameters in a single curve fit (see Method A). A workaround, setting 1 parameter initially and then fitting the remaining 3 in a second step (Method B), looks instead successful.
Is there a method or setting within Ezfit to obtain accurate 4-parameter results in a single fitting run (Method A)?
% (1) Input Data
x = [6, 16, 26, 36, 46, 56, 66, 66, 76, 86, 96, 126, 176, 186, 246, 286, 306, 336, 366, 396, 426, 486, 496, 506, 546, 546, 616, 656, 726, 756, 806, 816, 836, 946, 966, 976, 1056, 1066, 1116, 1136, 1176, 1296, 1326, 1386, 1426, 1456, 1476, 1586, 1596, 1736, 1806, 1836, 1846, 1886, 2016, 2046, 2106, 2196, 2296, 2346, 2356, 2366, 2556, 2666, 2806, 2856, 2916, 2976, 3116, 3266, 3366, 3416, 3526, 3726, 3876, 3976, 4386, 4636, 4686, 5246, 5346, 6966];
y = [22.932, 13.063, 10.877, 8.7408, 7.4074, 5.8344, 12.693, 4.8185, 3.5444, 2.6637, 2.2515, 4.5284, 1.7597, 1.5207, 1.1754, 0.74499, 0.75637, 0.28157, 0.35862, 0.37542, 0.27759, 0.25008, 0.11389, 0.23569, 0.16777, 0.084717, 0.11865, 0.03168, 0.085227, 0.042803, 0.038632, 0.016028, 0.049869, 0.034689, 0.025157, 0.0064107, 0.017007, 0.0067062, 0.012242, 0.0065558, 0.014677, 0.0048147, 0.0048443, 0.0045972, 0.0059391, 0.0024968, 0.0050464, 0.0024622, 0.0053329, 0.0038574, 0.0035999, 0.0022632, 0.00147, 0.00098146, 0.0012814, 0.0024717, 0.00073639, 0.00073862, 0.0013448, 0.00088996, 0.00096546, 0.00084, 0.00062343, 0.0014789, 0.00050959, 0.00027922, 0.00055839, 0.0019602, 0.00098146, 0.00020941, 0.00028421, 0.00075468, 0.00031831, 0.00055524, 0.00055844, 0.00039789, 0.00031207, 0.00072048, 0.00021507, 0.0004718, 0.00018294, 0.00018086];
% (2) Method A: Perform a simple curve fitting with Ezfit
ft1 = ezfit(x,log(y),'log(a*x^(-b) + c*exp(-d*x))','MaxFunEvals', 1e6, 'MaxIter', 1e6); %, 'usega', usega);
array2table([ft1.m(1) ft1.m(2) ft1.m(3) ft1.m(4) (ft1.r)^2],'VariableNames',{'a','b','c','d','R2'})
% a b c d R2
% _____ ______ ______ _____ _______
%
% 56106 2.2236 -85026 49582 0.85991
% (3) Method B: Perform curve fitting by using Ezfit 2 times
% (3.1) With the first Ezfit, find the exponent "b" (among a range of values between 1.5 and 2.5)
% which maximise the coefficient of determination R^2 (I set it as R^2 = 0.99).
% The resulting exponent will be "b_range(idx_max_b) = 1.86".
i = 1;
b_range = 1.5 : 0.01 : 2.5;
for b = b_range
ft2 = ezfit(x,log(y),sprintf('log(a*x^(-%.2f) + c*exp(-d*x))',b),'options',optimset('MaxFunEvals', 1e6, 'MaxIter', 1e6));
ftr(i) = (ft2.r)^2;
i = i + 1;
end
idx_max_b = find(ftr == max(ftr(ftr < 0.99)));
% (3.2) Following the initial determination of the optimal exponent "b" (i.e. "b_range(idx_max_b)"),
% the second Ezfit step finds the remaining parameters
ft3 = ezfit(x,log(y),sprintf('log(a*x^(-%.2f) + c*exp(-d*x))',b_range(idx_max_b)),'options',optimset('MaxFunEvals', 1e6, 'MaxIter', 1e6));
array2table([ft3.m(1) b_range(idx_max_b) ft3.m(2) ft3.m(3) (ft3.r)^2],'VariableNames',{'a','b','c','d','R2'})
% a b c d R2
% ______ ____ _____ _________ _______
%
% 2084.4 1.86 4.806 0.0064196 0.98323
% (4) Plot
hold on
plot(x, y, 'o', 'MarkerFaceColor',[0.7 0.7 0.7],'DisplayName', 'Input Data');
xx = 10 : 10000;
plot(xx, ft1.m(1) .* xx.^(-ft1.m(2)) + ft1.m(3) .* exp(-ft1.m(4) .* xx), 'color','blue', 'Linewidth',2, 'DisplayName', 'Method A');
plot(xx, ft3.m(1) .* xx.^(-b_range(idx_max_b)) + ft3.m(2) .* exp(-ft3.m(3) .* xx), 'color', 'red', 'Linewidth',2, 'DisplayName', 'Method B')
xlabel('x');
ylabel('y');
legend('show');
set(gca, 'XScale', 'log', 'YScale', 'log')
  7 件のコメント
Torsten
Torsten 2025 年 3 月 14 日
編集済み: Torsten 2025 年 3 月 14 日
As said: you must decide whether you want to fit log(y) against log(x) using the function or y against x. Both will give different parameters for the fitting process. Usually, fitting y against x is intended.
Say you have the function y = a*x^b as fitting function. Then taking the log on both sides gives log(y) = log(a) + b*log(x), thus a function where log(y) depends linearily on log(x). The latter is numerically simpler, but the results for a and b will differ from the nonlinear fit of y against x.
Sim
Sim 2025 年 3 月 14 日
Thanks @Torsten for your clarifications, thanks :-)

サインインしてコメントする。

採用された回答

Walter Roberson
Walter Roberson 2025 年 3 月 13 日
Is there a method or setting within Ezfit to obtain accurate 4-parameter results in a single fitting run (Method A)?
No.
  8 件のコメント
Walter Roberson
Walter Roberson 2025 年 3 月 14 日
@Alex Sha used a third-party optimizer named 1stOpt. It is a somewhat expensive tool (about $US2000 per user), but from what I have seen, it produces excellent results much faster than competitors. I would consider buying it myself, but I cannot currently justify the price.
Sim
Sim 2025 年 3 月 17 日
Thanks @Walter Roberson for sharing! :-) I have checked it on the web, and it looks very interesting and promising (even though the academics might find the cost prohibitive if it is the same for them).

サインインしてコメントする。

その他の回答 (2 件)

Sam Chak
Sam Chak 2025 年 3 月 14 日
Hi @Sim
You can verify this. The model proposed by @Torsten fits quite well within the first 10% of the data range. However, after that point, the model decays much more slowly than the data, although, in theory, it will eventually converge to zero. Inspired by this model, it suggests that you should consider implementing a nonlinear power rate, as illustrated below. You can also find a better a nonlinear function, preferably a bounded one.
format long g
x = [6, 16, 26, 36, 46, 56, 66, 66, 76, 86, 96, 126, 176, 186, 246, 286, 306, 336, 366, 396, 426, 486, 496, 506, 546, 546, 616, 656, 726, 756, 806, 816, 836, 946, 966, 976, 1056, 1066, 1116, 1136, 1176, 1296, 1326, 1386, 1426, 1456, 1476, 1586, 1596, 1736, 1806, 1836, 1846, 1886, 2016, 2046, 2106, 2196, 2296, 2346, 2356, 2366, 2556, 2666, 2806, 2856, 2916, 2976, 3116, 3266, 3366, 3416, 3526, 3726, 3876, 3976, 4386, 4636, 4686, 5246, 5346, 6966];
y = [22.932, 13.063, 10.877, 8.7408, 7.4074, 5.8344, 12.693, 4.8185, 3.5444, 2.6637, 2.2515, 4.5284, 1.7597, 1.5207, 1.1754, 0.74499, 0.75637, 0.28157, 0.35862, 0.37542, 0.27759, 0.25008, 0.11389, 0.23569, 0.16777, 0.084717, 0.11865, 0.03168, 0.085227, 0.042803, 0.038632, 0.016028, 0.049869, 0.034689, 0.025157, 0.0064107, 0.017007, 0.0067062, 0.012242, 0.0065558, 0.014677, 0.0048147, 0.0048443, 0.0045972, 0.0059391, 0.0024968, 0.0050464, 0.0024622, 0.0053329, 0.0038574, 0.0035999, 0.0022632, 0.00147, 0.00098146, 0.0012814, 0.0024717, 0.00073639, 0.00073862, 0.0013448, 0.00088996, 0.00096546, 0.00084, 0.00062343, 0.0014789, 0.00050959, 0.00027922, 0.00055839, 0.0019602, 0.00098146, 0.00020941, 0.00028421, 0.00075468, 0.00031831, 0.00055524, 0.00055844, 0.00039789, 0.00031207, 0.00072048, 0.00021507, 0.0004718, 0.00018294, 0.00018086];
funlist = {@(coef, xdata) xdata.^(-coef(1)*(xdata).^(0.4)), @(coef, xdata) exp(-coef(2)*xdata)};
NLPstart = [1, 1];
options = optimset('disp', 'none');
[INLP, ILP] = fminspleas(funlist, NLPstart, x, y, [], [], [], options)
Warning: Rank deficient, rank = 1, tol = 1.957112e+05.
Warning: Rank deficient, rank = 1, tol = 6.998085e-03.
Warning: Rank deficient, rank = 1, tol = 6.998085e-03.
INLP = 1×2
0.0594819972748915 3.50973243531023
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
ILP = 2×1
1.0e+00 * 22.0279801300898 7302263919.12418
<mw-icon class=""></mw-icon>
<mw-icon class=""></mw-icon>
xf = linspace(x(1), x(end), numel(x)*1e3+1);
yf = ILP(1)*xf.^(-INLP(1)*(xf).^(0.4)) + ILP(2)*exp(-INLP(2)*xf);
%% Plot results
figure
plot(x, y, 'o'), hold on
plot(xf, yf), grid on, hold off
legend('data', 'model')
xlabel('x'), ylabel('y')
set(gca, 'XScale', 'log','YScale', 'log')
figure
subplot(211)
plot(x, y, 'o'), hold on
plot(xf, yf), xlim([0 700]), grid on, hold off
legend('data', 'model')
xlabel('x'), ylabel('y')
title({'Displaying model fitness in interval $x \in [0, 700]$'}, 'interpreter', 'latex')
subplot(212)
plot(x, y, 'o'), hold on
plot(xf, yf), xlim([700 7000]), grid on, hold off
legend('data', 'model')
xlabel('x'), ylabel('y')
title({'Displaying model fitness in interval $x \in [700, 7000]$'}, 'interpreter', 'latex')
  3 件のコメント
Sim
Sim 2025 年 3 月 14 日
編集済み: Sim 2025 年 3 月 14 日
Hi @Sam Chak, thanks for your nice answer :-)
I do not know how to get R-squared with "fminspleas", but it looks a nice fit!
It looks like you used a "power tower" or "tetration", plus the exponential decay... this makes things a bit more complicated :-) I was thinking that, by using the "power tower", probably, the exponential decay is not necessary anymore (?):
x = [6, 16, 26, 36, 46, 56, 66, 66, 76, 86, 96, 126, 176, 186, 246, 286, 306, 336, 366, 396, 426, 486, 496, 506, 546, 546, 616, 656, 726, 756, 806, 816, 836, 946, 966, 976, 1056, 1066, 1116, 1136, 1176, 1296, 1326, 1386, 1426, 1456, 1476, 1586, 1596, 1736, 1806, 1836, 1846, 1886, 2016, 2046, 2106, 2196, 2296, 2346, 2356, 2366, 2556, 2666, 2806, 2856, 2916, 2976, 3116, 3266, 3366, 3416, 3526, 3726, 3876, 3976, 4386, 4636, 4686, 5246, 5346, 6966];
y = [22.932, 13.063, 10.877, 8.7408, 7.4074, 5.8344, 12.693, 4.8185, 3.5444, 2.6637, 2.2515, 4.5284, 1.7597, 1.5207, 1.1754, 0.74499, 0.75637, 0.28157, 0.35862, 0.37542, 0.27759, 0.25008, 0.11389, 0.23569, 0.16777, 0.084717, 0.11865, 0.03168, 0.085227, 0.042803, 0.038632, 0.016028, 0.049869, 0.034689, 0.025157, 0.0064107, 0.017007, 0.0067062, 0.012242, 0.0065558, 0.014677, 0.0048147, 0.0048443, 0.0045972, 0.0059391, 0.0024968, 0.0050464, 0.0024622, 0.0053329, 0.0038574, 0.0035999, 0.0022632, 0.00147, 0.00098146, 0.0012814, 0.0024717, 0.00073639, 0.00073862, 0.0013448, 0.00088996, 0.00096546, 0.00084, 0.00062343, 0.0014789, 0.00050959, 0.00027922, 0.00055839, 0.0019602, 0.00098146, 0.00020941, 0.00028421, 0.00075468, 0.00031831, 0.00055524, 0.00055844, 0.00039789, 0.00031207, 0.00072048, 0.00021507, 0.0004718, 0.00018294, 0.00018086];
funlist = {@(coef, xdata) xdata.^(-coef(1)*(xdata).^(coef(2)))};
NLPstart = [1, 1];
options = optimset('disp', 'none');
[INLP, ILP] = fminspleas(funlist, NLPstart, x, y, [], [], [], options)
xf = linspace(x(1), x(end), numel(x)*1e3+1);
yf = ILP(1)*xf.^(-INLP(1)*(xf).^(INLP(2)));
hold on
plot(x, y, 'o')
plot(xf, yf)
set(gca, 'XScale', 'log','YScale', 'log')
However, the idea to use a "simple" power law and an exponential decay, would be supported by some theoretical argument... by using, instead, a power tower, the theoretical argument would be more complicated.... :-)
Sim
Sim 2025 年 3 月 14 日
編集済み: Sim 2025 年 3 月 14 日
Thanks @Sam Chak! I have just read your comment, just a second after I posted mine...! Yes, it could also be an exponential function-based model. I will look into it further. However, the sum of a simple power law and a simple exponential decay can be justified through a theoretical argument (in particular the power law term). If we use nonlinear power rates—whether power law-based or exponential-based one—it could make the results much harder to interpret.... :-)
...I can keep an eye on this webpage to see if @Alex Sha might be interested in adding more on this topic. By the way, there are some great fitting tools I wasn’t aware of, like "fminspleas", which looks quite impressive! :-)

サインインしてコメントする。


Alex Sha
Alex Sha 2025 年 3 月 15 日
移動済み: Sam Chak 2025 年 3 月 15 日
Hi, all, if add one more parameter to the original fitting function, i.e. y=ln(a*x^(-b)-c*exp(-d*x))*f, the result will be improved like:
Sum Squared Error (SSE): 62.0296923820521
Root of Mean Square Error (RMSE): 0.869746896054109
Correlation Coef. (R): 0.970745675455549
R-Square: 0.94234716641565
Parameter Best Estimate
--------- -------------
a 0.97506498591964
b -0.0034886080404169
c 1.00586174849318
d 0.0069594641193838
f -5.56698125787981
  4 件のコメント
Alex Sha
Alex Sha 2025 年 3 月 17 日
Not sure whether or not my understanding is correct, if want fitting result good in log() scale, is it reasonab to change y data to y1=log(y) firstly, then the fitting function become "y1=log(a*x^(-b)-c*exp(-d*x))", the result will be:
Sum Squared Error (SSE): 3.16256795387294
Root of Mean Square Error (RMSE): 0.196387122481336
Correlation Coef. (R): 0.991266174106857
R-Square: 0.982608627928446
Parameter Best Estimate
--------- -------------
a 10.6590120083535
b 0.69245173591868
c -1.65676470503943
d 0.00313457062827038
if taking function as "y1=f*log(a*x^(-b)-c*exp(-d*x))"
Sum Squared Error (SSE): 2.95997975085014
Root of Mean Square Error (RMSE): 0.189992931538933
Correlation Coef. (R): 0.99182732299325
R-Square: 0.983721438635958
Parameter Best Estimate
--------- -------------
f 4.78460068319797
a 0.918658120426212
b 0.0761834706279075
c -0.540969703020373
d 0.00132426521534454
Sim
Sim 2025 年 3 月 17 日
編集済み: Sim 2025 年 3 月 17 日
Thanks a lot @Alex Sha! Now it is clear, thanks! :-)
Therefore, we can get an R-square equal to 0.982608627928446, by using
y1 = log(a*x^(-b) - c*exp(-d*x))
and we can get an R-square equal to 0.983721438635958, by using
y1 = f * log(a*x^(-b) - c*exp(-d*x))
By multiplying by a factor "f", the fit improves further, but even without it, the fit is already quite good. :-)
Thanks a lot!

サインインしてコメントする。

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by