How do I fit a regression equation to find coefficients and exponents?
I'm currently working with a wide dataset, using a machine learning technique (SIMPLS partial least squares) to select key identifiers. I then want to use those identifiers and my outcome to create a predictive equation. I've tried several linear regression tools, but they only find the predictors' coefficients, whereas I am trying to find both the coefficients and the exponents. To get around this I'm trying to use 'nlinfit' to force the final equation into the desired form. This is where I'm having an issue: when I run the code I get the following error:
"The function you provided as the MODELFUN input has returned Inf or NaN values."
I've also tried inputting the model in the following form:
modelfun = 'y~(b1 + b2*x1.^b3 + b4*x2.^b5 + b6*x3.^b7)';
For reference, my current data set is a 13x16 matrix; at its largest it will be a 24x35 matrix, where the last column represents the outcome. Once the variable selection is complete (it works without issue), the matrix is reduced to an nx4 matrix.
Here is my code:
clear
clc
% Imports data and removes the first text column
data = readtable('PMHS PLS Practice.xlsx',"textType","string");
data.SpecimenID = [];
% Splits data into independent and dependent
% variables and normalizes values
X = data(:,1:15);
X = X{:,:};
Xnorm = normalize(X);
Y = data(:,16);
Y = table2array(Y);
Ynorm = normalize(Y);
% Performs SIMPLS PLS on normalized data, returning
% the X scores and percent variance per variable
% for the first 5 latent variables
Cpart = cvpartition(13,"LeaveOut");
[~,~,SCR,~,~,PCTVAR,~,~] = plsregress(Xnorm,Ynorm,5,'cv',Cpart);
X_VAR = PCTVAR(1,:);
Y_VAR = PCTVAR(2,:);
% pareto(X_VAR)
% finds the dependent variable with the largest
% contribution to the first 3 latent variables
[Var1,id1] = max(abs(SCR(:,1)));
[Var2,id2] = max(abs(SCR(:,2)));
[Var3,id3] = max(abs(SCR(:,3)));
% creates a matrix containing the selected variables
X_reg = [data{:,id1} data{:,id2} data{:,id3}];
% Fits the data with a non-linear model with
% initial coefficient guesses of beta0
modelfun = @(b,x) (b(1)+b(2)*x(:,1).^b(3)+b(4)*x(:,2).^b(5)+b(6)*x(:,3).^b(7));
beta0 = ones(1,7);
[coeff] = nlinfit(X_reg,Y,modelfun,beta0);
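For readers hitting the same message: a model of this form can return Inf even when the starting guess looks harmless, because the exponents change during the fit. The sketch below uses a made-up 3x3 matrix (Xdemo is hypothetical, not the data from this question) to show two ways that can happen: a negative exponent applied to a zero entry, and a large exponent applied to large values. It is only an illustration of possible causes, not a diagnosis of the data above.

```matlab
% Same model form as in the question.
modelfun = @(b,x) b(1) + b(2)*x(:,1).^b(3) + b(4)*x(:,2).^b(5) + b(6)*x(:,3).^b(7);

% Hypothetical stand-in data (NOT the asker's data): column 1 contains
% a zero and column 3 contains large values.
Xdemo = [0   2  1e3;
         1   3  2e3;
         2   4  3e3];

% At the all-ones starting guess every term is finite.
y0 = modelfun(ones(1,7), Xdemo);
allFiniteAtStart = all(isfinite(y0));

% A negative exponent on a column containing zero gives 0^(-1) = Inf.
bNeg = [1 1 -1 1 1 1 1];
yNeg = modelfun(bNeg, Xdemo);
hasInf = any(isinf(yNeg));

% A large exponent on large values overflows double precision.
bBig = [1 1 1 1 1 1 200];
yBig = modelfun(bBig, Xdemo);
hasOverflow = any(isinf(yBig));
```

Running the same isfinite check on modelfun(beta0, X_reg), and on the predictors themselves, is a cheap first step before calling the fitting routine.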
19 Comments
Matt J
17 Nov 2022
"I can't provide my raw data because it has identifiable donor health information."
But why two .mat files instead of one?
Accepted Answer
Matt J
16 Nov 2022
Edited: Matt J
16 Nov 2022
fminspleas from the File Exchange fared better, but this looks like a highly ill-conditioned data set:
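% Load the predictor matrix and the outcome vector
load('X_reg.mat')
load('Y.mat')
% Basis functions: a constant term plus one power term per predictor
flist={1, @(b,x)x(:,1).^b(1), @(b,x)x(:,2).^b(2) , @(b,x)x(:,3).^b(3)};
% Returns the fitted exponents (nonlinear) and coefficients (linear)
[exps,coef]=fminspleas(flist,ones(1,3),X_reg,Y);
% Unpack into the parameter names used in the question
[b1,b2,b4,b6]=dealThem(coef)
[b3,b5,b7]=dealThem(exps)
function varargout=dealThem(z)
varargout=num2cell(z);
end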
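For anyone wondering why a partitioned solver can fare better here: the model is linear in the coefficients once the exponents are fixed, so only the exponents need an iterative search. The sketch below illustrates that idea with base-MATLAB fminsearch and backslash on synthetic, noise-free data (the values 2, 3 and 1.5 are made up for the example). It is a simplified one-predictor illustration of the approach, not the fminspleas source.

```matlab
% Synthetic data from a known model y = 2 + 3*x.^1.5 (hypothetical values).
x = linspace(1, 5, 20)';
y = 2 + 3*x.^1.5;

% Design matrix for a given exponent p: a constant column and x.^p.
design = @(p) [ones(size(x)), x.^p];

% For fixed p the best coefficients come from a linear least-squares
% solve, so the objective depends on p alone.
sse = @(p) norm(y - design(p)*(design(p)\y))^2;

% One-dimensional search over the exponent, starting from p = 1.
pHat = fminsearch(sse, 1);

% Recover the linear coefficients at the fitted exponent.
cHat = design(pHat)\y;
```

With three predictors the search runs over three exponents instead of all seven parameters, which is the main practical difference from passing the full model to a general nonlinear fitter.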
3 Comments
Matt J
18 Nov 2022
Weirdly enough, I tried a few different things on fminspleas (weighted and unweighted) with the same data that I provided and I always got a different answer than you.
Matt J
20 Nov 2022
Edited: Matt J
20 Nov 2022
If you pre-normalize the columns of X_reg, fminspleas gives the same results as Alex, and my conditioning test shows a much better condition number for the coefficient solution:
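load('X_reg.mat')
load('Y.mat')
flist={1, @(b,x)x(:,1).^b(1), @(b,x)x(:,2).^b(2) , @(b,x)x(:,3).^b(3)};
s=max(X_reg);   % column-wise scale factors
n=size(Y,1);
% Fit against the scaled predictors
[exponents,coefficients]=fminspleas(flist,ones(1,3),X_reg./s,Y);
% Undo the scaling so the coefficients apply to the original X_reg
coefficients(2:end)=coefficients(2:end)./(s.^exponents)';
format longG
exponents,coefficients
% Condition number of the design matrix at the fitted exponents
A=(X_reg./s).^exponents; A=[ones(n,1),A];
cond(A)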
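The rescaling step relies on the identity c*(x/s)^p = (c/s^p)*x^p, so a coefficient fitted against the scaled column can be mapped back by dividing by s^p. Here is a small self-contained check with made-up numbers (x, p and cScaled are hypothetical, not values from the thread), which also compares the condition number of a two-column design matrix with and without scaling.

```matlab
% Hypothetical values: one predictor column, an exponent, and a
% coefficient that was fitted against the scaled column x./s.
x = [10; 200; 3000];
s = max(x);
p = 2.5;
cScaled = 4;

% c*(x/s)^p equals (c/s^p)*x^p, so dividing by s^p maps the
% coefficient back to the unscaled predictor.
cOriginal = cScaled / s^p;

termScaled   = cScaled   * (x./s).^p;
termOriginal = cOriginal * x.^p;
relErr = max(abs(termScaled - termOriginal) ./ abs(termScaled));

% Conditioning of the design matrix with and without scaling.
condRaw    = cond([ones(3,1), x.^p]);
condScaled = cond([ones(3,1), (x./s).^p]);
```

In this toy case the two terms agree to rounding error and the scaled design matrix has the smaller condition number, which is the same effect the cond(A) check above is measuring.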
More Answers (1)
Alex Sha
17 Nov 2022
Although the results may seem strange, mathematically speaking, the result below is the best one:
Sum Squared Error (SSE): 875229.002284955
Root of Mean Square Error (RMSE): 259.47120816783
Correlation Coef. (R): 0.945217846153423
R-Square: 0.893436776686916
Parameter  Best Estimate          Std. Deviation          Confidence Bounds [95%]
---------  ---------------------  ----------------------  ------------------------------------------
b1         12988.1117515585       6.29998367140824        [12972.6962468509, 13003.5272562661]
b2         1.533510681096E126     0.070612379218076       [1.533510681096E126, 1.533510681096E126]
b3         109.691046465045       0.167406667907439       [109.281417105381, 110.100675824708]
b4         -97.0000950427331      1048.74886499901        [-2663.19612168365, 2469.19593159819]
b5         -3.07105087006786      14192.7795542942        [-34731.5515429606, 34725.4094412205]
b6         -9.2988265393269E18    1.40462270054153E-15    [-9.2988265393269E18, -9.2988265393269E18]
b7         12.1425640359073       3.30305643552503E-124   [12.1425640359073, 12.1425640359073]
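For comparing candidate solutions, the summary statistics above can be recomputed from any vector of fitted values. The sketch below shows one common set of definitions on made-up numbers (y and yhat are hypothetical). The reported RMSE appears consistent with sqrt(SSE/n) for n = 13, the number of rows mentioned in the question, and the reported R-Square matches the square of the reported R, but the exact definitions used by the tool that produced the table are an assumption here.

```matlab
% Hypothetical observed and fitted values (not the thread's data).
y    = [10; 12; 15; 19; 24];
yhat = [11; 12; 14; 20; 23];

n    = numel(y);
res  = y - yhat;
SSE  = sum(res.^2);            % sum of squared errors
RMSE = sqrt(SSE/n);            % root of the mean squared error
C    = corrcoef(y, yhat);
R    = C(1,2);                 % correlation between observed and fitted
Rsq  = R^2;

% Consistency check on the reported figures, taking n = 13 rows
% from the question's 13x16 data set.
rmseReported = sqrt(875229.002284955/13);
```

Evaluating SSE this way for each parameter set makes it straightforward to see whether two different-looking solutions fit the data equally well.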
6 Comments
Matt J
20 Nov 2022
"If there are various solutions as you said, what are the objective function values (SSE)?"
Probably very similar to what you got. My solution may be local and your solution may be global, but that does not mean the global solution is unique.
Alex Sha
20 Nov 2022
If possible, would you be kind enough to show me one more global solution other than mine, so I can make some comparisons and find out why?