Binary Logistic Regression - beginner

Hello,
Beginner question regarding logistic regression in MATLAB. I am trying to run a binary logistic regression in MATLAB but cannot find the function for it. I can find code for multinomial regression but not for binary. Could someone please point me in the right direction?
I have a table with more than 60 variables and 200,000 rows that I want to use in the regression.
Thank you for your time and help!

Accepted Answer

the cyclist, 27 May 2021

1 vote

If you have the Statistics and Machine Learning Toolbox, you can use the fitglm function to fit a binomial logistic regression. See the first example on that page.

8 Comments

Amina Ag, 28 May 2021
Edited: Amina Ag, 28 May 2021
Thank you for your reply! My dataset is not linear, so I will need to fit a non-linear logistic regression. According to the page, fitglm only fits a linear regression model? Is there something I have to change to make it fit a non-linear logistic model?
the cyclist, 28 May 2021
fitglm will fit many different types of models. That is why it is "generalized".
Did you look at the first example on the documentation page, as I suggested? That model is a binary logistic regression, exactly as you describe.
People often get confused by the terminology "linear". Linear refers to the fact that the fitting equation is linear in the coefficients. The fitted curve itself is not (necessarily) linear.
Here is a simplified version of that first example, showing a logistic regression for Weight vs. Smoker, and the fit. (I could also have included many more variables, and interaction terms, but I wanted something simple to calculate and plot.)
load hospital
dsa = hospital(:,[4 5]);
modelspec = 'Smoker ~ Weight';
mdl = fitglm(dsa,modelspec,'Distribution','binomial')
mdl =
Generalized linear regression model:
    logit(Smoker) ~ 1 + Weight
    Distribution = Binomial

Estimated Coefficients:
                   Estimate        SE         tStat      pValue
                   ________    _________    _______    ________
    (Intercept)     -3.3983       1.3198    -2.5749    0.010027
    Weight         0.017541    0.0082503     2.1261    0.033495

100 observations, 98 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 4.68, p-value = 0.0304
w = (50:250)';
figure
hold on
scatter(dsa.Weight,dsa.Smoker)
plot(w,predict(mdl,w))
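To make the "linear in the coefficients" point concrete, here is a minimal sketch that rebuilds the fitted probabilities by hand from the estimated coefficients and compares them with predict. It assumes the same hospital sample data and the Statistics and Machine Learning Toolbox; the names b, eta, pManual and maxDiff are just illustrative.

```matlab
% Sketch: reproduce predict() by hand from the fitted coefficients.
load hospital                                 % sample data shipped with the toolbox
dsa = hospital(:,[4 5]);                      % Weight and Smoker columns, as above
mdl = fitglm(dsa,'Smoker ~ Weight','Distribution','binomial');

b = mdl.Coefficients.Estimate;                % [intercept; slope] on the log-odds scale
w = (50:250)';
eta = b(1) + b(2)*w;                          % linear predictor: linear in the coefficients
pManual = 1 ./ (1 + exp(-eta));               % inverse logit maps log-odds to a probability

maxDiff = max(abs(pManual - predict(mdl,w)))  % should be essentially zero
```

The straight line lives on the log-odds scale (eta); the S-shaped curve in the plot is what that line looks like after the inverse logit transformation.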
Amina Ag, 28 May 2021
Thank you for your reply, it was a great help! I see that I will need to dig further into the model itself to understand the inputs and outputs.
the cyclist, 28 May 2021
Happy to help.
(In answer to your emailed question: No, sorry, I only do help via this forum, not privately.)
Amina Ag, 28 May 2021
I fully understand, and thank you again! I have tried to run the code, but it seems that it does not work with too many variables. Is that the case, or am I misreading the error message?
The code I am trying to run:
model = fitglm(Main5,'bankrupty ~ ROE+ROA+CR+L2+P3+P2+P1+Quick_ratio+TBTA+L1+Debt1+CA+CLTA+QATTA+CATS+I_S+OI_TA+NI_S+LTDTT+S1+TDE+DTE+CCL+WC_S+EQ+NI_D+QA_S+STI+WCE+Size','link','logit','Distribution','binomial');
I have about 30 variables on the right-hand side and get the following error message:
Error using glmfit (line 211)
The value of X must not be complex.
Error in GeneralizedLinearModel/fitter (line 659)
glmfit(model.design_r,response_r,model.DistributionName,'EstDisp',estDisp, ...
Error in classreg.regr.FitObject/doFit (line 94)
model = fitter(model);
Error in GeneralizedLinearModel.fit (line 973)
model = doFit(model);
Error in fitglm (line 146)
model = GeneralizedLinearModel.fit(X,varargin{:});
the cyclist, 28 May 2021
That error does not imply that there are too many variables. It's stating that one of your explanatory variables is complex-valued, not real-valued. You should inspect your input values. You could narrow it down by looking at just a few variables at a time ...
model = fitglm(Main5,'bankrupty ~ ROE+ROA+CR+L2+P3+P2+P1','link','logit','Distribution','binomial');
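As an alternative to refitting on subsets, the table can be checked directly for complex-valued columns. This is only a sketch: the table T below is a hypothetical stand-in for Main5, and Size and ROA are placeholder columns built for illustration.

```matlab
% Hypothetical stand-in for Main5; only the checking logic matters here.
assets = [120; 45; -3; 800];                 % made-up values, one of them negative
T = table(log(assets), [0.1; 0.2; 0.3; 0.4], ...
    'VariableNames', {'Size','ROA'});

% Flag every numeric variable that is stored as complex.
isComplexVar = varfun(@(x) isnumeric(x) && ~isreal(x), T, ...
    'OutputFormat', 'uniform');
badVars = T.Properties.VariableNames(isComplexVar)
```

On a real table, replacing T with Main5 in the varfun call should list the offending variables in one pass.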
Amina Ag, 29 May 2021
That worked perfectly and I found the guilty variable, but I am not quite sure why it is not working. The variable is the log of assets. If I want to include this variable, how can I transform it so that the model can read it?
I also have 5 categorical variables that I want to include. Is it possible to include these without converting them to dummy variables? I know Stata treats such a variable as categorical automatically, so no dummy is needed. Would it be possible to do the same in MATLAB?
the cyclist, 29 May 2021
If the variable is the log of assets, I'll wager that what has happened is that one of your asset values is unexpectedly negative. The log of a negative number is complex.
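A small sketch of what that looks like, and one possible way to handle it. The assets values below are made up, and treating non-positive assets as missing is an assumption, not a recommendation; whether that is appropriate depends on why those values are non-positive in your data.

```matlab
assets = [120; 45; -3; 0; 800];        % hypothetical raw asset values

z = log(-3)                            % complex result: 1.0986 + 3.1416i

bad = assets <= 0;                     % log is -Inf at 0 and complex below 0
badRows = find(bad)                    % rows to inspect in the raw data

% One option (an assumption): mark those rows as missing before taking the log.
logAssets = nan(size(assets));
logAssets(~bad) = log(assets(~bad));   % real-valued; bad rows stay NaN
isreal(logAssets)
```

As far as I know, fitglm treats NaN values as missing and leaves those rows out of the fit, so check how many observations you lose this way.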
Take a look at the Name-Value Pair Arguments section of the fitglm documentation. There is a CategoricalVars input that allows you to specify which of your explanatory variables is categorical. Alternatively, you could convert the variable to categorical before entering it into the model.
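Both routes might look like the sketch below. The data are simulated and the variable names x, sector and y are hypothetical; sector stands in for one of your categorical variables stored as numeric codes.

```matlab
rng(0)                                   % reproducible simulated data
n = 200;
x = randn(n,1);
sector = randi(3,n,1);                   % hypothetical category codes 1..3
y = rand(n,1) < 1 ./ (1 + exp(-0.5*x));  % simulated binary response
T = table(x, sector, y);

% Option 1: tell fitglm which predictors are categorical.
mdl1 = fitglm(T,'y ~ x + sector','Distribution','binomial', ...
    'CategoricalVars',{'sector'});

% Option 2: convert the variable to categorical before fitting.
T.sector = categorical(T.sector);
mdl2 = fitglm(T,'y ~ x + sector','Distribution','binomial');

mdl1.CoefficientNames                    % intercept, x, and one term per non-reference level
```

In both cases fitglm builds the dummy variables internally, using the first level as the reference, so there is no need to create them by hand.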


More Answers (0)


Asked: 27 May 2021

Commented: 29 May 2021
