Probit: removing groups that perfectly predict failures

Tian, 29 July 2021
Answered: Kumar Pallav, 4 August 2021
Hi all,
I have group-year panel data, attached. Apologies, the data is of very low quality.
There are 3 groups, each with 20 observations. The outcome y is a dummy variable for success. The first column of x is a continuous variable, "effort". The second column is a dummy indicating group A, and the third column is a dummy for group B. There is no dummy for group C, to avoid collinearity.
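For readers without the attachment, the layout described above can be sketched as follows. This is only an illustration: the group labels and the effort values are placeholders, not the attached data.

```matlab
% Hypothetical group labels: 3 groups with 20 observations each.
group  = [repmat("A", 20, 1); repmat("B", 20, 1); repmat("C", 20, 1)];
effort = linspace(0, 1, 60)';    % placeholder values, not the real "effort"

% Column 1: effort, column 2: group A dummy, column 3: group B dummy.
% Group C is the omitted baseline, so it gets no column.
x = [effort, double(group == "A"), double(group == "B")];
```

glmfit adds the constant term itself, so x does not need a column of ones.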
I want to predict the probability of success using a probit model. The code I tried is:
b = glmfit(x,y,'binomial','Link','probit');
b =
0.1857 (constant)
-1.8149 (effort)
-16.1148 (group A)
-16.2994 (group B)
As you can see in the data, all outcomes for group A are failures, so the second column of x predicts y == 0 perfectly. MATLAB also raises a warning:
Warning: The estimated coefficients perfectly separate failures from successes. This means the theoretical best estimates are not finite.
For the fitted linear combination XB of the predictors, the sample proportions P of Y=N in the data satisfy:
XB<-0.834093: P=0
XB=-0.834093: P=1
XB>-0.834093: P=0
However, it still returns an estimated coefficient for the group A dummy: b(3) = -16.1148.
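One way to see which dummy columns are behind a warning like this is to check, for each group dummy, whether every observation in that group is a failure. The sketch below uses small made-up data rather than the attached file, and the names dummyCols and perfectFailure are purely illustrative.

```matlab
% Hypothetical toy data: 3 groups of 4 rows each (A, B, C in that order).
effort = [0.2; 0.5; 0.1; 0.9;  0.3; 0.7; 0.4; 0.6;  0.8; 0.2; 0.5; 0.1];
groupA = [ones(4,1); zeros(8,1)];
groupB = [zeros(4,1); ones(4,1); zeros(4,1)];
x = [effort, groupA, groupB];
y = [0; 0; 0; 0;  1; 0; 1; 0;  0; 1; 1; 0];   % group A has no successes

dummyCols = [2 3];                            % columns of x holding group dummies
perfectFailure = false(size(dummyCols));
for k = 1:numel(dummyCols)
    inGroup = x(:, dummyCols(k)) == 1;        % rows belonging to this group
    % True when the group is non-empty and contains no success at all.
    perfectFailure(k) = any(inGroup) && ~any(y(inGroup));
end
disp(perfectFailure)
```

For a flagged column, the likelihood keeps improving as the coefficient moves toward minus infinity, so a large negative value such as -16 reflects where the iterations stopped rather than a finite estimate.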
Question:
Since x(:,2) perfectly predicts failure, b(3) should be 0. Is there an option in glmfit to drop the group A observations inside the call and return 0 as the coefficient for this column, so that I get something like:
b =
0.1857 (constant)
-1.8149 (effort)
0 (group A)
xxx (group B)
Stata does this automatically using the command:
probit y effort i.group
It turns out the Stata estimates for the constant and effort are the same as MATLAB's. So the perfect-failure issue only affects the group dummy coefficients...
Thank you!!!

Accepted Answer

Kumar Pallav, 4 August 2021
From my understanding, you expect b(3) = 0 in the coefficient vector b because, as you mentioned, the second column of x (the group A dummy) is all failures (that is, all 0). However, after inspecting the data, I see that the second column of x is not all zeros.
% Check whether the vector contains any nonzero value
containsNonZero = any(x(:,2)) % returns 1 (true) if it does
However, if you set the second column of x to zero,
% Set every value in the second column of x to zero
x(:,2)=0;
b = glmfit(x,y,'binomial','Link','probit')
Then, the b(3) value becomes 0.
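Note that zeroing the column keeps the group A rows in the estimation sample, so they are effectively pooled with the baseline group. If the goal is instead to exclude those observations, as the question describes Stata doing, one possible sketch is to drop the group A rows and the group A column before calling glmfit, then put a 0 back into the full coefficient vector. As far as I know glmfit has no option for this, so the bookkeeping is done by hand. The toy data and the names keepRows, keepCols and bReduced are assumptions for illustration, not the attached data.

```matlab
% Hypothetical toy data: 3 groups of 4 rows each (A, B, C in that order).
effort = [0.2; 0.5; 0.1; 0.9;  0.3; 0.7; 0.4; 0.6;  0.8; 0.2; 0.5; 0.1];
groupA = [ones(4,1); zeros(8,1)];
groupB = [zeros(4,1); ones(4,1); zeros(4,1)];
x = [effort, groupA, groupB];
y = [0; 0; 0; 0;  1; 0; 1; 0;  0; 1; 1; 0];   % group A has no successes

keepRows = x(:,2) == 0;     % observations that are not in group A
keepCols = [1 3];           % effort and the group B dummy

% Probit fit on the reduced sample (glmfit adds the constant itself).
bReduced = glmfit(x(keepRows, keepCols), y(keepRows), ...
                  'binomial', 'Link', 'probit');

% Map back to the full layout: [constant; effort; group A; group B].
b = zeros(size(x,2) + 1, 1);
b([1, keepCols + 1]) = bReduced;   % b(3), the group A slot, stays 0
disp(b)
```

The 0 in b(3) is only a placeholder for "not estimated". Predictions for group A rows from this b would treat them like the baseline group, so they should be handled separately.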

More Answers (0)
