Regression with several dummy variables

Maria on 17 Aug 2014

I have a cell type variable with 20000 rows and 700 columns. I present here an example of the first 9 columns:
C1 C2 C3 C4 C5 C6 C7 C8 C9
A={ 0 0 0 13 16 11 17 26 12 %row 1 is irrelevant
12 0 0 1 0 0 0 0 0
13 0 0 0 1 0 0 0 0
16 0 0 0 0 1 0 0 0
18 0 0 0 0 0 1 0 0
26 0 0 1 0 0 0 0 0
41 0 0 0 0 0 0 1 0}
I am trying to perform a regression.
C1 is simply an ID code; C2 is my binary dependent variable y. C3 is a dummy variable x (its elements, 0 or 1, are numeric), whose coefficient β (and, if possible, its standard deviation) I want to interpret. From C4 onwards I have dummy variables (here the elements, 0 or 1, are logicals) that I also want to include in my regression to control for certain effects.
I should most likely use the fitlm or regress functions, but I have not been successful. Can someone help me? Thank you very much.
Comments (1 older comment not shown)
Maria on 17 Aug 2014
The large numbers did come from fewer variables but of different levels.


Answers (1)

dpb on 17 Aug 2014

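Given the response to the previous question, it should be just
y = A{1}(2:end,2);                  % response variable (column 2, skipping row 1)
x = A{1}(2:end,3:end);              % predictor variables (column 3 onwards)
x = [ones(size(x,1),1) x];          % prepend the constant term
[b,bint,~,~,stats] = regress(y,x);  % b: coefficients, bint: their confidence intervals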
As said, everything will depend upon what X'*X, formed from the actual design matrix X, looks like. (MATLAB does not actually compute it, but its characteristics are what determine the covariances, estimability, etc., all of which in turn depend upon the chosen codings being independent.)
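For a cell array whose cells mix numeric and logical values, as described in the question, the extraction step may need an explicit conversion before regress will accept the data. The following is only a sketch under stated assumptions: the toy cell array A and its values are made up, each cell is assumed to hold a scalar, regress is assumed to be available (Statistics and Machine Learning Toolbox), and the standard errors use the usual ordinary least squares formula rather than anything taken from this thread.

```matlab
% Toy cell array in the layout described in the question (all values made up).
% Row 1 holds level codes and is skipped. Column 1 is the ID, column 2 is y,
% column 3 is the numeric dummy x, columns 4-5 are logical control dummies.
A = { 0 0 0 13    16
     12 0 0 true  false
     13 1 1 false true
     16 0 1 true  false
     18 0 0 false false
     26 1 1 false true
     41 1 0 true  false};

% Convert every cell (numeric or logical) to double in one numeric matrix.
data = cellfun(@double, A(2:end,2:end));

y = data(:,1);                               % response variable
X = [ones(size(data,1),1) data(:,2:end)];    % constant term plus predictors

% Fit; stats(4) is the estimate of the error variance.
[b,bint,~,~,stats] = regress(y,X);

% Standard errors of the coefficients (standard OLS formula, valid only
% when X has full column rank).
se = sqrt(stats(4) * diag(inv(X'*X)));

beta_x = b(2);    % coefficient of the dummy x from column 3
se_x   = se(2);   % its standard error
```

If fitlm is preferred, note that it adds the constant term itself, so the column of ones would be left out of the predictor matrix in that case.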
Comments (5 older comments not shown)
dpb on 18 Aug 2014
Yep...as I suspected would be the case given the number of dummy variables, at least one column is the same as another. It'll be very difficult to find an encoding that won't lead to the problem, I'd guess.
You can always try
rank(x)
to get an estimate of how many problems you have: the number of columns, size(x,2), minus the rank is the number of linearly dependent columns...
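The rank check can be sketched on a small made-up matrix. Everything here is illustrative: the matrix X and the variable names are hypothetical, and using pivoted QR to pick an independent subset of columns is one possible approach, not something proposed in the thread.

```matlab
% Toy design matrix (made-up values): column 4 duplicates column 3 and
% column 5 is all zeros, so two columns carry no extra information.
X = [1 0 1 1 0
     1 1 0 0 0
     1 1 1 1 0
     1 0 0 0 0
     1 1 0 0 0];

p = size(X,2);
r = rank(X);
nRedundant = p - r;              % number of linearly dependent columns

% Exact duplicates: unique columns are the unique rows of X'.
[~,ia]  = unique(X','rows','stable');
dupCols = setdiff(1:p, ia');     % columns that repeat an earlier column

% QR with column pivoting: the first r pivot columns are independent.
[~,~,perm] = qr(X,0);            % economy size, perm is a permutation vector
keep       = sort(perm(1:r));
Xreduced   = X(:,keep);          % full-column-rank subset of X
```

Dropping columns this way makes the fit estimable, but the coefficients of the remaining dummies then absorb the effects of the dropped ones, so their interpretation changes accordingly.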
I repeat the final synopsis from my initial answer --
...everything will depend upon what X'*X, formed from the actual design matrix X, looks like. (MATLAB does not actually compute it, but its characteristics are what determine the covariances, estimability, etc., all of which in turn depend upon the chosen codings being independent.)
It's that last phrase about being independent that's the rub.
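Why independence matters can be spelled out with a short derivation. This is a standard linear algebra argument added for illustration; the symbols $e_j$, $e_k$, $v$ and $t$ are introduced here and do not appear in the thread.

```latex
% Suppose columns j and k of the design matrix X are identical.
% Let v = e_j - e_k, where e_j is the j-th unit vector.
\begin{align*}
  X v &= x_j - x_k = 0
      && \text{(the two columns cancel)} \\
  X^{\top} X \, v &= X^{\top} (X v) = 0
      && \text{(so } X^{\top} X \text{ is singular and has no inverse)} \\
  X (\beta + t\,v) &= X \beta \quad \text{for every } t \in \mathbb{R}
      && \text{(the fit cannot tell } \beta \text{ from } \beta + t\,v\text{)}
\end{align*}
% Consequence: only the sum \beta_j + \beta_k is determined by the data;
% the individual coefficients \beta_j and \beta_k are not estimable.
```

The same argument applies to any exact linear dependence among the columns, for example a full set of dummies for one factor summing to the constant column.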
