Fixed Effects Design Matrix Must be of full column rank with multiple categorical predictors

Question

qfn on 28 Jan 2022


Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/1638475-fixed-effects-design-matrix-must-be-of-full-column-rank-with-multiple-categorical-predictors

Answered: Ive J on 29 Jan 2022

I am probably doing something very dumb, but I cannot figure out my mistake.

I am trying to regress out some predictors from a data set -- I have two categorical predictors, A1 and A2 in a table, something like this:

It seems obvious to me that A1 and A2 are linearly independent. They are also linearly independent from the intercept, which I believe should be a categorical variable that looks like ones(1,11) ? But regardless, I want the global mean to not be removed from everything, so I don't include an intercept in the model.

Then, if I run something like this:

lme = fitlme(tbl, 'values ~ A1 + A2 - 1', 'DummyVarCoding', 'full')  % tbl is a placeholder name for the table above

I always get the same error:

Error using classreg.regr.lmeutils.StandardLinearLikeMixedModel/validateInputs (line 229)

Fixed Effects design matrix X must be of full column rank.

I don't understand why this is happening -- and probably this shows that I have a pretty big misunderstanding of what the dummy variables actually are.

However, if I run two fitlme's -- one on the subset A1==1 and one on A1==0, they both work, which just super confuses me.


Answer 1

Ive J on 29 Jan 2022


Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/1638475-fixed-effects-design-matrix-must-be-of-full-column-rank-with-multiple-categorical-predictors#answer_884350


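The error is self-explanatory: the cause is the full dummy variable coding scheme you're using (why?). With 'full' coding, every categorical variable gets one indicator column per level, and the indicator columns of any one variable add up to a column of ones. So A1_0 + A1_1 equals A2_0 + A2_1, and the four columns together are linearly dependent even though there is no intercept. See https://mathworks.com/help/stats/dummy-indicator-variables.html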

Note that the error has nothing to do with mixed-model design. Consider this example:

n = 100; % sample size
tab = table(randn(n,1), categorical(randi([0 1], n, 1)), ...
    categorical(randi([0, 1], n, 1)),...
    'VariableNames', {'value', 'A1', 'A2'});
mdl1 = fitlm(tab, 'value ~ A1 + A2 - 1', 'DummyVarCoding', 'full') % design matrix is rank deficient
Warning: Regression design matrix is rank deficient to within machine precision.
mdl1 = 
Linear regression model:
    value ~ A1 + A2

Estimated Coefficients:
            Estimate       SE        tStat      pValue 
            _________    _______    ________    _______

    A1_0     -0.20234    0.20399    -0.99191    0.32373
    A1_1            0          0         NaN        NaN
    A2_0    -0.045804    0.17202    -0.26627     0.7906
    A2_1     0.097693    0.18145     0.53839    0.59155


Number of observations: 100, Error degrees of freedom: 97
Root Mean Squared Error: 1.02
R-squared: 0.0145,  Adjusted R-Squared: -0.00585
F-statistic vs. constant model: 0.712, p-value = 0.493

So, what happened? Let's construct the design matrix:

X = [dummyvar(tab.A1), dummyvar(tab.A2)]; % DummyVarCoding -> full
disp(rank(X)) % 3 < size(X, 2) --> 3 < 4  --> rank deficient
     3
% what about when considering them alone?
disp(rank(X(:, 1:2))) % full rank
     2
disp(rank(X(:, 3:4))) % full rank
     2

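We can roughly locate a redundant column with a QR factorization (a near-zero diagonal entry of R marks a column that depends on the columns before it):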

[~, R] = qr(X, 0);
find(abs(diag(R)) < 1e-6)
ans = 4
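
The QR check flags column 4 only because it comes last in X; the dependency itself involves all four columns, so any one of them could be dropped. As a minimal sketch (using a small hand-written design matrix as an assumption, not the random table above), the null space of X shows which columns take part and with what weights:

```matlab
% Hypothetical toy design matrix: full dummy coding of two binary factors,
% one row per combination of levels. Columns: A1_0, A1_1, A2_0, A2_1.
X = [1 0 1 0;
     1 0 0 1;
     0 1 1 0;
     0 1 0 1];

N = null(X);        % orthonormal basis of the null space of X
w = N / N(1);       % rescale so that the first weight equals 1

disp(size(N, 2))    % number of independent linear dependencies among columns
disp(round(w.'))    % weights of the dependency, one per column of X
```

Here the null space has a single basis vector with weights 1, 1, -1, -1, which is the relation A1_0 + A1_1 - A2_0 - A2_1 = 0.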

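Therefore, don't set 'DummyVarCoding' to 'full' in such cases. Leave it at the default, 'reference', which uses one fewer dummy variable than the number of levels for each categorical predictor.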
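
To see the effect of the coding scheme on the rank, here is a small sketch that builds the design matrices by hand with dummyvar. The data and the variable names Xfull, Xref and Xmix are made up for illustration, and Xmix is a hand-built matrix, not a fitlm or fitlme option:

```matlab
% Hypothetical toy data: two binary categorical predictors
A1 = categorical([0; 0; 1; 1; 0; 1]);
A2 = categorical([0; 1; 0; 1; 1; 0]);

D1 = dummyvar(A1);   % indicator columns A1_0, A1_1
D2 = dummyvar(A2);   % indicator columns A2_0, A2_1

Xfull = [D1, D2];                          % 'full' coding for both, no intercept
Xref  = [D1(:, 2:end), D2(:, 2:end)];      % 'reference' coding: first level of each dropped
Xmix  = [D1, D2(:, 2:end)];                % all levels of A1, reference coding for A2

fprintf('full:      rank %d of %d columns\n', rank(Xfull), size(Xfull, 2));  % rank 3 of 4
fprintf('reference: rank %d of %d columns\n', rank(Xref),  size(Xref, 2));   % rank 2 of 2
fprintf('mixed:     rank %d of %d columns\n', rank(Xmix),  size(Xmix, 2));   % rank 3 of 3
```

Only the matrix with full coding for both predictors is rank deficient. If the goal is to keep a separate mean for every level of A1 without an intercept, the mixed matrix is one full-rank way to express that; it spans the same column space as an intercept plus reference coding for both predictors.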
