Fixed Effects Design Matrix Must be of full column rank with multiple categorical predictors

qfn on 28 Jan 2022
Answered: Ive J on 29 Jan 2022
I am probably doing something very dumb, but I cannot figure out my mistake.
I am trying to regress out some predictors from a data set -- I have two categorical predictors, A1 and A2 in a table, something like this:
It seems obvious to me that A1 and A2 are linearly independent. They are also linearly independent of the intercept, which I believe should be a variable that looks like ones(1,11)? Regardless, I don't want the global mean to be removed from everything, so I don't include an intercept in the model.
Then, if I run something like this:
lme = fitlme(tbl, 'values ~ A1 + A2 - 1', 'DummyVarCoding', 'full') % tbl: placeholder name for the table holding values, A1 and A2
I always get the same error:
Error using classreg.regr.lmeutils.StandardLinearLikeMixedModel/validateInputs (line 229)
Fixed Effects design matrix X must be of full column rank.
I don't understand why this is happening -- and probably this shows that I have a pretty big misunderstanding of what the dummy variables actually are.
However, if I run two fitlme's -- one on the subset A1==1 and one on A1==0, they both work, which just super confuses me.

Answers (1)

Ive J on 29 Jan 2022
The error is self-explanatory, and the reason is the full dummy variable scheme you're using (why?). See https://mathworks.com/help/stats/dummy-indicator-variables.html
Note that the error has nothing to do with mixed-model design. Consider this example:
n = 100; % sample size
tab = table(randn(n,1), categorical(randi([0 1], n, 1)), ...
categorical(randi([0, 1], n, 1)),...
'VariableNames', {'value', 'A1', 'A2'});
mdl1 = fitlm(tab, 'value ~ A1 + A2 - 1', 'DummyVarCoding', 'full') % design matrix is rank deficient
Warning: Regression design matrix is rank deficient to within machine precision.
mdl1 =
Linear regression model:
    value ~ A1 + A2

Estimated Coefficients:
            Estimate       SE         tStat      pValue
            _________    _______    ________    _______
    A1_0     -0.20234    0.20399    -0.99191    0.32373
    A1_1            0          0         NaN        NaN
    A2_0    -0.045804    0.17202    -0.26627     0.7906
    A2_1     0.097693    0.18145     0.53839    0.59155

Number of observations: 100, Error degrees of freedom: 97
Root Mean Squared Error: 1.02
R-squared: 0.0145, Adjusted R-Squared: -0.00585
F-statistic vs. constant model: 0.712, p-value = 0.493
So, what happened? Let's construct the design matrix:
X = [dummyvar(tab.A1), dummyvar(tab.A2)]; % DummyVarCoding -> full
disp(rank(X)) % 3 < size(X, 2) --> 3 < 4 --> rank deficient
3
% what about when considering them alone?
disp(rank(X(:, 1:2))) % full rank
2
disp(rank(X(:, 3:4))) % full rank
2
We can approximately find the problematic variable:
[~, R] = qr(X, 0);
find(abs(diag(R)) < 1e-6)
ans = 4
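The rank deficiency comes from one exact linear relation: every observation has exactly one A1 level and exactly one A2 level, so the two A1 dummy columns and the two A2 dummy columns each sum to a column of ones. A minimal, self-contained sketch of that check follows; the seed and the variable names (rowSumA1, rowSumA2, v) are chosen here for illustration only, and it assumes both levels of each factor occur in the sample.

```matlab
rng(0);                                  % fixed seed, for reproducibility only
n  = 100;                                % sample size, as in the example above
A1 = categorical(randi([0 1], n, 1));
A2 = categorical(randi([0 1], n, 1));
X  = [dummyvar(A1), dummyvar(A2)];       % columns: A1_0, A1_1, A2_0, A2_1

rowSumA1 = X(:, 1) + X(:, 2);            % one A1 level per row -> all ones
rowSumA2 = X(:, 3) + X(:, 4);            % one A2 level per row -> all ones
disp(isequal(rowSumA1, rowSumA2))        % displays 1 (true)

% The null space of X makes the dependency explicit:
% A1_0 + A1_1 - A2_0 - A2_1 = 0, i.e. X*v = 0 for v proportional to [1; 1; -1; -1]
v = null(X);
disp(v ./ v(1))                          % rescaled so the first entry is 1
```

Because the dependency involves all four columns, which coefficient gets zeroed out is an arbitrary consequence of column order, not a sign that one particular level is at fault.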
Therefore, don't set 'DummyVarCoding' in such cases (the default is 'reference').
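If the goal is a model without an intercept that still has a full-rank design, one option is to keep all dummy columns for one factor and drop a reference level for the other. The sketch below builds such a design matrix by hand and solves it by ordinary least squares. It is an illustration under stated assumptions, not what fitlm or fitlme do internally: the names D1, D2, Xred and b are hypothetical, the response is random noise, and both levels of each factor are assumed to occur.

```matlab
rng(0);                                  % fixed seed, for reproducibility only
n  = 100;
y  = randn(n, 1);                        % placeholder response
A1 = categorical(randi([0 1], n, 1));
A2 = categorical(randi([0 1], n, 1));

D1 = dummyvar(A1);                       % A1_0, A1_1 (both levels kept)
D2 = dummyvar(A2);                       % A2_0, A2_1
Xred = [D1, D2(:, 2:end)];               % drop A2_0 as the reference level

disp(rank(Xred) == size(Xred, 2))        % full column rank check

b = Xred \ y;                            % least-squares estimates, no intercept
disp(b')                                 % [A1_0, A1_1, A2_1]
```

With this coding the first two coefficients are the fitted means for each A1 level at the A2 reference level, and the third is the shift associated with A2_1, so no separate intercept column is needed.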
