regstats: The design matrix has more predictor variables than observations.

King To Leung on 31 Jul 2022
Answered: Walter Roberson on 31 Jul 2022
I used the following code to run a regression, and MATLAB reports:
Error using regstats (line 132)
The design matrix has more predictor variables than observations.
My code:
fm_betas=NaN(length(ud),4); % 4 columns for the constant term, size, bm, pe
for i=1:length(ud) % We run a regression for each time period
tdata=data_crsp(data_crsp(:,c.date)==ud(i),:);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
reg_results = regstats(tdata(:,c.fut_ret), [log(tdata(:,c.cap)), log(tdata(:,c.bm)), tdata(:,c.pe)], 'linear', {'beta'});
fm_betas(i,:)=reg_results.beta';
end
mean(fm_betas)
% ud=unique(data_crsp(:,c.date)); %data_crsp is the data set
% I have checked that there are no infinite values in the data

Answers (2)

dpb on 31 Jul 2022
The problem is NOT that there are NaN or Inf in the data (although that could also be a cause since they're treated as missing values), the problem is as the error message says -- by the time you've selected the subset of data for one or more of your time periods, the resulting height(tdata) < 4, the number of coefficients you're trying to estimate (3 independent plus 1 intercept).
"You can't do that!" -- you'll have to only fit over periods that have at least that many points; it would be far better to have well more than that.
You'll have to dig into the data set and see where either your selection logic isn't doing what you think or find groupings that have sufficient data in them; we can't see the data...
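One way to fit only the periods that have enough points is to guard the regression inside the loop. The following is a minimal sketch, not the poster's actual setup: the synthetic data_crsp, the column indices in c, and the threshold minObs are assumptions made up so the example runs on its own. It needs the Statistics and Machine Learning Toolbox for regstats.

```matlab
% Synthetic stand-in for data_crsp (hypothetical values, for illustration only).
rng(0);                                        % reproducible random data
c = struct('date',1,'fut_ret',2,'cap',3,'bm',4,'pe',5);
dates = [ones(10,1); 2*ones(2,1); 3*ones(8,1)];   % period 2 has only 2 rows
n = numel(dates);
data_crsp = [dates, randn(n,1), exp(randn(n,1)), exp(randn(n,1)), 10+randn(n,1)];

ud = unique(data_crsp(:,c.date));
nCoef = 4;                     % constant term + size, bm, pe
minObs = nCoef;                % bare minimum; a larger threshold is safer
fm_betas = NaN(length(ud), nCoef);

for i = 1:length(ud)
    tdata = data_crsp(data_crsp(:,c.date)==ud(i), :);
    y = tdata(:,c.fut_ret);
    X = [log(tdata(:,c.cap)), log(tdata(:,c.bm)), tdata(:,c.pe)];
    ok = all(isfinite([y, X]), 2);   % rows with no NaN or Inf in y or X
    if nnz(ok) < minObs
        continue                     % too few usable rows: leave betas as NaN
    end
    reg_results = regstats(y(ok), X(ok,:), 'linear', {'beta'});
    fm_betas(i,:) = reg_results.beta';
end

mean(fm_betas, 'omitnan')            % average only over periods that were fitted
```

Skipped periods stay NaN in fm_betas, so the final average uses 'omitnan' rather than a plain mean. Setting minObs equal to the number of coefficients only avoids the error; a fit with exactly as many points as coefficients has no residual degrees of freedom, so a higher threshold is usually the better choice.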

Walter Roberson on 31 Jul 2022
reg_results = regstats(tdata(:,c.fut_ret), [log(tdata(:,c.cap)), log(tdata(:,c.bm)), tdata(:,c.pe)], 'linear', {'beta'});
You are providing three predictor variables and one response variable, and you are specifying the 'linear' model (which is also the default), so regstats adds a constant term. You are therefore trying to find four coefficients: an intercept plus one for each of the three variables. Your calculation is effectively
[ones(size(tdata,1),1), log(tdata(:,c.cap)), log(tdata(:,c.bm)), tdata(:,c.pe)] \ tdata(:,c.fut_ret)
In order to do that, you need at least four rows of input.
tdata=data_crsp(data_crsp(:,c.date)==ud(i),:);
What happens if that test finds only 1 or 2 rows?
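To see which periods come up short, the rows per date can be counted before any regression is run. This is a rough sketch using a made-up date vector; with the real data, dates would presumably be data_crsp(:,c.date). The variable names nObs, nCoef and tooFew are illustrative only.

```matlab
% Hypothetical date column: period 2 has only 2 rows, period 4 has only 1.
dates = [1 1 1 1 1 2 2 3 3 3 3 4]';

[ud, ~, idx] = unique(dates);      % idx maps each row to its period in ud
nObs = accumarray(idx, 1);         % number of rows in each period
nCoef = 4;                         % constant term + 3 predictors, as in the question

tooFew = ud(nObs < nCoef);         % periods with fewer rows than coefficients
disp(table(ud, nObs, nObs < nCoef, 'VariableNames', {'date','nObs','tooFew'}))
```

Note that this counts raw rows only. Rows that contain NaN (or become non-finite after taking the log) reduce the usable count further, so a period can still fail even if it passes this check.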
