How to perform OLS regression using combinations of independent variables?

Zhen on 28 Nov 2014
Edited: Matt J on 29 Nov 2014
Hi!
I have been struggling for a while with the following problem.
Suppose we have y as a dependent variable and x1,...,xn as exogenous variables (n>7).
What I want to do is see which combination of exogenous variables gives the best fit for y.
So if we have, for example, 3 exogenous variables, I would like to see which of the following regressions fits y best (assuming I already know which statistic I will use to discriminate a "good" model from a "bad" one):
y~x1 ;
y~x2 ;
y~x3 ;
y~x1+x2 ;
y~x1+x3 ;
y~x2+x3 ;
y~x1+x2+x3
For only 3 variables it is not that complicated (2^3 - 1 = 7 possibilities). The problem appears when I begin introducing more exogenous variables (for n = 7, already 2^7 - 1 = 127). How can I do it, somehow automatically, for all combinations when the number of exogenous variables is large (> 7)?
Thanks for your help!
Cheers!
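For reference, the enumeration the question asks about can be automated with a bitmask over the n predictors. A minimal sketch, assuming the exogenous variables sit in an N-by-n matrix X and the response in an N-by-1 vector y (all names here are illustrative, and adjusted R^2 is just one example of a selection statistic):

```matlab
% Exhaustive best-subset OLS: fit every non-empty subset of predictors.
% Assumes X is N-by-n (one column per exogenous variable), y is N-by-1.
[N, n] = size(X);
nModels = 2^n - 1;
score = -inf(nModels, 1);            % adjusted R^2 of each candidate model
for m = 1:nModels
    idx = find(bitget(m, 1:n));      % columns included in model m
    Xm  = [ones(N,1), X(:, idx)];    % design matrix with intercept
    b   = Xm \ y;                    % OLS fit via backslash
    r   = y - Xm*b;
    R2  = 1 - sum(r.^2) / sum((y - mean(y)).^2);
    p   = numel(idx);
    score(m) = 1 - (1 - R2)*(N - 1)/(N - p - 1);   % adjusted R^2
end
[bestScore, bestModel] = max(score);
bestVars = find(bitget(bestModel, 1:n))   % indices of the winning subset
```

Note that the loop runs 2^n - 1 regressions, so this brute-force approach becomes expensive quickly as n grows, which is exactly the problem raised above.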

Answers (3)

Image Analyst on 29 Nov 2014
Why not just use all of them and let the regression figure out how to weight the different xn?
y = alpha0 + alpha1 * x1 + alpha2 * x2 + alpha3 * x3
You can't use polyfit(), but you can use the standard least-squares formula
alpha = inv(x' * x) * x' * y; % Get estimate of the alphas.
where x is an N-by-4 matrix:
1, x1(1), x2(1), x3(1)
1, x1(2), x2(2), x3(2)
1, x1(3), x2(3), x3(3)
1, x1(4), x2(4), x3(4)
...
1, x1(N), x2(N), x3(N)
If one of the xn is not a good predictor, it should have a small alpha weight.
1 Comment
Matt J on 29 Nov 2014
Edited: Matt J on 29 Nov 2014
You can't use polyfit() but you can use the standard least squares formula
No, don't do that. Just do
alpha=x\y;
for better conditioning. However, I assume that the OP's case is really more complicated, and that the x matrix does not have full column rank.
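To make the difference concrete, here is a toy sketch (the data and coefficient values are made up). Backslash solves the least-squares problem via a QR factorization, whereas forming x'*x explicitly roughly squares the condition number of x:

```matlab
% Toy data: N = 100 samples, intercept plus 3 predictors.
rng(0);
N = 100;
x = [ones(N,1), randn(N,3)];
trueAlpha = [2; 1; -0.5; 0.25];
y = x*trueAlpha + 0.01*randn(N,1);

alpha   = x \ y;                % QR-based least squares, well conditioned
alphaNE = inv(x'*x) * x'*y;     %#ok<MINV> normal equations; avoid in practice
```

On well-conditioned data like this the two agree closely; the gap shows up when the columns of x are nearly collinear.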



Star Strider on 29 Nov 2014
You are describing a stepwise multiple linear regression. It is a well-known, established technique, and the statistical procedure for adding and removing variables to get the best fit is not trivial.
If you have the Statistics Toolbox, see the documentation for Stepwise Regression and specifically stepwiselm, stepwise, and stepwisefit.
With 127 candidate models, and especially if you have a large data set, it is going to take some time. Have something else to do for a few minutes while the regression runs.
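A minimal usage sketch of stepwiselm (assuming, as above, an N-by-n predictor matrix X and an N-by-1 response y; the name-value settings shown are just one reasonable configuration):

```matlab
% Stepwise linear regression with the Statistics Toolbox.
% Starts from a constant model and adds/removes linear terms based on
% p-values (the default criterion); 'Verbose', 2 prints each step.
mdl = stepwiselm(X, y, 'constant', 'Upper', 'linear', 'Verbose', 2);
disp(mdl.Formula)   % the subset of predictors the procedure selected
```

The fitted model object also exposes diagnostics such as mdl.Rsquared for comparing against other candidate fits.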

Matt J on 29 Nov 2014
Edited: Matt J on 29 Nov 2014
As ImageAnalyst says, performing an OLS regression with the entire data set should give you the unique best regression in one step, unless your x1,...,xn are over-complete.
If they are over-complete, and you are looking for the sparsest solution, the Matching Pursuit algorithm seems to be the standard alternative to an exhaustive search. There are several implementations on the File Exchange, but I've never used any of them.
Also, the solution is not guaranteed to be globally sparsest - the price paid for not doing an exhaustive search, it seems.
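For illustration, here is a greedy forward pass in the spirit of orthogonal matching pursuit. This is my own sketch, not any particular File Exchange implementation; it assumes X is N-by-n with (ideally unit-norm) columns, y is N-by-1, and k is the target sparsity:

```matlab
function [beta, support] = ompSketch(X, y, k)
% Orthogonal matching pursuit sketch: greedily pick the column most
% correlated with the current residual, then refit OLS on the chosen set.
support = [];
r = y;
for t = 1:k
    [~, j] = max(abs(X' * r));          % best-matching column
    support = unique([support, j]);
    beta = zeros(size(X,2), 1);
    beta(support) = X(:, support) \ y;  % refit on the current support
    r = y - X*beta;                     % update the residual
end
end
```

As noted above, this greedy pass is not guaranteed to find the globally sparsest solution; that is the price of avoiding the exhaustive search.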
