fitrlinear for large data set

I am trying to fit a large regression/lasso model with n = 90000 rows and p = 500 columns:
[mhat,FitInfo]=fitrlinear(X,y,'Learner','leastsquares');
I also tried the additional parameters
'Solver','sparsa'
'Regularization','lasso'
The problem is that when X has 200 columns or more, all the elements of mhat.Beta are zero.
Do you have any suggestions?
Thanks,
Alessandro

1 Comment

Alessandro Fassò on 25 Feb 2021
Note that rank(X) > 200.


Answers (1)

Aditya Patil on 29 Mar 2021

With high-dimensional data, it is expected that some of the predictors will have little effect on the response.
As a workaround, you can try reducing the dimension with Dimensionality Reduction and Feature Extraction techniques.
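
One possible reading of this workaround is to project the predictors onto their leading principal components before calling fitrlinear. The sketch below uses synthetic stand-in data, and the 95% variance threshold is an arbitrary assumption, not something taken from the thread. Note that PCA ignores the response, so a discarded low-variance component could still carry signal.

```matlab
% Synthetic stand-in data (assumption: not the poster's X and y)
rng(1);
n = 1000; p = 250;
Z = randn(n, 20);                        % 20 latent factors
X = Z*randn(20, p) + 0.05*randn(n, p);   % strongly correlated predictors
y = Z(:,1) - 2*Z(:,2) + 0.1*randn(n,1);

% Principal components of the (automatically centered) predictors
[coeff, score, ~, ~, explained] = pca(X);

% Keep the smallest number of components explaining at least 95% of the variance
k = find(cumsum(explained) >= 95, 1);

% Fit the linear model on the reduced predictor matrix
mdl = fitrlinear(score(:,1:k), y, 'Learner','leastsquares');

fprintf('Kept %d of %d components, %d nonzero coefficients\n', ...
    k, p, nnz(mdl.Beta));
```

To predict on new rows, they would need to be centered with the training column means and multiplied by coeff(:,1:k) first.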

2 Comments

Alessandro Fassò on 29 Mar 2021
Thanks for your answer!
I agree that "some of the predictors won't have much effect ...", but I expect that others do have an effect (I know this from preliminary correlation analysis and smaller regression exercises).
Note that X has rank > 200.
The problem is that fitrlinear gives me ALL the betas = 0. It also returns very quickly despite the size of the problem.
Of course one can perform some preliminary dimensionality reduction, but I expected this to be done by the lasso option of fitrlinear. I tried it in various exercises like
>> fitrlinear(..., 'regularization','lasso','lambda',lambda);
for various values of lambda.
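
The lambda experiments described above can be run in a single call, since fitrlinear accepts a vector for 'Lambda' and then returns one column of Beta per value. The sketch below is only an illustration on synthetic data: the grid, the sizes, and the use of zscore are assumptions. As far as I know fitrlinear does not standardize predictors itself, so scaling the columns first is one thing worth ruling out when every coefficient comes back zero.

```matlab
% Synthetic stand-in for the real data (assumption: the real X and y are not shown in the thread)
rng(0);
n = 2000; p = 300;
X = randn(n, p);
betaTrue = [ones(10,1); zeros(p-10,1)];  % 10 active predictors
y = X*betaTrue + 0.1*randn(n,1);

% Put all columns on a common scale before applying the lasso penalty
Xs = zscore(X);

% Grid of regularization strengths (hypothetical choice)
lambda = logspace(-4, 0, 9);

% One fit per lambda; mdl.Beta is p-by-numel(lambda)
mdl = fitrlinear(Xs, y, 'Learner','leastsquares', ...
    'Regularization','lasso', 'Solver','sparsa', 'Lambda', lambda);

% Count the nonzero coefficients for each lambda
nnzBeta = sum(mdl.Beta ~= 0, 1);
disp(table(lambda(:), nnzBeta(:), 'VariableNames', {'Lambda','NumNonzero'}));
```

If the count is zero across the whole grid, the penalty is probably not the cause and the solver settings or the data scaling deserve a closer look.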
Aditya Patil on 29 Mar 2021
Can you provide the data so that I can reproduce the issue? Please also provide the output of the version command.
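
If the real data cannot be shared, a small script like the following might serve as a starting point for a reproduction. The data here is synthetic and the sizes are assumptions. It runs the call from the question and prints the release string together with the solver and termination status recorded in FitInfo, which may help explain why the fit returns so quickly.

```matlab
% Synthetic stand-in data (assumption: replace with the real X and y)
rng(2);
n = 5000; p = 300;
X = randn(n, p);
y = X(:,1:5)*ones(5,1) + 0.1*randn(n,1);

% The call from the question
[mhat, FitInfo] = fitrlinear(X, y, 'Learner','leastsquares');

% Details worth attaching to a report
disp(version);                           % MATLAB release string
fprintf('rank(X) = %d\n', rank(X));
fprintf('nonzero coefficients: %d of %d\n', nnz(mhat.Beta), p);
disp(FitInfo.Solver);                    % solver(s) actually used
disp(FitInfo.TerminationStatus);         % why the optimizer stopped
```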


Asked: 25 Feb 2021

Commented: 29 Mar 2021
