Why is LASSO in MATLAB so slow in the case of highly correlated predictors?

Marlis Hofer on 1 Dec 2015
Edited: Ilya on 2 Dec 2015
I am using LASSO with 4-fold cross-validation in a regression problem. I observed that as the number of predictors increases, the computation time of the MATLAB lasso function increases dramatically, to the point where it becomes infeasible for me (I need to run LASSO several thousand times). For example, with 100 predictors, LASSO needs more than 60 seconds. The same example in Python takes only a few seconds. What could be the reason for such a difference in computation speed?

--- Added later: I observed that it is not the number of predictors that affects the LASSO computation time, but the degree of collinearity among the predictors. The internal function cdescentCycle takes almost all of the computation time. The MATLAB help suggests using elastic net (setting Alpha < 1) when predictors are highly correlated. Elastic net is a bit faster, but still infeasibly slow. I have not done further tests with the LASSO implementation in Python. I still don't know how to speed up LASSO with highly correlated predictors: reducing the number of Lambda values or increasing the RelTol parameter helps only very little (about a few seconds).

Accepted Answer

Ilya on 1 Dec 2015
There could be many reasons. The lasso function has a lot of flexibility, so make sure you are comparing apples to apples. To make it run faster, you could
  1. Use fewer values of lambda.
  2. Increase the relative tolerance.
  3. Try standardizing or not standardizing predictors.
  4. Try running in parallel if you have a Parallel Computing Toolbox license.
Even then, the function would likely still be slower than compiled C/C++ or Fortran code.
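As a rough illustration of the four suggestions above, the sketch below times a default call against a tuned call. The synthetic data, the variable names, and the specific values chosen for 'NumLambda' and 'RelTol' are assumptions made for this example only; they are not taken from the question, and the timings will depend on your data and machine.

```matlab
% Synthetic, highly correlated predictors (illustrative data only).
rng(0);                                   % reproducible random numbers
n = 200; p = 100;
z = randn(n, 1);                          % shared latent factor
X = z * ones(1, p) + 0.05 * randn(n, p);  % columns are nearly collinear
y = X(:, 1:5) * ones(5, 1) + randn(n, 1);

% Baseline: 4-fold cross-validated lasso with default settings.
tic;
[B0, fit0] = lasso(X, y, 'CV', 4);
tDefault = toc;

% Suggestions 1-4: fewer lambdas, looser tolerance, no standardization,
% and parallel cross-validation. The values below are arbitrary examples.
opts = statset('UseParallel', true);      % needs Parallel Computing Toolbox;
                                          % drop 'Options' if unavailable
tic;
[B1, fit1] = lasso(X, y, 'CV', 4, ...
    'NumLambda', 20, ...                  % default is 100
    'RelTol', 1e-3, ...                   % default is 1e-4
    'Standardize', false, ...             % default is true
    'Options', opts);
tTuned = toc;

fprintf('default: %.1f s, tuned: %.1f s\n', tDefault, tTuned);
```

Changing 'Standardize' or 'RelTol' also changes the fitted coefficients, so it is worth comparing fit1.MSE against fit0.MSE before adopting the faster settings.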
  2 Comments
Marlis Hofer on 2 Dec 2015
Edited: Marlis Hofer on 2 Dec 2015
Thanks for your answer! I have already tried different LASSO options (e.g., increasing RelTol by one order of magnitude, decreasing NumLambda to 50, using the parallel option). This reduced the run time, but only by a small fraction of the total, so it is still too slow. I agree that I should not compare Python with MATLAB without specifying the exact options used in each algorithm. However, I observed (as also updated in my question) that it is not the number of predictors but the collinearity among them that affects the speed.
Ilya on 2 Dec 2015
Edited: Ilya on 2 Dec 2015
If you are willing to experiment a bit, try this. Find the cdescentCycle function inside lasso and replace line 799 (line numbers could be different in your version)
for j=find(active);
with these 3 lines:
a = find(active);
a = a(randperm(numel(a)));
for j=a
Does this help?
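To see what visiting the coordinates in random order does, here is a toy coordinate-descent lasso solver that can sweep either cyclically or in shuffled order. This is a minimal sketch written for illustration, not the code inside lasso: the function name toyLasso, the data, the value of lambda, and the stopping rule are all assumptions, and unlike cdescentCycle it sweeps over every coordinate rather than only the active set. Whether shuffling needs fewer sweeps depends on the data.

```matlab
% Toy comparison of cyclic vs. shuffled coordinate descent for the lasso.
% Requires R2016b or later (implicit expansion, local functions in scripts).
rng(1);
n = 200; p = 50;
z = randn(n, 1);
X = z * ones(1, p) + 0.1 * randn(n, p);   % nearly collinear columns
X = (X - mean(X)) ./ std(X);              % standardize each column
y = X(:, 1:3) * [1; -1; 0.5] + 0.1 * randn(n, 1);
y = y - mean(y);
lambda = 0.05;                            % arbitrary penalty for the demo

[bCyc, nCyc] = toyLasso(X, y, lambda, false);
[bRnd, nRnd] = toyLasso(X, y, lambda, true);
fprintf('sweeps: cyclic %d, shuffled %d\n', nCyc, nRnd);

function [b, sweeps] = toyLasso(X, y, lambda, shuffle)
% Minimizes (1/(2n))*||y - X*b||^2 + lambda*||b||_1 by coordinate descent.
    [n, p] = size(X);
    b = zeros(p, 1);
    r = y;                                % residual y - X*b
    colNorm = sum(X.^2, 1) / n;           % x_j'*x_j / n for each column
    for sweeps = 1:10000
        order = 1:p;
        if shuffle
            order = order(randperm(p));   % same idea as a(randperm(numel(a)))
        end
        bOld = b;
        for j = order
            % Correlation of column j with the partial residual.
            rho = X(:, j)' * r / n + colNorm(j) * b(j);
            % Soft-thresholding update for coordinate j.
            bj = sign(rho) * max(abs(rho) - lambda, 0) / colNorm(j);
            r = r - X(:, j) * (bj - b(j)); % keep the residual in sync
            b(j) = bj;
        end
        % Stop when a full sweep barely changes the coefficients.
        if norm(b - bOld) <= 1e-6 * max(norm(b), eps)
            break;
        end
    end
end
```

A fixed cyclic order can make slow progress when neighboring columns are almost identical, because each update is largely undone by the next; a random order breaks that pattern, which is the motivation for trying the three-line change.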
