selectModels

Select fitted regularized linear regression models

Syntax

SubMdl = selectModels(Mdl,idx)

Description

SubMdl = selectModels(Mdl,idx) returns a subset of trained linear regression models from a set of linear regression models (Mdl) trained using various regularization strengths. The indices idx correspond to the regularization strengths in Mdl.Lambda, and specify which models to return.
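The basic call can be sketched as follows. This is a minimal sketch, not a prescribed workflow. The simulated data, the five-value Lambda grid, and the chosen indices [2 4] are assumptions made only for illustration.

```matlab
% Minimal sketch: simulated data and the Lambda grid are assumptions.
rng(0)                                      % for reproducibility
X = randn(500,20);                          % 500 observations, 20 predictors
Y = X(:,3) - 2*X(:,7) + 0.1*randn(500,1);
Lambda = logspace(-3,-1,5);                 % five regularization strengths
Mdl = fitrlinear(X,Y,'Lambda',Lambda, ...
    'Regularization','lasso','Solver','sparsa');
SubMdl = selectModels(Mdl,[2 4]);           % keep only the 2nd and 4th models
SubMdl.Lambda                               % expected to match Mdl.Lambda([2 4])
```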

Input Arguments

`Mdl` — Linear regression models trained using various regularization strengths
`RegressionLinear` model object

Linear regression models trained using various regularization strengths, specified as a RegressionLinear model object. You can create a RegressionLinear model object using fitrlinear.

Although Mdl is one model object, if numel(Mdl.Lambda) = L ≥ 2, then you can think of Mdl as L trained models.
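To make the "L trained models" view concrete, the following sketch inspects the array sizes inside one RegressionLinear object. It assumes simulated data with n = 200 observations and p = 10 predictors, plus a four-value Lambda grid. The comments describe the expected shapes: one coefficient column per model, and one predicted-response column per model.

```matlab
% Sketch on simulated data (sizes and Lambda grid are assumptions).
rng(0)
X = randn(200,10);                          % n = 200, p = 10
Y = X(:,1) + 0.5*randn(200,1);
Mdl = fitrlinear(X,Y,'Lambda',logspace(-4,-1,4));   % L = 4 strengths
L = numel(Mdl.Lambda)                       % 4
size(Mdl.Beta)                              % p-by-L: one coefficient column per model
numel(Mdl.Bias)                             % L intercepts, one per model
YHat = predict(Mdl,X);                      % n-by-L: column j uses Mdl.Lambda(j)
size(YHat)
```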

`idx` — Indices corresponding to regularization strengths
numeric vector of positive integers

Indices corresponding to regularization strengths, specified as a numeric vector of positive integers. Values of idx must be in the interval [1,L], where L = numel(Mdl.Lambda).

Data Types: double | single
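Because idx refers to positions in Mdl.Lambda, one option is to locate idx from a target strength rather than from a hard-coded position. The sketch below searches Mdl.Lambda instead of the input grid and checks that idx lies in [1,L] before calling selectModels. The simulated data and the target value 1e-2 are hypothetical, and matching on the log scale is one reasonable choice among several.

```matlab
% Sketch: locate idx from a target strength (1e-2 is a hypothetical choice).
rng(0)
X = randn(300,20);
Y = X(:,2) + 0.2*randn(300,1);
Mdl = fitrlinear(X,Y,'Lambda',logspace(-4,-1,15), ...
    'Regularization','lasso','Solver','sparsa');
target = 1e-2;                              % hypothetical desired strength
% Search Mdl.Lambda, because idx refers to positions in Mdl.Lambda
[~,idx] = min(abs(log10(Mdl.Lambda) - log10(target)));
L = numel(Mdl.Lambda);
assert(idx >= 1 && idx <= L && idx == fix(idx), 'idx must be an integer in [1,L]')
SubMdl = selectModels(Mdl,idx);
```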

Output Arguments

`SubMdl` — Subset of linear regression models trained using various regularization strengths
`RegressionLinear` model object

Subset of linear regression models trained using various regularization strengths, returned as a RegressionLinear model object.
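The following sketch compares SubMdl with the original object. Under the assumption that selectModels copies the selected models without refitting them, the selected coefficient columns and intercepts of Mdl should carry over unchanged. The simulated data and idx = [1 3 6] are illustrative only.

```matlab
% Sketch: simulated data and idx = [1 3 6] are assumptions.
rng(0)
X = randn(300,15);
Y = 2*X(:,4) + 0.3*randn(300,1);
Mdl = fitrlinear(X,Y,'Lambda',logspace(-3,0,6), ...
    'Regularization','lasso','Solver','sparsa');
idx = [1 3 6];
SubMdl = selectModels(Mdl,idx);
% Expected: selected coefficient columns and intercepts are kept as-is
isequal(SubMdl.Beta, Mdl.Beta(:,idx))
isequal(SubMdl.Bias(:), reshape(Mdl.Bias(idx),[],1))
```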

Examples

Find Good Lasso Penalty Using Regression Loss

Simulate 10000 observations from this model:

$y = x_{100} + 2x_{200} + e,$

where $X = \{x_{1}, \ldots, x_{1000}\}$ is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements, and $e$ is random normal error with mean 0 and standard deviation 0.3.

rng(1) % For reproducibility
n = 1e4;
d = 1e3;
nz = 0.1;
X = sprandn(n,d,nz);
Y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);

Create a set of 15 logarithmically spaced regularization strengths from $10^{-4}$ through $10^{-1}$.

Lambda = logspace(-4,-1,15);

Hold out 30% of the data for testing. Identify the test-sample indices.

cvp = cvpartition(numel(Y),'Holdout',0.30);
idxTest = test(cvp);

Train a linear regression model using lasso penalties with the strengths in Lambda. Specify the regularization strengths, the SpaRSA solver for optimizing the objective function, and the data partition. To increase execution speed, transpose the predictor data and specify that the observations are in columns.

X = X'; 
CVMdl = fitrlinear(X,Y,'ObservationsIn','columns','Lambda',Lambda,...
    'Solver','sparsa','Regularization','lasso','CVPartition',cvp);
Mdl1 = CVMdl.Trained{1};
numel(Mdl1.Lambda)

ans = 
15

Mdl1 is a RegressionLinear model. Because Lambda is a 15-dimensional vector of regularization strengths, you can think of Mdl1 as 15 trained models, one for each regularization strength.

Estimate the test-sample mean squared error for each regularized model.

mse = loss(Mdl1,X(:,idxTest),Y(idxTest),'ObservationsIn','columns');

Higher values of Lambda lead to predictor-variable sparsity (more coefficients exactly equal to zero), which is a good quality of a regression model. Retrain the model using the entire data set and all options used previously, except the data-partition specification. Determine the number of nonzero coefficients per model.

Mdl = fitrlinear(X,Y,'ObservationsIn','columns','Lambda',Lambda,...
    'Solver','sparsa','Regularization','lasso');
numNZCoeff = sum(Mdl.Beta~=0);

In the same figure, plot the MSE and frequency of nonzero coefficients for each regularization strength. Plot all variables on the log scale.

figure;
[h,hL1,hL2] = plotyy(log10(Lambda),log10(mse),...
    log10(Lambda),log10(numNZCoeff)); 
hL1.Marker = 'o';
hL2.Marker = 'o';
ylabel(h(1),'log_{10} MSE')
ylabel(h(2),'log_{10} nonzero-coefficient frequency')
xlabel('log_{10} Lambda')
hold off

Figure: log10 MSE (left axis) and log10 nonzero-coefficient frequency (right axis) versus log10 Lambda.

Select the index or indices of Lambda that balance minimal test-sample MSE and predictor-variable sparsity (for example, Lambda(11)).

idx = 11;
MdlFinal = selectModels(Mdl,idx);

MdlFinal is a trained RegressionLinear model object that uses Lambda(11) as a regularization strength.
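Instead of reading the index off the plot, you can choose it with a rule. The following sketch applies one such rule: keep the largest Lambda whose held-out MSE is within 10% of the smallest held-out MSE. The rule, the 10% tolerance, and the smaller simulated data set (2000-by-100) are hypothetical assumptions, not part of the example above. Favoring the largest acceptable Lambda reflects the sparsity preference stated earlier.

```matlab
% Sketch (assumed rule and tolerance): largest Lambda with MSE <= 1.10*min(MSE).
rng(1)
n = 2000; d = 100;
X = sprandn(n,d,0.1);                       % smaller simulated data (assumption)
Y = X(:,10) + 2*X(:,20) + 0.3*randn(n,1);
Lambda = logspace(-4,-1,15);
cvp = cvpartition(n,'Holdout',0.3);
idxTrain = training(cvp);
idxTest = test(cvp);
MdlTrain = fitrlinear(X(idxTrain,:),Y(idxTrain),'Lambda',Lambda, ...
    'Solver','sparsa','Regularization','lasso');
mse = loss(MdlTrain,X(idxTest,:),Y(idxTest));   % one MSE per regularization strength
tol = 1.10;                                 % hypothetical tolerance factor
candidates = find(mse <= tol*min(mse));     % models close enough to the best MSE
[~,k] = max(MdlTrain.Lambda(candidates));   % sparsest acceptable model
idx = candidates(k);
Mdl = fitrlinear(X,Y,'Lambda',Lambda,'Solver','sparsa','Regularization','lasso');
MdlFinal = selectModels(Mdl,idx);
```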

Tips

One way to build several predictive linear regression models is:

1. Hold out a portion of the data for testing.
2. Train a linear regression model using fitrlinear. Specify a grid of regularization strengths using the 'Lambda' name-value pair argument and supply the training data. fitrlinear returns one RegressionLinear model object, but it contains a model for each regularization strength.
3. To determine the quality of each regularized model, pass the returned model object and the held-out data to, for example, loss.
4. Identify the indices (idx) of a satisfactory subset of regularized models, and then pass the returned model and the indices to selectModels. selectModels returns one RegressionLinear model object, but it contains numel(idx) regularized models.
5. To predict responses for new data, pass the data and the subset of regularized models to predict.
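The five steps above can be sketched end to end on simulated data. The data, the holdout fraction, the Lambda grid, and the minimum-MSE selection rule are assumptions for illustration. Xnew is a hypothetical set of new observations.

```matlab
% Sketch of the Tips workflow (data, grid, and selection rule are assumptions).
rng(2)
X = randn(1000,50);
Y = 3*X(:,5) - X(:,9) + 0.5*randn(1000,1);
cvp = cvpartition(1000,'Holdout',0.25);         % step 1: hold out test data
Xtr = X(training(cvp),:);  Ytr = Y(training(cvp));
Xte = X(test(cvp),:);      Yte = Y(test(cvp));
Mdl = fitrlinear(Xtr,Ytr,'Lambda',logspace(-5,-1,10), ...
    'Regularization','lasso');                  % step 2: one object, 10 models
mse = loss(Mdl,Xte,Yte);                        % step 3: one test MSE per model
[~,idx] = min(mse);                             % step 4: here, the minimum-MSE model
SubMdl = selectModels(Mdl,idx);
Xnew = randn(5,50);                             % hypothetical new observations
YHat = predict(SubMdl,Xnew)                     % step 5: 5-by-1 predicted responses
```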

Extended Capabilities

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

Version History

Introduced in R2016a

R2024a: Specify GPU arrays (requires Parallel Computing Toolbox)

selectModels fully supports GPU arrays.
