kfoldLoss

Regression loss for cross-validated kernel regression model

Syntax

L = kfoldLoss(CVMdl)

L = kfoldLoss(CVMdl,Name,Value)

Description

L = kfoldLoss(CVMdl) returns the regression loss obtained by the cross-validated kernel regression model CVMdl. For every fold, kfoldLoss computes the regression loss for observations in the validation fold, using a model trained on observations in the training fold.

L = kfoldLoss(CVMdl,Name,Value) returns the regression loss (the mean squared error (MSE) by default) with additional options specified by one or more name-value arguments. For example, you can specify the regression-loss function or which folds to use for loss calculation.

Examples

Compute Loss for Cross-Validated Kernel Regression Models

Simulate sample data:

rng(0,'twister'); % For reproducibility
n = 1000;
x = linspace(-10,10,n)';
y = 1 + x*2e-2 + sin(x)./x + 0.2*randn(n,1);

Cross-validate a kernel regression model.

CVMdl = fitrkernel(x,y,'Kfold',5);

fitrkernel implements 5-fold cross-validation. CVMdl is a RegressionPartitionedKernel model. It contains the property Trained, which is a 5-by-1 cell array holding 5 RegressionKernel models, each trained on the training-fold observations of the corresponding fold.

Compute the epsilon-insensitive loss for each fold, using the validation-fold observations that fitrkernel did not use to train the corresponding model.

L = kfoldLoss(CVMdl,'LossFun','epsiloninsensitive','Mode','individual')

Input Arguments

`CVMdl` — Cross-validated kernel regression model
`RegressionPartitionedKernel` model object

Cross-validated kernel regression model, specified as a RegressionPartitionedKernel model object. You can create a RegressionPartitionedKernel model by calling fitrkernel and specifying any one of the cross-validation name-value arguments, for example, CrossVal.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'LossFun','epsiloninsensitive','Mode','individual' directs kfoldLoss to return the epsilon-insensitive loss for each fold.

`Folds` — Fold indices to use for response prediction
`1:CVMdl.KFold` (default) | numeric vector of positive integers

Fold indices to use for response prediction, specified as the comma-separated pair consisting of 'Folds' and a numeric vector of positive integers. The elements of Folds must range from 1 through CVMdl.KFold.

Example: 'Folds',[1 4 10]

Data Types: single | double
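
As a minimal sketch of restricting the loss to a subset of folds, the following reuses the simulated data from the example above and averages the MSE over folds 1 and 4 only. The variable name Lsub and the choice of folds are illustrative assumptions, and the code assumes Statistics and Machine Learning Toolbox is available.

```matlab
rng(0,'twister'); % For reproducibility
n = 1000;
x = linspace(-10,10,n)';
y = 1 + x*2e-2 + sin(x)./x + 0.2*randn(n,1);

% Cross-validate with 5 folds, so valid fold indices are 1 through 5
CVMdl = fitrkernel(x,y,'KFold',5);

% Average MSE computed from folds 1 and 4 only (illustrative choice)
Lsub = kfoldLoss(CVMdl,'Folds',[1 4]);
```

Every index passed to Folds must lie between 1 and CVMdl.KFold, so [1 4] is valid here because the model has 5 folds.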

`LossFun` — Loss function
`'mse'` (default) | `'epsiloninsensitive'` | function handle

Loss function, specified as the comma-separated pair consisting of 'LossFun' and a built-in loss function name or function handle.

The following table lists the available loss functions. Specify one using its corresponding character vector or string scalar. Also, in the table, $f(x) = xβ + b$.
- β is a vector of p coefficients.
- x is an observation from p predictor variables.
- b is the scalar bias.

Value	Description
`'epsiloninsensitive'`	Epsilon-insensitive loss: $ℓ[y, f(x)] = \max[0, |y - f(x)| - ε]$
`'mse'`	MSE: $ℓ[y, f(x)] = [y - f(x)]^{2}$

'epsiloninsensitive' is appropriate for SVM learners only.

Specify your own function using function handle notation.

Assume that n is the number of observations in X. Your function must have this signature:
```
lossvalue = lossfun(Y,Yhat,W)
```
where:
- The output argument lossvalue is a scalar.
- You specify the function name (lossfun).
- Y is an n-dimensional vector of observed responses. kfoldLoss passes the input argument Y in for Y.
- Yhat is an n-dimensional vector of predicted responses, which is similar to the output of predict.
- W is an n-by-1 numeric vector of observation weights.

Data Types: char | string | function_handle
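
To illustrate the function handle form, here is a hedged sketch of a custom loss that follows the lossfun(Y,Yhat,W) signature described above. The weighted mean absolute error and the names maeFun and Lmae are hypothetical choices for this example, not built-in options. The code reuses the simulated data from the example above and assumes Statistics and Machine Learning Toolbox is available.

```matlab
rng(0,'twister'); % For reproducibility
n = 1000;
x = linspace(-10,10,n)';
y = 1 + x*2e-2 + sin(x)./x + 0.2*randn(n,1);
CVMdl = fitrkernel(x,y,'KFold',5);

% Hypothetical custom loss: weighted mean absolute error.
% (:) forces column vectors so the element-wise product is well defined.
maeFun = @(Y,Yhat,W) sum(W(:).*abs(Y(:) - Yhat(:)))/sum(W(:));

% Pass the handle through the LossFun name-value argument
Lmae = kfoldLoss(CVMdl,'LossFun',maeFun);
```

Because the handle returns a scalar for any set of observations, kfoldLoss can evaluate it on the validation-fold observations in the same way as a built-in loss.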

`Mode` — Loss aggregation level
`'average'` (default) | `'individual'`

Loss aggregation level, specified as the comma-separated pair consisting of 'Mode' and 'average' or 'individual'.

Value	Description
`'average'`	Returns losses averaged over all folds
`'individual'`	Returns losses for each fold

Example: 'Mode','individual'
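
The two aggregation levels can be compared with a short sketch such as the following, which reuses the simulated data from the example above. The variable names Lavg and Lfold are illustrative, and the code assumes Statistics and Machine Learning Toolbox is available.

```matlab
rng(0,'twister'); % For reproducibility
n = 1000;
x = linspace(-10,10,n)';
y = 1 + x*2e-2 + sin(x)./x + 0.2*randn(n,1);
CVMdl = fitrkernel(x,y,'KFold',5);

% Default mode ('average'): one scalar loss summarizing all folds
Lavg = kfoldLoss(CVMdl);

% 'individual' mode: one loss per fold, returned as a 5-by-1 vector
Lfold = kfoldLoss(CVMdl,'Mode','individual');
```

Inspecting Lfold alongside Lavg is one way to see how much the loss varies from fold to fold.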

`PredictionForMissingValue` — Predicted response value to use for observations with missing predictor values
`"median"` (default) | `"mean"` | `"omitted"` | numeric scalar

Since R2023b

Predicted response value to use for observations with missing predictor values, specified as "median", "mean", "omitted", or a numeric scalar.

Value	Description
`"median"`	`kfoldLoss` uses the median of the observed response values in the training-fold data as the predicted response value for observations with missing predictor values.
`"mean"`	`kfoldLoss` uses the mean of the observed response values in the training-fold data as the predicted response value for observations with missing predictor values.
`"omitted"`	`kfoldLoss` excludes observations with missing predictor values from the loss computation.
Numeric scalar	`kfoldLoss` uses this value as the predicted response value for observations with missing predictor values.

If an observation is missing an observed response value or an observation weight, then kfoldLoss does not use the observation in the loss computation.

Example: "PredictionForMissingValue","omitted"

Data Types: single | double | char | string
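
The following sketch shows how the option might be used. It assumes R2023b or later and Statistics and Machine Learning Toolbox. Replacing some predictor values with NaN is a hypothetical setup chosen only to create missing data, and the names Lmedian and Lomit are illustrative. Whether the two losses differ depends on how many validation-fold observations end up with missing predictor values.

```matlab
rng(0,'twister'); % For reproducibility
n = 1000;
x = linspace(-10,10,n)';
y = 1 + x*2e-2 + sin(x)./x + 0.2*randn(n,1);

% Hypothetical setup: mark every 50th predictor value as missing
x(1:50:end) = NaN;

CVMdl = fitrkernel(x,y,'KFold',5);

% Default behavior ("median") for observations with missing predictors
Lmedian = kfoldLoss(CVMdl);

% Exclude observations with missing predictors from the loss computation
Lomit = kfoldLoss(CVMdl,'PredictionForMissingValue','omitted');
```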

Output Arguments

`L` — Cross-validated regression losses
numeric scalar | numeric vector

Cross-validated regression losses, returned as a numeric scalar or vector. The interpretation of L depends on LossFun.

If Mode is 'average', then L is a scalar.
Otherwise, L is a k-by-1 vector, where k is the number of folds. L(j) is the average regression loss over fold j.

To estimate L, kfoldLoss uses the data that created CVMdl.

Version History

Introduced in R2018b

R2023b: Specify predicted response value to use for observations with missing predictor values

Starting in R2023b, when you predict or compute the loss, some regression models allow you to specify the predicted response value for observations with missing predictor values. Specify the PredictionForMissingValue name-value argument to use a numeric scalar, the training set median, or the training set mean as the predicted value. When computing the loss, you can also specify to omit observations with missing predictor values.

This table lists the object functions that support the PredictionForMissingValue name-value argument. By default, the functions use the training set median as the predicted response value for observations with missing predictor values.

Model Type	Model Objects	Object Functions
Gaussian process regression (GPR) model	`RegressionGP`, `CompactRegressionGP`	`loss`, `predict`, `resubLoss`, `resubPredict`
Gaussian process regression (GPR) model	`RegressionPartitionedGP`	`kfoldLoss`, `kfoldPredict`
Gaussian kernel regression model	`RegressionKernel`	`loss`, `predict`
Gaussian kernel regression model	`RegressionPartitionedKernel`	`kfoldLoss`, `kfoldPredict`
Linear regression model	`RegressionLinear`	`loss`, `predict`
Linear regression model	`RegressionPartitionedLinear`	`kfoldLoss`, `kfoldPredict`
Neural network regression model	`RegressionNeuralNetwork`, `CompactRegressionNeuralNetwork`	`loss`, `predict`, `resubLoss`, `resubPredict`
Neural network regression model	`RegressionPartitionedNeuralNetwork`	`kfoldLoss`, `kfoldPredict`
Support vector machine (SVM) regression model	`RegressionSVM`, `CompactRegressionSVM`	`loss`, `predict`, `resubLoss`, `resubPredict`
Support vector machine (SVM) regression model	`RegressionPartitionedSVM`	`kfoldLoss`, `kfoldPredict`

In previous releases, the regression model loss and predict functions listed above used NaN predicted response values for observations with missing predictor values. The software omitted observations with missing predictor values from the resubstitution ("resub") and cross-validation ("kfold") computations for prediction and loss.
