oobQuantileError

Out-of-bag quantile loss of bag of regression trees

Syntax

err = oobQuantileError(Mdl)

err = oobQuantileError(Mdl,Name,Value)

Description

err = oobQuantileError(Mdl) returns half of the out-of-bag mean absolute deviation (MAD), computed by comparing the true responses in Mdl.Y to the out-of-bag medians that the bag of regression trees Mdl predicts at the predictor data Mdl.X. Mdl must be a TreeBagger model object.

err = oobQuantileError(Mdl,Name,Value) uses additional options specified by one or more Name,Value pair arguments. For example, specify quantile probabilities, the error type, or which trees to include in the quantile-regression-error estimation.

Input Arguments

`Mdl` — Bag of regression trees
`TreeBagger` model object

Bag of regression trees, specified as a TreeBagger model object created by the TreeBagger function.

The value of Mdl.Method must be regression.
When you train Mdl using the TreeBagger function, you must specify the name-value pair 'OOBPrediction','on'. TreeBagger then saves the required out-of-bag observation index matrix in Mdl.OOBIndices.
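
The two requirements above can be checked before calling oobQuantileError. The following is a minimal sketch, assuming the carsmall data set and a small, arbitrary ensemble size; the table name Tbl is illustrative only.

```matlab
% Train a small bag of regression trees with out-of-bag information saved.
load carsmall
Tbl = table(Displacement,Weight,MPG);   % illustrative predictor/response table
rng(1); % For reproducibility
Mdl = TreeBagger(50,Tbl,'MPG','Method','regression','OOBPrediction','on');

% Check the two requirements described above.
isRegression = strcmp(Mdl.Method,'regression');   % Method must be regression
hasOOB = Mdl.ComputeOOBPrediction;                % true when 'OOBPrediction','on'

% OOBIndices has one column per tree; true marks an out-of-bag observation.
oobSize = size(Mdl.OOBIndices);
```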

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

`Mode` — Ensemble error type
`'ensemble'` (default) | `'cumulative'` | `'individual'`

Ensemble error type, specified as the comma-separated pair consisting of 'Mode' and a value in this table. Suppose tau is the value of Quantile.

Value	Description
`'cumulative'`	`err` is a `Mdl.NumTrees`-by-`numel(tau)` numeric matrix of cumulative quantile regression errors. `err(j,k)` is the `tau(k)` quantile regression error using the learners in `Mdl.Trees(1:j)` only.
`'ensemble'`	`err` is a 1-by-`numel(tau)` numeric vector of cumulative quantile regression errors for the entire ensemble. `err(k)` is the `tau(k)` ensemble quantile regression error.
`'individual'`	`err` is a `Mdl.NumTrees`-by-`numel(tau)` numeric matrix of quantile regression errors from individual learners. `err(j,k)` is the `tau(k)` quantile regression error using the learner in `Mdl.Trees(j)` only.

For 'cumulative' and 'individual', if you choose to include fewer trees in quantile estimation using Trees, then this action affects the number of rows in err and corresponding row indices.

Example: 'Mode','cumulative'
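
To see how Mode changes the shape of err, the following sketch calls oobQuantileError once per mode on the same model. It assumes the carsmall data set and an arbitrary ensemble of 20 trees; the variable names are illustrative.

```matlab
load carsmall
Tbl = table(Displacement,Weight,MPG);   % illustrative predictor/response table
rng(1); % For reproducibility
Mdl = TreeBagger(20,Tbl,'MPG','Method','regression','OOBPrediction','on');
tau = [0.25 0.5 0.75];

% One row for the whole ensemble (the default mode).
errEns = oobQuantileError(Mdl,'Quantile',tau);

% One row per tree; row j uses the trees 1:j.
errCum = oobQuantileError(Mdl,'Quantile',tau,'Mode','cumulative');

% One row per tree; row j uses tree j only.
errInd = oobQuantileError(Mdl,'Quantile',tau,'Mode','individual');

sizes = [size(errEns); size(errCum); size(errInd)];
```

Because the last cumulative row uses every tree, one would expect errCum(end,:) to agree with errEns up to numerical precision.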

`Quantile` — Quantile probability
`0.5` (default) | numeric vector containing values in [0,1]

Quantile probability, specified as the comma-separated pair consisting of 'Quantile' and a numeric vector containing values in the interval [0,1]. For each observation (row) in Mdl.X, oobQuantileError estimates corresponding quantiles for all probabilities in Quantile.

Example: 'Quantile',[0 0.25 0.5 0.75 1]

Data Types: single | double

`Trees` — Indices of trees to use in response estimation
`'all'` (default) | numeric vector of positive integers

Indices of trees to use in response estimation, specified as the comma-separated pair consisting of 'Trees' and 'all' or a numeric vector of positive integers. Indices correspond to the cells of Mdl.Trees; each cell therein contains a tree in the ensemble. The maximum value of Trees must be less than or equal to the number of trees in the ensemble (Mdl.NumTrees).

For 'all', oobQuantileError uses all trees in the ensemble (that is, the indices 1:Mdl.NumTrees).

Values other than the default can affect the number of rows in err.

Example: 'Trees',[1 10 Mdl.NumTrees]

Data Types: char | string | single | double

`TreeWeights` — Weights to attribute to responses from individual trees
`ones(Mdl.NumTrees,1)` (default) | numeric vector of nonnegative values

Weights to attribute to responses from individual trees, specified as the comma-separated pair consisting of 'TreeWeights' and a numeric vector of numel(trees) nonnegative values. trees is the value of Trees.

If you specify 'Mode','individual', then oobQuantileError ignores TreeWeights.

Data Types: single | double
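
The following sketch combines Trees and TreeWeights. It assumes the carsmall data set; the tree subset and the weights are arbitrary values chosen only to illustrate that TreeWeights needs one nonnegative entry per selected tree.

```matlab
load carsmall
Tbl = table(Displacement,Weight,MPG);   % illustrative predictor/response table
rng(1); % For reproducibility
Mdl = TreeBagger(50,Tbl,'MPG','Method','regression','OOBPrediction','on');

trees = 1:10;                            % use the first 10 trees only
w = linspace(1,2,numel(trees))';         % hypothetical weights, one per selected tree

% Restricting Trees reduces the number of rows in cumulative mode.
errCum = oobQuantileError(Mdl,'Trees',trees,'Mode','cumulative');

% In the default ensemble mode, the weighted subset yields a single error.
errW = oobQuantileError(Mdl,'Trees',trees,'TreeWeights',w);
```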

Output Arguments

`err` — Half of out-of-bag quantile regression error
numeric scalar | numeric matrix

Half of the out-of-bag quantile regression error, returned as a numeric scalar or T-by-numel(tau) matrix. tau is the value of Quantile.

T depends on the values of Mode and Trees, and the number of columns depends on Quantile. Suppose that you specify 'Quantile',tau and 'Trees',trees.

For 'Mode','cumulative', err is a numel(trees)-by-numel(tau) numeric matrix. err(j,k) is the tau(k) cumulative, out-of-bag quantile regression error using the learners in Mdl.Trees(trees(1:j)).
For 'Mode','ensemble', err is a 1-by-numel(tau) numeric vector. err(k) is the tau(k) cumulative, out-of-bag quantile regression error using the learners in Mdl.Trees(trees).
For 'Mode','individual', err is a numel(trees)-by-numel(tau) numeric matrix. err(j,k) is the tau(k) out-of-bag quantile regression error using the learner in Mdl.Trees(trees(j)).

Examples

Estimate Out-of-Bag Quantile Regression Error

Load the carsmall data set. Consider a model that predicts the fuel economy of a car given its engine displacement, weight, and number of cylinders. Treat Cylinders as a categorical variable.

load carsmall
Cylinders = categorical(Cylinders);
X = table(Displacement,Weight,Cylinders,MPG);

Train an ensemble of bagged regression trees using the entire data set. Specify 100 weak learners and save the out-of-bag indices.

rng(1); % For reproducibility
Mdl = TreeBagger(100,X,'MPG','Method','regression','OOBPrediction','on');

Mdl is a TreeBagger ensemble.

Perform quantile regression, and estimate the out-of-bag MAD of the entire ensemble using the predicted conditional medians.

oobErr = oobQuantileError(Mdl)

oobErr = 
1.5349

oobErr is an unbiased estimate of the quantile regression error for the entire ensemble.

Find Appropriate Ensemble Size Using Out-of-Bag Quantile Regression Error

Load the carsmall data set. Consider a model that predicts the fuel economy of a car given its engine displacement, weight, and number of cylinders.

load carsmall
X = table(Displacement,Weight,Cylinders,MPG);

Train an ensemble of bagged regression trees using the entire data set. Specify 250 weak learners and save the out-of-bag indices.

rng('default'); % For reproducibility
Mdl = TreeBagger(250,X,'MPG','Method','regression',...
    'OOBPrediction','on');

Estimate the cumulative, out-of-bag quantile regression errors for the 0.25, 0.5, and 0.75 quantiles.

err = oobQuantileError(Mdl,'Quantile',[0.25 0.5 0.75],'Mode','cumulative');

err is a 250-by-3 matrix of cumulative, out-of-bag, quantile regression errors. Columns correspond to quantile probabilities and rows correspond to trees in the ensemble. The errors are cumulative, so row j incorporates the aggregated predictions from trees 1 through j.

Plot the cumulative, out-of-bag, quantile errors on the same plot.

figure;
plot(err);
legend('0.25 quantile error','0.5 quantile error','0.75 quantile error');
ylabel('Out-of-bag quantile error');
xlabel('Tree index');
title('Cumulative, Out-of-Bag, Quantile Regression Error')

Figure: Cumulative, Out-of-Bag, Quantile Regression Error. Line plot of the out-of-bag quantile error (y-axis) against the tree index (x-axis), with one line each for the 0.25, 0.5, and 0.75 quantile errors.

All quantile error curves appear to level off after training about 50 trees. So, training 50 trees appears to be sufficient to achieve minimal quantile error for the three quantile probabilities.
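
Reading the plateau off the plot can be complemented by a simple numeric check. The following is a rough sketch, not part of oobQuantileError: it looks for the smallest ensemble size after which every cumulative error stays within a relative tolerance of its final value. The 2% tolerance is an arbitrary assumption, and the result depends on it.

```matlab
load carsmall
X = table(Displacement,Weight,Cylinders,MPG);
rng('default'); % For reproducibility
Mdl = TreeBagger(250,X,'MPG','Method','regression','OOBPrediction','on');
err = oobQuantileError(Mdl,'Quantile',[0.25 0.5 0.75],'Mode','cumulative');

tol = 0.02;                 % hypothetical relative tolerance
finalErr = err(end,:);      % errors using all trees

% Rows where all three quantile errors are within tol of their final values.
withinTol = all(abs(err - finalErr) <= tol*finalErr,2);

% Smallest ensemble size after which every row stays within tolerance.
lastOutside = find(~withinTol,1,'last');
if isempty(lastOutside)
    nTrees = 1;
else
    nTrees = lastOutside + 1;
end
```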

More About

Out-of-Bag

In a bagged ensemble, observations are out-of-bag when they are left out of the training sample for a particular learner. Observations are in-bag when they are used to train a particular learner.

When bagging learners, a practitioner takes a bootstrap sample (that is, a random sample with replacement) of size n for each learner, and then trains the learners using their respective bootstrap samples. Drawing n out of n observations with replacement omits on average about 37% of observations for each learner.
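
The 37% figure follows from the probability that a given observation is never drawn in n draws with replacement, which is (1 – 1/n)^n and approaches exp(–1) ≈ 0.368 as n grows. The following sketch checks this value analytically and with one simulated bootstrap sample; the sample size n is an arbitrary choice.

```matlab
n = 1000;                        % arbitrary sample size

% Probability that one observation is left out of a bootstrap sample.
pOmit = (1 - 1/n)^n;             % approaches exp(-1) as n grows

% Simulate one bootstrap sample and measure the omitted fraction.
rng(1); % For reproducibility
idx = randi(n,n,1);              % n draws with replacement from 1:n
fracOmitted = 1 - numel(unique(idx))/n;
```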

The out-of-bag ensemble error, the ensemble error estimated using out-of-bag observations only, is an unbiased estimator of the true ensemble error.

Quantile Regression Error

The quantile regression error of a model given observed predictor data and responses is the weighted mean absolute deviation (MAD). If the model under-predicts the response, then deviation weights are τ, the quantile probability. If the model over-predicts, then deviation weights are 1 – τ.

That is, the τ quantile regression error is

$L_{\tau} = \tau \frac{\sum_{j : y_{j} \geq \hat{y}_{\tau,j}} w_{j} (y_{j} - \hat{y}_{\tau,j})}{\sum_{j=1}^{n} w_{j}} + (1 - \tau) \frac{\sum_{j : y_{j} < \hat{y}_{\tau,j}} w_{j} (\hat{y}_{\tau,j} - y_{j})}{\sum_{j=1}^{n} w_{j}} .$

$y_{j}$ is true response j, $\hat{y}_{\tau,j}$ is the τ quantile that the model predicts, and $w_{j}$ is observation weight j.
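
The formula can be evaluated directly for a small data set. The following sketch uses made-up responses, predictions, and weights, and a helper named qloss that is hypothetical and not part of any toolbox. It also shows that, for τ = 0.5, the loss equals half of the MAD.

```matlab
% Hypothetical helper implementing the weighted quantile loss above.
% max(y - yhat,0) keeps under-predictions; max(yhat - y,0) keeps over-predictions.
qloss = @(y,yhat,w,tau) (tau*sum(w.*max(y - yhat,0)) + ...
    (1 - tau)*sum(w.*max(yhat - y,0)))/sum(w);

y    = [10; 12; 15; 20];    % made-up true responses
yhat = [11; 12; 13; 16];    % made-up predicted quantiles
w    = ones(size(y));       % equal observation weights

L25 = qloss(y,yhat,w,0.25)  % returns 0.5625
L50 = qloss(y,yhat,w,0.5)   % returns 0.8750

halfMAD = mean(abs(y - yhat))/2   % returns 0.8750, same as L50
```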

Tips

The out-of-bag ensemble error estimator is unbiased for the true ensemble error. So, to tune parameters of a random forest, estimate the out-of-bag ensemble error instead of implementing cross-validation.
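
As a sketch of this tip, the following loop compares a few candidate leaf sizes by their out-of-bag quantile error at the median. It assumes the carsmall data set; the candidate values and the ensemble size are arbitrary, and a real search would likely cover more settings.

```matlab
load carsmall
Tbl = table(Displacement,Weight,MPG);   % illustrative predictor/response table

leafSizes = [1 5 10 20];                % hypothetical candidate values
oobErr = zeros(size(leafSizes));

for k = 1:numel(leafSizes)
    rng(1); % Same seed for each candidate, for comparability
    Mdl = TreeBagger(100,Tbl,'MPG','Method','regression', ...
        'OOBPrediction','on','MinLeafSize',leafSizes(k));
    oobErr(k) = oobQuantileError(Mdl);  % out-of-bag error at the median
end

% Pick the candidate with the smallest out-of-bag error.
[~,best] = min(oobErr);
bestLeafSize = leafSizes(best);
```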

References

[1] Breiman, L. “Random Forests.” Machine Learning, Vol. 45, 2001, pp. 5–32.

[2] Meinshausen, N. “Quantile Regression Forests.” Journal of Machine Learning Research, Vol. 7, 2006, pp. 983–999.

Version History

Introduced in R2016b
