Subset of Regressors Approximation for GPR Models
The subset of regressors (SR) approximation method consists of replacing the kernel function $k(x, x_r \mid \theta)$ in the exact GPR method by its approximation $\hat{k}_{SR}(x, x_r \mid \theta, \mathcal{A})$, given the active set $\mathcal{A} \subset \mathcal{N} = \{1, 2, \ldots, n\}$. You can specify the SR method for parameter estimation by using the 'FitMethod','sr' name-value pair argument in the call to fitrgp. For prediction using SR, you can use the 'PredictMethod','sr' name-value pair argument in the call to fitrgp.
Approximating the Kernel Function
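For the exact GPR model, the expected prediction depends on the set of $n$ functions $S_{\mathcal{N}} = \{k(x, x_i \mid \theta),\ i = 1, 2, \ldots, n\}$, where $\mathcal{N} = \{1, 2, \ldots, n\}$ is the set of indices of all observations, and $n$ is the total number of observations. The idea is to approximate the span of these functions by a smaller set of functions, $S_{\mathcal{A}}$, where $\mathcal{A} \subset \mathcal{N}$ is the subset of indices of points selected to be in the active set. Consider $S_{\mathcal{A}} = \{k(x, x_j \mid \theta),\ j \in \mathcal{A}\}$. The aim is to approximate the elements of $S_{\mathcal{N}}$ as linear combinations of the elements of $S_{\mathcal{A}}$.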
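Suppose the approximation to $k(x, x_r \mid \theta)$ using the functions in $S_{\mathcal{A}}$ is as follows:

$$\hat{k}(x, x_r \mid \theta) = \sum_{j \in \mathcal{A}} c_{jr}\, k(x, x_j \mid \theta),$$

where $c_{jr} \in \mathbb{R}$ are the coefficients of the linear combination for approximating $k(x, x_r \mid \theta)$. Suppose $C$ is the matrix that contains all the coefficients $c_{jr}$. Then $C$ is a $|\mathcal{A}| \times n$ matrix such that $C(j, r) = c_{jr}$. The software finds the best approximation to the elements of $S_{\mathcal{N}}$ using the active set $\mathcal{A} \subset \mathcal{N}$ by minimizing the error function

$$E(\mathcal{A}, C) = \sum_{r=1}^{n} \left\| k(x, x_r \mid \theta) - \hat{k}(x, x_r \mid \theta) \right\|_{\mathcal{H}}^{2},$$

where $\mathcal{H}$ is the reproducing kernel Hilbert space (RKHS) associated with the kernel function $k$ [1], [2].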
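The coefficient matrix that minimizes $E(\mathcal{A}, C)$ is

$$\hat{C}_{\mathcal{A}} = K(X_{\mathcal{A}}, X_{\mathcal{A}} \mid \theta)^{-1} K(X_{\mathcal{A}}, X \mid \theta),$$

where $X_{\mathcal{A}}$ denotes the rows of $X$ indexed by the active set, and an approximation to the kernel function using the elements in the active set $\mathcal{A} \subset \mathcal{N}$ is

$$\hat{k}(x, x_r \mid \theta) = \sum_{j \in \mathcal{A}} c_{jr}\, k(x, x_j \mid \theta) = K(x^T, X_{\mathcal{A}} \mid \theta)\, C(:, r).$$

The SR approximation to the kernel function using the active set $\mathcal{A} \subset \mathcal{N}$ is defined as:

$$\hat{k}_{SR}(x, x_r \mid \theta, \mathcal{A}) = K(x^T, X_{\mathcal{A}} \mid \theta)\, \hat{C}_{\mathcal{A}}(:, r) = K(x^T, X_{\mathcal{A}} \mid \theta)\, K(X_{\mathcal{A}}, X_{\mathcal{A}} \mid \theta)^{-1} K(X_{\mathcal{A}}, x_r^T \mid \theta),$$

and the SR approximation to $K(X, X \mid \theta)$ is:

$$\hat{K}_{SR}(X, X \mid \theta, \mathcal{A}) = K(X, X_{\mathcal{A}} \mid \theta)\, K(X_{\mathcal{A}}, X_{\mathcal{A}} \mid \theta)^{-1} K(X_{\mathcal{A}}, X \mid \theta).$$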
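The construction of the SR kernel matrix can be sketched numerically. The following is a minimal NumPy illustration, not the fitrgp implementation: the squared exponential kernel, its hyperparameters, the one-dimensional grid of inputs, and the choice of every seventh point as the active set are all assumptions made for this example.

```python
import numpy as np

def sq_exp_kernel(XA, XB, sigma_f=1.0, length=1.0):
    """Squared exponential kernel matrix between the rows of XA and XB."""
    d2 = ((XA[:, None, :] - XB[None, :, :]) ** 2).sum(axis=-1)
    return sigma_f**2 * np.exp(-0.5 * d2 / length**2)

# Assumed toy inputs: n = 50 points on a grid, 8 of them in the active set.
X = np.linspace(-3.0, 3.0, 50).reshape(-1, 1)
active = np.arange(0, 50, 7)            # hypothetical active set indices
XA = X[active]

K_AA = sq_exp_kernel(XA, XA)            # K(X_A, X_A)
K_AX = sq_exp_kernel(XA, X)             # K(X_A, X)

# Coefficient matrix C_hat = K(X_A, X_A)^{-1} K(X_A, X), computed with a
# linear solve rather than an explicit inverse.
C_hat = np.linalg.solve(K_AA, K_AX)

# SR approximation to K(X, X): K(X, X_A) K(X_A, X_A)^{-1} K(X_A, X).
K_SR = K_AX.T @ C_hat
K_exact = sq_exp_kernel(X, X)

print("rank of K_SR:", np.linalg.matrix_rank(K_SR))
print("max abs error vs exact kernel matrix:", np.abs(K_SR - K_exact).max())
```

In this sketch the columns of the approximation that belong to active points reproduce the exact kernel matrix, because each active kernel function is trivially a linear combination of the elements of the active set.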
Parameter Estimation
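Replacing $K(X, X \mid \theta)$ by $\hat{K}_{SR}(X, X \mid \theta, \mathcal{A})$ in the marginal log likelihood function produces its SR approximation:

$$\log P_{SR}(y \mid X, \beta, \theta, \sigma^2, \mathcal{A}) = -\frac{1}{2}(y - H\beta)^T \left[\hat{K}_{SR}(X, X \mid \theta, \mathcal{A}) + \sigma^2 I_n\right]^{-1}(y - H\beta) - \frac{n}{2}\log 2\pi - \frac{1}{2}\log\left|\hat{K}_{SR}(X, X \mid \theta, \mathcal{A}) + \sigma^2 I_n\right|.$$

As in the exact method, the software estimates the parameters by first computing $\hat{\beta}(\theta, \sigma^2)$, the optimal estimate of $\beta$, given $\theta$ and $\sigma^2$. Then it estimates $\theta$ and $\sigma^2$ using the $\beta$-profiled marginal log likelihood. The SR estimate of $\beta$ for given $\theta$ and $\sigma^2$ is:

$$\hat{\beta}_{SR}(\theta, \sigma^2, \mathcal{A}) = \left[H^T \left[\hat{K}_{SR}(X, X \mid \theta, \mathcal{A}) + \sigma^2 I_n\right]^{-1} H\right]^{-1} H^T \left[\hat{K}_{SR}(X, X \mid \theta, \mathcal{A}) + \sigma^2 I_n\right]^{-1} y,$$

where

$$\left[\hat{K}_{SR}(X, X \mid \theta, \mathcal{A}) + \sigma^2 I_n\right]^{-1} = \frac{I_n}{\sigma^2} - \frac{K(X, X_{\mathcal{A}} \mid \theta)}{\sigma^2}\, A_{\mathcal{A}}^{-1}\, \frac{K(X_{\mathcal{A}}, X \mid \theta)}{\sigma^2},$$

$$A_{\mathcal{A}} = K(X_{\mathcal{A}}, X_{\mathcal{A}} \mid \theta) + \frac{K(X_{\mathcal{A}}, X \mid \theta)\, K(X, X_{\mathcal{A}} \mid \theta)}{\sigma^2}.$$

The SR approximation to the $\beta$-profiled marginal log likelihood is:

$$\log P_{SR}(y \mid X, \hat{\beta}_{SR}(\theta, \sigma^2, \mathcal{A}), \theta, \sigma^2, \mathcal{A}) = -\frac{1}{2}(y - H\hat{\beta}_{SR})^T \left[\hat{K}_{SR}(X, X \mid \theta, \mathcal{A}) + \sigma^2 I_n\right]^{-1}(y - H\hat{\beta}_{SR}) - \frac{n}{2}\log 2\pi - \frac{1}{2}\log\left|\hat{K}_{SR}(X, X \mid \theta, \mathcal{A}) + \sigma^2 I_n\right|,$$

where $\hat{\beta}_{SR}$ is shorthand for $\hat{\beta}_{SR}(\theta, \sigma^2, \mathcal{A})$.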
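The low-rank structure of the SR kernel matrix means the inverse of the noisy SR covariance can be applied through an active-set-sized linear solve instead of an n-by-n one. The sketch below is a hedged NumPy illustration of that idea for fixed kernel parameters and noise variance, not the software's actual code. The kernel, the synthetic data, the constant basis matrix `H`, and the helper name `apply_inv` are assumptions. The log determinant uses the matrix determinant lemma, a standard identity that this page does not state.

```python
import numpy as np

def sq_exp_kernel(XA, XB, sigma_f=1.0, length=1.0):
    """Squared exponential kernel matrix between the rows of XA and XB."""
    d2 = ((XA[:, None, :] - XB[None, :, :]) ** 2).sum(axis=-1)
    return sigma_f**2 * np.exp(-0.5 * d2 / length**2)

# Assumed synthetic data: noisy sine with a constant offset.
rng = np.random.default_rng(1)
n, sigma2 = 40, 0.1
X = np.linspace(-3.0, 3.0, n).reshape(-1, 1)
y = (1.5 + np.sin(X[:, 0]) + np.sqrt(sigma2) * rng.standard_normal(n))[:, None]
H = np.ones((n, 1))              # constant basis function (assumption)
active = np.arange(0, n, 5)      # hypothetical active set indices

K_AA = sq_exp_kernel(X[active], X[active])
K_XA = sq_exp_kernel(X, X[active])

# A_A = K(X_A, X_A) + K(X_A, X) K(X, X_A) / sigma^2
A_A = K_AA + K_XA.T @ K_XA / sigma2

def apply_inv(V):
    """Apply [K_SR + sigma^2 I]^{-1} to V using only an |A|-by-|A| solve."""
    return V / sigma2 - K_XA @ np.linalg.solve(A_A, K_XA.T @ V) / sigma2**2

# SR estimate of beta for the given kernel parameters and noise variance.
beta_sr = np.linalg.solve(H.T @ apply_inv(H), H.T @ apply_inv(y))

# Beta-profiled SR log likelihood; by the matrix determinant lemma,
# log|K_SR + sigma^2 I| = n log(sigma^2) + log|A_A| - log|K_AA|.
r = y - H @ beta_sr
logdet = n * np.log(sigma2) + np.linalg.slogdet(A_A)[1] - np.linalg.slogdet(K_AA)[1]
loglik = (-0.5 * (r.T @ apply_inv(r)) - 0.5 * n * np.log(2 * np.pi) - 0.5 * logdet).item()

print("beta_sr:", beta_sr.ravel())
print("profiled SR log likelihood:", loglik)
```

An optimizer over the kernel parameters and noise variance would evaluate a quantity like `loglik` repeatedly; this sketch evaluates it at a single fixed point only.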
Prediction
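The SR approximation to the distribution of $y_{new}$ given $y$, $X$, $x_{new}$ is

$$P(y_{new} \mid y, X, x_{new}) = \mathcal{N}\left(y_{new} \mid h(x_{new})^T \beta + \mu_{SR},\ \sigma^2_{new} + \Sigma_{SR}\right),$$

where $\mu_{SR}$ and $\Sigma_{SR}$ are the SR approximations to $\mu$ and $\Sigma$ shown in prediction using the exact GPR method.
$\mu_{SR}$ and $\Sigma_{SR}$ are obtained by replacing $k(x, x_r \mid \theta)$ by its SR approximation $\hat{k}_{SR}(x, x_r \mid \theta, \mathcal{A})$ in $\mu$ and $\Sigma$, respectively.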
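That is,

$$\mu_{SR} = \underbrace{\hat{K}_{SR}(x_{new}^T, X \mid \theta, \mathcal{A})}_{(1)}\ \underbrace{\left(\hat{K}_{SR}(X, X \mid \theta, \mathcal{A}) + \sigma^2 I_n\right)^{-1}}_{(2)}\ (y - H\beta).$$

Since

$$(1) = K(x_{new}^T, X_{\mathcal{A}} \mid \theta)\, K(X_{\mathcal{A}}, X_{\mathcal{A}} \mid \theta)^{-1} K(X_{\mathcal{A}}, X \mid \theta),$$

$$(2) = \frac{I_n}{\sigma^2} - \frac{K(X, X_{\mathcal{A}} \mid \theta)}{\sigma^2} \left[K(X_{\mathcal{A}}, X_{\mathcal{A}} \mid \theta) + \frac{K(X_{\mathcal{A}}, X \mid \theta)\, K(X, X_{\mathcal{A}} \mid \theta)}{\sigma^2}\right]^{-1} \frac{K(X_{\mathcal{A}}, X \mid \theta)}{\sigma^2},$$

and from the fact that $I - B(A + B)^{-1} = A(A + B)^{-1}$ for conformable square matrices $A$ and $B$ with $A + B$ invertible, $\mu_{SR}$ can be written as

$$\mu_{SR} = K(x_{new}^T, X_{\mathcal{A}} \mid \theta) \left[K(X_{\mathcal{A}}, X_{\mathcal{A}} \mid \theta) + \frac{K(X_{\mathcal{A}}, X \mid \theta)\, K(X, X_{\mathcal{A}} \mid \theta)}{\sigma^2}\right]^{-1} \frac{K(X_{\mathcal{A}}, X \mid \theta)}{\sigma^2}\, (y - H\beta).$$

Similarly, $\Sigma_{SR}$ is derived as follows:

$$\Sigma_{SR} = \underbrace{\hat{k}_{SR}(x_{new}, x_{new} \mid \theta, \mathcal{A})}_{*} - \underbrace{\hat{K}_{SR}(x_{new}^T, X \mid \theta, \mathcal{A})}_{**}\ \underbrace{\left(\hat{K}_{SR}(X, X \mid \theta, \mathcal{A}) + \sigma^2 I_n\right)^{-1}}_{***}\ \underbrace{\hat{K}_{SR}(X, x_{new}^T \mid \theta, \mathcal{A})}_{****}.$$

Because

$$* = K(x_{new}^T, X_{\mathcal{A}} \mid \theta)\, K(X_{\mathcal{A}}, X_{\mathcal{A}} \mid \theta)^{-1} K(X_{\mathcal{A}}, x_{new}^T \mid \theta),$$

$$** = K(x_{new}^T, X_{\mathcal{A}} \mid \theta)\, K(X_{\mathcal{A}}, X_{\mathcal{A}} \mid \theta)^{-1} K(X_{\mathcal{A}}, X \mid \theta),$$

$$*** = (2) \text{ in the equation for } \mu_{SR},$$

$$**** = K(X, X_{\mathcal{A}} \mid \theta)\, K(X_{\mathcal{A}}, X_{\mathcal{A}} \mid \theta)^{-1} K(X_{\mathcal{A}}, x_{new}^T \mid \theta),$$

$\Sigma_{SR}$ is found as follows:

$$\Sigma_{SR} = K(x_{new}^T, X_{\mathcal{A}} \mid \theta) \left[K(X_{\mathcal{A}}, X_{\mathcal{A}} \mid \theta) + \frac{K(X_{\mathcal{A}}, X \mid \theta)\, K(X, X_{\mathcal{A}} \mid \theta)}{\sigma^2}\right]^{-1} K(X_{\mathcal{A}}, x_{new}^T \mid \theta).$$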
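The compact expressions for the SR predictive mean and variance can be checked against direct substitution of the SR kernel into the exact GPR formulas. The following NumPy sketch does this for one new point. It is an illustration under assumptions, not the software's implementation: the kernel, the synthetic data, the constant basis with a fixed coefficient `beta`, and the active set are all made up for the example.

```python
import numpy as np

def sq_exp_kernel(XA, XB, sigma_f=1.0, length=1.0):
    """Squared exponential kernel matrix between the rows of XA and XB."""
    d2 = ((XA[:, None, :] - XB[None, :, :]) ** 2).sum(axis=-1)
    return sigma_f**2 * np.exp(-0.5 * d2 / length**2)

# Assumed synthetic data and fixed parameters.
rng = np.random.default_rng(2)
n, sigma2 = 40, 0.1
X = np.linspace(-3.0, 3.0, n).reshape(-1, 1)
y = (1.5 + np.sin(X[:, 0]) + np.sqrt(sigma2) * rng.standard_normal(n))[:, None]
H = np.ones((n, 1))              # constant basis function (assumption)
beta = np.array([[1.5]])         # treated as known in this sketch
active = np.arange(0, n, 5)      # hypothetical active set indices
x_new = np.array([[0.4]])

K_AA = sq_exp_kernel(X[active], X[active])
K_XA = sq_exp_kernel(X, X[active])
k_new_A = sq_exp_kernel(x_new, X[active])      # K(x_new^T, X_A), shape (1, |A|)
resid = y - H @ beta

# Compact forms: only |A|-by-|A| systems are solved.
A_A = K_AA + K_XA.T @ K_XA / sigma2
mu_sr = (k_new_A @ np.linalg.solve(A_A, K_XA.T @ resid) / sigma2).item()
Sigma_sr = (k_new_A @ np.linalg.solve(A_A, k_new_A.T)).item()

# Direct substitution of the SR kernel into the exact GPR formulas.
K_SR = K_XA @ np.linalg.solve(K_AA, K_XA.T)
k_sr_new_X = k_new_A @ np.linalg.solve(K_AA, K_XA.T)
k_sr_new_new = k_new_A @ np.linalg.solve(K_AA, k_new_A.T)
M = K_SR + sigma2 * np.eye(n)
mu_direct = (k_sr_new_X @ np.linalg.solve(M, resid)).item()
Sigma_direct = (k_sr_new_new - k_sr_new_X @ np.linalg.solve(M, k_sr_new_X.T)).item()

print("mu_SR    compact vs direct:", mu_sr, mu_direct)
print("Sigma_SR compact vs direct:", Sigma_sr, Sigma_direct)
```

The compact forms cost an active-set-sized solve per prediction, while the direct route solves an n-by-n system, which is the practical reason for the algebraic simplification.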
Predictive Variance Problem
One of the disadvantages of the SR method is that it can give unreasonably small predictive variances when making predictions in a region far away from the chosen active set $\mathcal{A} \subset \mathcal{N}$. Consider making a prediction at a new point $x_{new}$ that is far away from the training set $X$. In other words, assume that $K(x_{new}^T, X \mid \theta) \approx 0$.
For exact GPR, the posterior distribution of $f_{new}$ given $y$, $X$, and $x_{new}$ would be Normal with mean $\mu = 0$ and variance $\Sigma = k(x_{new}, x_{new} \mid \theta)$. This value is correct in the sense that, if $x_{new}$ is far from $X$, then the data $(X, y)$ does not supply any new information about $f_{new}$, and so the posterior distribution of $f_{new}$ given $y$, $X$, and $x_{new}$ should reduce to the prior distribution of $f_{new}$ given $x_{new}$, which is a Normal distribution with mean $0$ and variance $k(x_{new}, x_{new} \mid \theta)$.
For the SR approximation, if $x_{new}$ is far away from $X$ (and hence also far away from $X_{\mathcal{A}}$), then $\mu_{SR} = 0$ and $\Sigma_{SR} = 0$. Thus in this extreme case, $\mu_{SR}$ agrees with $\mu$ from exact GPR, but $\Sigma_{SR}$ is unreasonably small compared to $\Sigma$ from exact GPR.
The fully independent conditional approximation method can help avoid this problem.
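The collapse of the SR predictive variance far from the data can be illustrated numerically. The NumPy sketch below compares the exact GPR and SR posterior variance of the latent function at a point well outside the training range. The kernel, the synthetic zero-mean data, the active set, and the test location are assumptions chosen for this example, and a zero mean function is used so that no basis term is needed.

```python
import numpy as np

def sq_exp_kernel(XA, XB, sigma_f=1.0, length=1.0):
    """Squared exponential kernel matrix between the rows of XA and XB."""
    d2 = ((XA[:, None, :] - XB[None, :, :]) ** 2).sum(axis=-1)
    return sigma_f**2 * np.exp(-0.5 * d2 / length**2)

# Assumed synthetic data on [-3, 3]; zero mean function for simplicity.
rng = np.random.default_rng(3)
n, sigma2 = 40, 0.1
X = np.linspace(-3.0, 3.0, n).reshape(-1, 1)
y = (np.sin(X[:, 0]) + np.sqrt(sigma2) * rng.standard_normal(n))[:, None]
active = np.arange(0, n, 5)      # hypothetical active set indices
x_far = np.array([[10.0]])       # far from every training point

# Exact GPR posterior mean and variance of f_new.
K = sq_exp_kernel(X, X)
k_far_X = sq_exp_kernel(x_far, X)
k_far_far = sq_exp_kernel(x_far, x_far)
M = K + sigma2 * np.eye(n)
mu_exact = (k_far_X @ np.linalg.solve(M, y)).item()
Sigma_exact = (k_far_far - k_far_X @ np.linalg.solve(M, k_far_X.T)).item()

# SR posterior mean and variance of f_new, using the compact forms.
K_AA = sq_exp_kernel(X[active], X[active])
K_XA = sq_exp_kernel(X, X[active])
k_far_A = sq_exp_kernel(x_far, X[active])
A_A = K_AA + K_XA.T @ K_XA / sigma2
mu_sr = (k_far_A @ np.linalg.solve(A_A, K_XA.T @ y) / sigma2).item()
Sigma_sr = (k_far_A @ np.linalg.solve(A_A, k_far_A.T)).item()

print("exact: mean", mu_exact, "variance", Sigma_exact)
print("SR:    mean", mu_sr, "variance", Sigma_sr)
```

In this setup both means are essentially zero, the exact variance stays close to the prior variance of the assumed kernel, and the SR variance is essentially zero, which matches the behavior described in this section.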
References
[1] Rasmussen, C. E. and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press. Cambridge, Massachusetts, 2006.
[2] Smola, A. J. and B. Schölkopf. "Sparse greedy matrix approximation for machine learning." In Proceedings of the Seventeenth International Conference on Machine Learning, 2000.