Fully Independent Conditional Approximation for GPR Models

The fully independent conditional (FIC) approximation [1] systematically approximates the true GPR kernel function so that it avoids the predictive variance problem of the subset of regressors (SR) approximation while still maintaining a valid Gaussian process. To use FIC for parameter estimation, specify the 'FitMethod','fic' name-value pair argument in the call to fitrgp. To use FIC for prediction, specify the 'PredictMethod','fic' name-value pair argument in the call to fitrgp.

Approximating the Kernel Function

The FIC approximation to $k (x_{i}, x_{j} | θ)$ for active set $A \subset N = \{1, 2, ..., n\}$ is given by:

$\begin{array}{l} \hat{k}_{FIC} (x_{i}, x_{j} | θ, A) = \hat{k}_{SR} (x_{i}, x_{j} | θ, A) + δ_{ij} (k (x_{i}, x_{j} | θ) - \hat{k}_{SR} (x_{i}, x_{j} | θ, A)), \\ δ_{ij} = \left\{ \begin{array}{ll} 1, & \text{if } i = j, \\ 0, & \text{if } i \neq j. \end{array} \right. \end{array}$

That is, the FIC approximation is equal to the SR approximation if $i \neq j$. For $i = j$, the software uses the exact kernel value rather than an approximation. Define an n-by-n diagonal matrix $Ω (X | θ, A)$ as follows:

$[Ω (X | θ, A)]_{ij} = δ_{ij} (k (x_{i}, x_{j} | θ) - \hat{k}_{SR} (x_{i}, x_{j} | θ, A)) = \left\{ \begin{array}{ll} k (x_{i}, x_{j} | θ) - \hat{k}_{SR} (x_{i}, x_{j} | θ, A), & \text{if } i = j, \\ 0, & \text{if } i \neq j. \end{array} \right.$

The FIC approximation to $K (X, X | θ)$ is then given by:

$\begin{array}{rl} \hat{K}_{FIC} (X, X | θ, A) & = \hat{K}_{SR} (X, X | θ, A) + Ω (X | θ, A) \\ & = K (X, X_{A} | θ) K (X_{A}, X_{A} | θ)^{-1} K (X_{A}, X | θ) + Ω (X | θ, A). \end{array}$
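The construction above can be checked numerically. The following is a minimal NumPy sketch, not the fitrgp implementation: the squared exponential kernel, the helper names `sq_exp_kernel` and `fic_kernel_matrix`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def sq_exp_kernel(A, B, length_scale=1.0, sigma_f=1.0):
    """Squared exponential kernel between the rows of A and the rows of B
    (an assumed kernel choice for this sketch)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return sigma_f**2 * np.exp(-0.5 * d2 / length_scale**2)

def fic_kernel_matrix(X, active, kernel=sq_exp_kernel):
    """Return (K_FIC, K_SR, Omega) for data X and active-set indices."""
    XA = X[active]
    K_XA = kernel(X, XA)                          # K(X, X_A), n-by-m
    K_AA = kernel(XA, XA)                         # K(X_A, X_A), m-by-m
    K_SR = K_XA @ np.linalg.solve(K_AA, K_XA.T)   # SR approximation
    # Omega keeps only the diagonal of the SR approximation error.
    # (The full kernel matrix is formed here only for brevity.)
    omega_diag = np.diag(kernel(X, X)) - np.diag(K_SR)
    Omega = np.diag(omega_diag)
    return K_SR + Omega, K_SR, Omega

# Toy data: 8 evenly spaced points in one dimension, 3 of them active.
X = np.linspace(-3.0, 3.0, 8)[:, None]
active = np.array([0, 3, 6])

K_FIC, K_SR, Omega = fic_kernel_matrix(X, active)
K_exact = sq_exp_kernel(X, X)
off_diag = ~np.eye(len(X), dtype=bool)

# Diagonal entries match the exact kernel; off-diagonal entries match SR.
print(np.allclose(np.diag(K_FIC), np.diag(K_exact)))
print(np.allclose(K_FIC[off_diag], K_SR[off_diag]))
```

In this sketch the diagonal of $Ω$ is (numerically) zero at the active points, because the SR approximation is already exact there.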

Parameter Estimation

Replacing $K (X, X | θ)$ by ${\hat{K}}_{F I C} (X, X | θ, A)$ in the marginal log likelihood function produces its FIC approximation:

$\begin{array}{rl} \log P_{FIC} (y | X, β, θ, σ^{2}, A) = & - \frac{1}{2} (y - H β)^{T} [\hat{K}_{FIC} (X, X | θ, A) + σ^{2} I_{n}]^{-1} (y - H β) \\ & - \frac{n}{2} \log 2 π - \frac{1}{2} \log | \hat{K}_{FIC} (X, X | θ, A) + σ^{2} I_{n} |. \end{array}$

As in the exact method, the software estimates the parameters by first computing $\hat{β} (θ, σ^{2})$, the optimal estimate of $β$ given $θ$ and $σ^{2}$. It then estimates $θ$ and $σ^{2}$ using the $β$-profiled marginal log likelihood. The FIC estimate of $β$ for given $θ$ and $σ^{2}$ is

$\hat{β}_{FIC} (θ, σ^{2}, A) = [\underbrace{H^{T} (\hat{K}_{FIC} (X, X | θ, A) + σ^{2} I_{n})^{-1} H}_{*}]^{-1} \underbrace{H^{T} (\hat{K}_{FIC} (X, X | θ, A) + σ^{2} I_{n})^{-1} y}_{**},$

$\begin{array}{l} * = H^{T} Λ {(θ, σ^{2}, A)}^{- 1} H - H^{T} Λ {(θ, σ^{2}, A)}^{- 1} K (X, X_{A} | θ) B_{A}^{- 1} K (X_{A}, X | θ) Λ {(θ, σ^{2}, A)}^{- 1} H, \\ * * = H^{T} Λ {(θ, σ^{2}, A)}^{- 1} y - H^{T} Λ {(θ, σ^{2}, A)}^{- 1} K (X, X_{A} | θ) B_{A}^{- 1} K (X_{A}, X | θ) Λ {(θ, σ^{2}, A)}^{- 1} y, \\ B_{A} = K (X_{A}, X_{A} | θ) + K (X_{A}, X | θ) Λ {(θ, σ^{2}, A)}^{- 1} K (X, X_{A} | θ), \\ Λ (θ, σ^{2}, A) = Ω (X | θ, A) + σ^{2} I_{n} . \end{array}$
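The expressions for $*$ and $**$ only require inverting the diagonal matrix $Λ$ and the m-by-m matrix $B_{A}$. The sketch below evaluates them with NumPy and compares the result with a dense solve against the full n-by-n matrix. It is an illustration under stated assumptions, not the fitrgp code: the squared exponential kernel, the linear basis $H = [1, x]$, the toy data, and the helper name `sq_exp_kernel` are all hypothetical choices.

```python
import numpy as np

def sq_exp_kernel(A, B, length_scale=1.0, sigma_f=1.0):
    """Squared exponential kernel (assumed kernel for this sketch)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return sigma_f**2 * np.exp(-0.5 * d2 / length_scale**2)

# Toy regression problem (assumed data, basis, and noise variance).
rng = np.random.default_rng(0)
n, sigma2 = 10, 0.1
X = np.linspace(-3.0, 3.0, n)[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)
H = np.column_stack([np.ones(n), X[:, 0]])    # basis [1, x]
active = np.array([0, 3, 6, 9])

K_XA = sq_exp_kernel(X, X[active])            # K(X, X_A)
K_AA = sq_exp_kernel(X[active], X[active])    # K(X_A, X_A)
K_SR = K_XA @ np.linalg.solve(K_AA, K_XA.T)
omega = np.diag(sq_exp_kernel(X, X)) - np.diag(K_SR)   # diagonal of Omega
lam = omega + sigma2                                    # diagonal of Lambda

# Lambda is diagonal, so applying Lambda^{-1} is a row-wise scaling.
Li_H = H / lam[:, None]
Li_y = y / lam
Li_K = K_XA / lam[:, None]
B_A = K_AA + K_XA.T @ Li_K

star = H.T @ Li_H - (H.T @ Li_K) @ np.linalg.solve(B_A, K_XA.T @ Li_H)
star2 = H.T @ Li_y - (H.T @ Li_K) @ np.linalg.solve(B_A, K_XA.T @ Li_y)
beta_fic = np.linalg.solve(star, star2)

# Dense reference: solve directly with K_FIC + sigma^2 * I_n.
C = K_SR + np.diag(lam)
beta_dense = np.linalg.solve(H.T @ np.linalg.solve(C, H),
                             H.T @ np.linalg.solve(C, y))
print(np.allclose(beta_fic, beta_dense))
```

The dense reference is included only as a check; the point of the structured form is that it never needs to factor an n-by-n matrix.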

Using $\hat{β}_{FIC} (θ, σ^{2}, A)$, the $β$-profiled marginal log likelihood for the FIC approximation is:

$\begin{array}{l} \log P_{FIC} (y | X, \hat{β}_{FIC} (θ, σ^{2}, A), θ, σ^{2}, A) = \\ \quad - \frac{1}{2} (y - H \hat{β}_{FIC} (θ, σ^{2}, A))^{T} (\hat{K}_{FIC} (X, X | θ, A) + σ^{2} I_{n})^{-1} (y - H \hat{β}_{FIC} (θ, σ^{2}, A)) \\ \quad - \frac{n}{2} \log 2 π - \frac{1}{2} \log | \hat{K}_{FIC} (X, X | θ, A) + σ^{2} I_{n} |, \end{array}$

where

$\begin{array}{l} (\hat{K}_{FIC} (X, X | θ, A) + σ^{2} I_{n})^{-1} \\ \quad = Λ (θ, σ^{2}, A)^{-1} - Λ (θ, σ^{2}, A)^{-1} K (X, X_{A} | θ) B_{A}^{-1} K (X_{A}, X | θ) Λ (θ, σ^{2}, A)^{-1}, \\ \log | \hat{K}_{FIC} (X, X | θ, A) + σ^{2} I_{n} | = \log | Λ (θ, σ^{2}, A) | + \log | B_{A} | - \log | K (X_{A}, X_{A} | θ) |. \end{array}$
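The two identities above are instances of the Woodbury matrix identity and the matrix determinant lemma. As a sanity check, the sketch below evaluates the $β$-profiled log likelihood both through these identities and through a dense n-by-n computation. The kernel, the basis $H = [1, x]$, the toy data, and the function names `fic_profiled_loglik` and `dense_profiled_loglik` are assumptions introduced for illustration, not part of fitrgp.

```python
import numpy as np

def sq_exp_kernel(A, B, length_scale=1.0, sigma_f=1.0):
    """Squared exponential kernel (assumed kernel for this sketch)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return sigma_f**2 * np.exp(-0.5 * d2 / length_scale**2)

def fic_profiled_loglik(X, y, H, active, sigma2, kernel=sq_exp_kernel):
    """Beta-profiled FIC log likelihood using the structured identities."""
    n = X.shape[0]
    K_XA = kernel(X, X[active])
    K_AA = kernel(X[active], X[active])
    # Diagonal of K_SR without forming the n-by-n matrix.
    sr_diag = np.einsum("ij,ji->i", K_XA, np.linalg.solve(K_AA, K_XA.T))
    # Diagonal of Lambda (full kernel matrix formed only for brevity).
    lam = np.diag(kernel(X, X)) - sr_diag + sigma2
    Li_K = K_XA / lam[:, None]                 # Lambda^{-1} K(X, X_A)
    B_A = K_AA + K_XA.T @ Li_K

    def apply_inv(v):
        """Apply (K_FIC + sigma2 * I_n)^{-1} to a vector or matrix."""
        Li_v = (v.T / lam).T
        return Li_v - Li_K @ np.linalg.solve(B_A, K_XA.T @ Li_v)

    beta = np.linalg.solve(H.T @ apply_inv(H), H.T @ apply_inv(y))
    r = y - H @ beta
    logdet = (np.sum(np.log(lam)) + np.linalg.slogdet(B_A)[1]
              - np.linalg.slogdet(K_AA)[1])
    return -0.5 * r @ apply_inv(r) - 0.5 * n * np.log(2 * np.pi) - 0.5 * logdet

def dense_profiled_loglik(X, y, H, active, sigma2, kernel=sq_exp_kernel):
    """Same quantity computed directly from the n-by-n matrix (reference)."""
    n = X.shape[0]
    K_XA = kernel(X, X[active])
    K_SR = K_XA @ np.linalg.solve(kernel(X[active], X[active]), K_XA.T)
    C = K_SR + np.diag(np.diag(kernel(X, X)) - np.diag(K_SR)) + sigma2 * np.eye(n)
    beta = np.linalg.solve(H.T @ np.linalg.solve(C, H), H.T @ np.linalg.solve(C, y))
    r = y - H @ beta
    return (-0.5 * r @ np.linalg.solve(C, r) - 0.5 * n * np.log(2 * np.pi)
            - 0.5 * np.linalg.slogdet(C)[1])

rng = np.random.default_rng(0)
X = np.linspace(-3.0, 3.0, 10)[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(10)
H = np.column_stack([np.ones(10), X[:, 0]])
active = np.array([0, 3, 6, 9])

ll_fic = fic_profiled_loglik(X, y, H, active, sigma2=0.1)
ll_dense = dense_profiled_loglik(X, y, H, active, sigma2=0.1)
print(np.isclose(ll_fic, ll_dense))
```

In the structured version, the only matrices that are factored are $B_{A}$ and $K (X_{A}, X_{A} | θ)$, both of size m-by-m, where m is the size of the active set.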

Prediction

The FIC approximation to the distribution of $y_{new}$ given $y$, $X$, and $x_{new}$ is

$P (y_{new} | y, X, x_{new}) = N (y_{new} | h (x_{new})^{T} β + μ_{FIC}, σ_{new}^{2} + Σ_{FIC}),$

where $μ_{FIC}$ and $Σ_{FIC}$ are the FIC approximations to $μ$ and $Σ$ given for prediction using the exact GPR method. As in the SR case, $μ_{FIC}$ and $Σ_{FIC}$ are obtained by replacing all occurrences of the true kernel with its FIC approximation. The final forms of $μ_{FIC}$ and $Σ_{FIC}$ are as follows:

$μ_{F I C} = K (x_{n e w}^{T}, X_{A} | θ) B_{A}^{- 1} K (X_{A}, X | θ) Λ {(θ, σ^{2}, A)}^{- 1} (y - H β),$

$\begin{array}{rl} Σ_{FIC} = & k (x_{new}, x_{new} | θ) - K (x_{new}^{T}, X_{A} | θ) K (X_{A}, X_{A} | θ)^{-1} K (X_{A}, x_{new}^{T} | θ) \\ & + K (x_{new}^{T}, X_{A} | θ) B_{A}^{-1} K (X_{A}, x_{new}^{T} | θ), \end{array}$

where

$\begin{array}{l} B_{A} = K (X_{A}, X_{A} | θ) + K (X_{A}, X | θ) Λ {(θ, σ^{2}, A)}^{- 1} K (X, X_{A} | θ), \\ Λ (θ, σ^{2}, A) = Ω (X | θ, A) + σ^{2} I_{n} . \end{array}$
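The prediction formulas can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the fitrgp implementation: the squared exponential kernel, the basis $h (x) = [1, x]$, the fixed value of $β$, the toy data, and the function name `fic_predict` are hypothetical, and the noise variance at the new point, $σ_{new}^{2}$, is assumed equal to $σ^{2}$.

```python
import numpy as np

def sq_exp_kernel(A, B, length_scale=1.0, sigma_f=1.0):
    """Squared exponential kernel (assumed kernel for this sketch)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return sigma_f**2 * np.exp(-0.5 * d2 / length_scale**2)

def fic_predict(X, y, H, beta, active, sigma2, x_new, h_new,
                kernel=sq_exp_kernel):
    """Return the FIC predictive mean and variance at a single point x_new.

    Assumes the noise variance at the new point equals sigma2.
    """
    XA = X[active]
    K_XA = kernel(X, XA)                          # K(X, X_A)
    K_AA = kernel(XA, XA)                         # K(X_A, X_A)
    k_newA = kernel(x_new[None, :], XA)[0]        # K(x_new^T, X_A)
    k_nn = kernel(x_new[None, :], x_new[None, :])[0, 0]

    # Diagonal of Lambda = Omega + sigma2 * I_n.
    sr_diag = np.einsum("ij,ji->i", K_XA, np.linalg.solve(K_AA, K_XA.T))
    lam = np.diag(kernel(X, X)) - sr_diag + sigma2
    B_A = K_AA + K_XA.T @ (K_XA / lam[:, None])

    r = y - H @ beta
    mu_fic = k_newA @ np.linalg.solve(B_A, K_XA.T @ (r / lam))
    Sigma_fic = (k_nn
                 - k_newA @ np.linalg.solve(K_AA, k_newA)
                 + k_newA @ np.linalg.solve(B_A, k_newA))
    return h_new @ beta + mu_fic, sigma2 + Sigma_fic

rng = np.random.default_rng(0)
X = np.linspace(-3.0, 3.0, 10)[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(10)
H = np.column_stack([np.ones(10), X[:, 0]])
beta = np.array([0.1, 0.2])                       # assumed fixed coefficients
active = np.array([0, 3, 6, 9])
sigma2 = 0.1

# One point inside the data range and one far away from it.
x_new, h_new = np.array([0.5]), np.array([1.0, 0.5])
x_far, h_far = np.array([50.0]), np.array([1.0, 50.0])
mean_near, var_near = fic_predict(X, y, H, beta, active, sigma2, x_new, h_new)
mean_far, var_far = fic_predict(X, y, H, beta, active, sigma2, x_far, h_far)
print(mean_near, var_near)
print(mean_far, var_far)
```

Far from the training data, every entry of $K (x_{new}^{T}, X_{A} | θ)$ goes to zero, so in this sketch $Σ_{FIC}$ returns to the prior variance $k (x_{new}, x_{new} | θ)$ instead of shrinking toward zero. This is the behavior the introduction refers to when it says that FIC avoids the predictive variance problem of SR.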

References

[1] Quiñonero-Candela, J., and C. E. Rasmussen. "A Unifying View of Sparse Approximate Gaussian Process Regression." Journal of Machine Learning Research. Vol. 6, pp. 1939–1959, 2005.