## Specify ARIMA Error Model Innovation Distribution

A regression model with ARIMA errors has the following general form:

 $\begin{array}{c}{y}_{t}=c+{X}_{t}\beta +{u}_{t}\\ a\left(L\right)A\left(L\right){\left(1-L\right)}^{D}\left(1-{L}^{s}\right){u}_{t}=b\left(L\right)B\left(L\right){\epsilon }_{t},\end{array}$ (1)
where

• $t = 1,\ldots,T$.

• $y_t$ is the response series.

• $X_t$ is row $t$ of $X$, which is the matrix of concatenated predictor data vectors. That is, $X_t$ is observation $t$ of each predictor series.

• $c$ is the regression model intercept.

• $\beta$ is the regression coefficient.

• $u_t$ is the disturbance series.

• $\epsilon_t$ is the innovations series.

• ${L}^{j}{y}_{t}={y}_{t-j}.$

• $a\left(L\right)=\left(1-{a}_{1}L-...-{a}_{p}{L}^{p}\right),$ which is the degree $p$ nonseasonal autoregressive polynomial.

• $A\left(L\right)=\left(1-{A}_{1}L-...-{A}_{{p}_{s}}{L}^{{p}_{s}}\right),$ which is the degree $p_s$ seasonal autoregressive polynomial.

• ${\left(1-L\right)}^{D},$ which is the degree $D$ nonseasonal integration polynomial.

• $\left(1-{L}^{s}\right),$ which is the degree $s$ seasonal integration polynomial.

• $b\left(L\right)=\left(1+{b}_{1}L+...+{b}_{q}{L}^{q}\right),$ which is the degree $q$ nonseasonal moving average polynomial.

• $B\left(L\right)=\left(1+{B}_{1}L+...+{B}_{{q}_{s}}{L}^{{q}_{s}}\right),$ which is the degree $q_s$ seasonal moving average polynomial.

Suppose that the unconditional disturbance series ($u_t$) is a stationary stochastic process. Then, you can express the second equation in Equation 1 as

${u}_{t}={a}^{-1}\left(L\right){A}^{-1}\left(L\right){\left(1-L\right)}^{-D}{\left(1-{L}^{s}\right)}^{-1}b\left(L\right)B\left(L\right){\epsilon }_{t}=\Psi \left(L\right){\epsilon }_{t},$

where $\Psi(L)$ is an infinite-degree lag operator polynomial [2].
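
As a worked illustration, consider a special case chosen here for simplicity (it is an assumption, not a model from this page): stationary AR(1) errors with $|a_1| < 1$ and no seasonal, integration, or moving average terms. Inverting the autoregressive polynomial gives a geometric series in $L$:

$\begin{array}{rcl}\left(1-{a}_{1}L\right){u}_{t}& =& {\epsilon }_{t}\\ {u}_{t}& =& {\left(1-{a}_{1}L\right)}^{-1}{\epsilon }_{t}=\sum _{j=0}^{\infty }{a}_{1}^{j}{L}^{j}{\epsilon }_{t}=\sum _{j=0}^{\infty }{a}_{1}^{j}{\epsilon }_{t-j},\end{array}$

so in this case $\Psi(L)=\sum_{j=0}^{\infty}a_1^jL^j$, and the weight on past innovations decays geometrically with the lag.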

The innovation process ($\epsilon_t$) is an independent and identically distributed (iid), mean 0 process with a known distribution. Econometrics Toolbox™ generalizes the innovation process to $\epsilon_t = \sigma z_t$, where $z_t$ is a series of iid random variables with mean 0 and variance 1, and $\sigma^2$ is the constant variance of $\epsilon_t$.

`regARIMA` models contain two properties that describe the distribution of $\epsilon_t$:

• `Variance` stores $\sigma^2$.

• `Distribution` stores the parametric form of $z_t$.
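
The scaling relationship between the innovations and the standardized series can be sketched numerically. The following minimal example uses only base MATLAB functions; the sample size and the variance value are illustrative assumptions, not values from this page.

```matlab
% Sketch: build innovations e_t = sigma*z_t from standardized draws z_t.
rng(1);                      % fix the seed so the draw is reproducible
T      = 1e5;                % sample size (illustrative assumption)
sigma2 = 0.5;                % innovation variance (illustrative assumption)

z = randn(T,1);              % iid standard Gaussian draws: mean 0, variance 1
e = sqrt(sigma2)*z;          % innovations with mean 0 and variance sigma2

sampleMean = mean(e)         % should be close to 0
sampleVar  = var(e)          % should be close to sigma2
```

Here `sigma2` plays the role of the `Variance` property, and the distribution used to draw `z` plays the role of the `Distribution` property.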

### Innovation Distribution Options

• The default value of `Variance` is `NaN`, meaning that the innovation variance is unknown. You can assign a positive scalar to `Variance` when you specify the model using the name-value pair argument `'Variance',sigma2` (where `sigma2` = $\sigma^2$), or by modifying an existing model using dot notation. Alternatively, you can estimate `Variance` using `estimate`.

• You can specify the following distributions for $z_t$ (using name-value pair arguments or dot notation):

• Standard Gaussian

• Standardized Student’s t with degrees of freedom $\nu > 2$. Specifically,

${z}_{t}={T}_{\nu }\sqrt{\frac{\nu -2}{\nu }},$

where $T_\nu$ is a random variable that follows a Student’s t distribution with degrees of freedom $\nu > 2$.

The t distribution is useful for modeling innovations that are more extreme than expected under a Gaussian distribution. Such innovation processes have excess kurtosis, that is, a distribution with heavier tails (and a sharper peak) than a Gaussian. For $\nu > 4$, the kurtosis (the fourth central moment divided by the squared variance) of $T_\nu$ equals the kurtosis of the standardized Student’s t ($z_t$), because kurtosis is scale invariant.
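
Both properties of the standardized t can be checked in a few lines. The derivation below uses the standard moments of a Student’s t random variable ($E[T_\nu]=0$, $\mathrm{Var}(T_\nu)=\nu/(\nu-2)$ for $\nu>2$) and writes $c=\sqrt{(\nu-2)/\nu}$ for the scale factor; the symbol $c$ here is introduced only for this derivation and is unrelated to the regression intercept.

$\begin{array}{rcl}\mathrm{Var}\left({z}_{t}\right)& =& \frac{\nu -2}{\nu }\mathrm{Var}\left({T}_{\nu }\right)=\frac{\nu -2}{\nu }\cdot \frac{\nu }{\nu -2}=1,\\ \mathrm{Kurt}\left(c{T}_{\nu }\right)& =& \frac{E\left[{\left(c{T}_{\nu }\right)}^{4}\right]}{{\left(E\left[{\left(c{T}_{\nu }\right)}^{2}\right]\right)}^{2}}=\frac{{c}^{4}E\left[{T}_{\nu }^{4}\right]}{{c}^{4}{\left(E\left[{T}_{\nu }^{2}\right]\right)}^{2}}=\mathrm{Kurt}\left({T}_{\nu }\right)=3+\frac{6}{\nu -4}\quad \left(\nu >4\right).\end{array}$

For example, with $\nu = 10$ the kurtosis is $3 + 6/6 = 4$, compared with 3 for a Gaussian.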

Tip

It is good practice to assess the distributional properties of the residuals to determine if a Gaussian innovation distribution (the default distribution) is appropriate for your model.
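
One way to carry out such a check is sketched below. Because no data set accompanies this page, the example simulates a response from a hypothetical model with $t_5$ innovations; the coefficient values, sample size, and predictor are all assumptions made for illustration. It then fits a model with the default Gaussian innovations and computes the sample kurtosis of the inferred residuals using base MATLAB arithmetic.

```matlab
% Sketch only: simulated data stand in for a real response y and predictor X.
rng(1);                                   % reproducible draws
T = 500;                                  % sample size (assumption)
X = randn(T,1);                           % hypothetical predictor series

% Hypothetical data-generating model with t(5) innovations
TrueMdl = regARIMA('AR',{0.5,-0.2},'Intercept',1,'Beta',2, ...
    'Variance',0.5,'Distribution',struct('Name','t','DoF',5));
y = simulate(TrueMdl,T,'X',X);

% Fit a model that keeps the default Gaussian innovation distribution
Mdl    = regARIMA(2,0,0);
EstMdl = estimate(Mdl,y,'X',X);

% Infer residuals, then compute their sample kurtosis
E  = infer(EstMdl,y,'X',X);
Ec = E - mean(E);                         % center the residuals
sampleKurtosis = mean(Ec.^4)/mean(Ec.^2)^2
```

A sample kurtosis well above 3 suggests heavier tails than a Gaussian and may motivate refitting with a t innovation distribution. A quantile-quantile plot or a formal normality test would be a natural complement, but those tools are not shown here.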

### Specify Innovation Distribution

`regARIMA` stores the distribution (and degrees of freedom for the t distribution) in the `Distribution` property. The data type of `Distribution` is a `struct` array with potentially two fields: `Name` and `DoF`.

• If the innovations are Gaussian, then the `Name` field is `Gaussian`, and there is no `DoF` field. `regARIMA` sets `Distribution` to `Gaussian` by default.

• If the innovations are t-distributed, then the `Name` field is `t` and the `DoF` field is `NaN` by default. Alternatively, you can set `DoF` to a scalar that is greater than 2.

To illustrate specifying the distribution, consider this regression model with AR(2) errors:

$\begin{array}{rcl}{y}_{t}& =& c+{X}_{t}\beta +{u}_{t}\\ {u}_{t}& =& {a}_{1}{u}_{t-1}+{a}_{2}{u}_{t-2}+{\epsilon }_{t}\end{array}$

```
Mdl = regARIMA(2,0,0);
Mdl.Distribution
```
```
ans = struct with fields:
    Name: "Gaussian"
```

By default, the `Distribution` property of `Mdl` is a `struct` array whose `Name` field has the value `"Gaussian"`.

If you want to specify a t innovation distribution, then you can either specify the model using the name-value pair argument `'Distribution','t'`, or use dot notation to modify an existing model.

Specify the model using the name-value pair argument.

```
Mdl = regARIMA('ARLags',1:2,'Distribution','t');
Mdl.Distribution
```
```
ans = struct with fields:
    Name: "t"
     DoF: NaN
```

If you use the name-value pair argument to specify the t innovation distribution, then the default degrees of freedom is `NaN`.

You can use dot notation to yield the same result.

```
Mdl = regARIMA(2,0,0);
Mdl.Distribution = 't'
```
```
Mdl = 
  regARIMA with properties:

     Description: "ARMA(2,0) Error Model (t Distribution)"
    Distribution: Name = "t", DoF = NaN
       Intercept: NaN
            Beta: [1×0]
               P: 2
               Q: 0
              AR: {NaN NaN} at lags [1 2]
             SAR: {}
              MA: {}
             SMA: {}
        Variance: NaN
```

To specify a ${t}_{10}$ innovation distribution, use dot notation to assign a new structure to the `Distribution` property of the existing model `Mdl`. You cannot modify the individual fields of `Distribution` using dot notation; for example, `Mdl.Distribution.DoF = 10` is not a valid assignment. However, you can display the value of the fields using dot notation.

```
Mdl.Distribution = struct('Name','t','DoF',10)
```
```
Mdl = 
  regARIMA with properties:

     Description: "ARMA(2,0) Error Model (t Distribution)"
    Distribution: Name = "t", DoF = 10
       Intercept: NaN
            Beta: [1×0]
               P: 2
               Q: 0
              AR: {NaN NaN} at lags [1 2]
             SAR: {}
              MA: {}
             SMA: {}
        Variance: NaN
```
```
tDistributionDoF = Mdl.Distribution.DoF
```
```
tDistributionDoF = 10
```

Because the `DoF` field is not `NaN`, it acts as an equality constraint when you estimate `Mdl` using `estimate`; that is, the degrees of freedom stay fixed at 10 during estimation.

Alternatively, you can specify the ${t}_{10}$ innovation distribution using the name-value pair argument.

```
Mdl = regARIMA('ARLags',1:2,'Intercept',0,...
    'Distribution',struct('Name','t','DoF',10))
```
```
Mdl = 
  regARIMA with properties:

     Description: "ARMA(2,0) Error Model (t Distribution)"
    Distribution: Name = "t", DoF = 10
       Intercept: 0
            Beta: [1×0]
               P: 2
               Q: 0
              AR: {NaN NaN} at lags [1 2]
             SAR: {}
              MA: {}
             SMA: {}
        Variance: NaN
```
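
The `Variance` property can be set in the same two ways. The following sketch combines it with the ${t}_{10}$ specification; the variance values 0.5 and 0.25 are arbitrary choices for illustration, and the variable name `innovVar` is hypothetical.

```matlab
% Sketch: set the innovation variance together with the t degrees of freedom.
Mdl = regARIMA('ARLags',1:2,'Variance',0.5, ...
    'Distribution',struct('Name','t','DoF',10));

Mdl.Variance = 0.25;         % dot notation on an existing model
innovVar = Mdl.Variance      % display the stored variance
```

Unlike the fields of `Distribution`, `Variance` is a scalar property, so a direct assignment with dot notation is enough.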

## References

[1] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Englewood Cliffs, NJ: Prentice Hall, 1994.

[2] Wold, H. A Study in the Analysis of Stationary Time Series. Uppsala, Sweden: Almqvist & Wiksell, 1938.