regstats

Regression diagnostics

Syntax

regstats(y,X,model)
stats = regstats(...)
stats = regstats(y,X,model,whichstats)

Description

regstats(y,X,model) performs a multilinear regression of the responses in y on the predictors in X. X is an n-by-p matrix of p predictors at each of n observations. y is an n-by-1 vector of observed responses.

Note

By default, regstats adds a first column of 1s to X, corresponding to a constant term in the model. Do not enter a column of 1s directly into X.
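
As a minimal sketch of this behavior, the following fits the default linear model to the hald data used in the Examples section and counts the coefficients. The variable names p and numCoeffs are illustrative only. Requesting an output argument keeps the GUI from opening.

```matlab
load hald                         % provides ingredients (13-by-4) and heat (13-by-1)

% Pass X without a column of 1s; regstats adds the constant term itself.
stats = regstats(heat,ingredients,'linear','beta');

p = size(ingredients,2);          % number of predictors in X
numCoeffs = numel(stats.beta);    % constant term plus p linear terms
```

If the constant term is added as described, numCoeffs should equal p + 1.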

The optional input model controls the regression model. By default, regstats uses a linear additive model with a constant term. model can be any one of the following:

'linear' — Constant and linear terms (the default)
'interaction' — Constant, linear, and interaction terms
'quadratic' — Constant, linear, interaction, and squared terms
'purequadratic' — Constant, linear, and squared terms

Alternatively, model can be a matrix of model terms accepted by the x2fx function. See x2fx for a description of this matrix and for a description of the order in which terms appear. You can use this matrix to specify other models including ones without a constant term.
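
The following sketch illustrates the matrix form, assuming the x2fx convention that each row of the matrix is one term and each column holds the exponent of one predictor, so that a row of zeros is the constant term. The variable names are illustrative only. The first matrix should reproduce the 'linear' model; the second drops the constant term.

```matlab
load hald
p = size(ingredients,2);

% Each row is one term; each column is the exponent of one predictor.
linearTerms  = [zeros(1,p); eye(p)];   % constant row, then one row per predictor
noConstTerms = eye(p);                 % linear terms only, no constant

sNamed   = regstats(heat,ingredients,'linear','beta');
sMatrix  = regstats(heat,ingredients,linearTerms,'beta');
sNoConst = regstats(heat,ingredients,noConstTerms,'beta');

% The named model and its matrix equivalent should give the same coefficients.
maxDiff = max(abs(sNamed.beta - sMatrix.beta));
```

Here sNoConst.beta has p elements instead of p + 1 because the model has no constant term.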

When you call regstats(y,X,model) without an output argument, the function displays a graphical user interface (GUI) with a list of diagnostic statistics, as shown in the following figure.

Regstats Export to Workspace window. The window displays a list of diagnostic statistics beside check boxes. OK, Cancel, and Help buttons are located below the list.

When you select check boxes corresponding to the statistics you want to compute and click OK, regstats returns the selected statistics to the MATLAB® workspace. The names of the workspace variables are displayed on the right-hand side of the interface. You can change the name of the workspace variable to any valid MATLAB variable name.

stats = regstats(...) creates the structure stats, whose fields contain all of the diagnostic statistics for the regression. This syntax does not open the GUI. The fields of stats are listed in the following table.

Field	Description
`Q`	Q from the QR decomposition of the design matrix
`R`	R from the QR decomposition of the design matrix
`beta`	Regression coefficients
`covb`	Covariance of regression coefficients
`yhat`	Fitted values of the response data
`r`	Residuals
`mse`	Mean squared error
`rsquare`	R² statistic
`adjrsquare`	Adjusted R² statistic
`leverage`	Leverage
`hatmat`	Hat matrix
`s2_i`	Delete-1 variance
`beta_i`	Delete-1 coefficients
`standres`	Standardized residuals
`studres`	Studentized residuals
`dfbetas`	Scaled change in regression coefficients
`dffit`	Change in fitted values
`dffits`	Scaled change in fitted values
`covratio`	Change in covariance
`cookd`	Cook's distance
`tstat`	t statistics and p-values for coefficients
`fstat`	F statistic and p-value
`dwstat`	Durbin-Watson statistic and p-value

Note that the field names of stats correspond to the names of the variables returned to the MATLAB workspace when you use the GUI. For example, stats.beta corresponds to the variable beta that is returned when you select Coefficients in the GUI and click OK.
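
Several fields in the table are related to one another. The sketch below checks two of these relationships on the hald data. It assumes that Q and R are the economy-size QR factors of the design matrix, including the constant column; that assumption is not stated in the table. The names betaQR, betaDiff, and levDiff are illustrative only.

```matlab
load hald
stats = regstats(heat,ingredients,'linear', ...
    {'Q','R','beta','hatmat','leverage'});

% Coefficients recovered from the QR factors of the design matrix
% (assumes the economy-size factorization).
betaQR = stats.R \ (stats.Q' * heat);
betaDiff = max(abs(betaQR - stats.beta));

% Leverage values are the diagonal entries of the hat matrix.
levDiff = max(abs(diag(stats.hatmat) - stats.leverage));
```

Under that assumption, both betaDiff and levDiff should be at round-off level.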

stats = regstats(y,X,model,whichstats) returns only the statistics that you specify in whichstats. whichstats can be a single character vector such as 'leverage', a string array such as ["leverage","standres","studres"], or a cell array of character vectors such as {'leverage','standres','studres'}. Set whichstats to 'all' to return all of the statistics.
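
For instance, a string array can be used to request a few statistics, as in this sketch. String arrays require a MATLAB release that supports them, and the variable name hasAll is illustrative only.

```matlab
load hald
whichstats = ["rsquare","adjrsquare","cookd"];
stats = regstats(heat,ingredients,'linear',whichstats);

% Confirm that every requested statistic is present as a field of stats.
hasAll = all(isfield(stats,cellstr(whichstats)));

% cookd holds one value per observation; rsquare is a scalar.
numCook = numel(stats.cookd);
```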

Note

The F statistic is computed under the assumption that the model contains a constant term. It is not correct for models without a constant. The R² statistic can be negative for models without a constant, which indicates that the model is not appropriate for the data.

Examples

Open the regstats GUI using data from hald.mat:

load hald
regstats(heat,ingredients,'linear');

Select Fitted Values and Residuals in the GUI:

Close up of the regstats GUI. Check boxes are located next to input fields for fitted values and residuals.

Click OK to export the fitted values and residuals to the MATLAB workspace in variables named yhat and r, respectively.

You can create the same variables using the stats output, without opening the GUI:

whichstats = {'yhat','r'};
stats = regstats(heat,ingredients,'linear',whichstats);
yhat = stats.yhat;
r = stats.r;

Tips

regstats treats NaN values in X or y as missing values. regstats omits observations with missing values from the regression fit.
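
As a rough check of this behavior, the sketch below marks one response in the hald data as missing and compares the fitted coefficients with those from a fit in which that observation is removed by hand. The variable names are illustrative only.

```matlab
load hald
yMissing = heat;
yMissing(3) = NaN;                       % mark the third response as missing

% Fit with the NaN left in place.
sMissing = regstats(yMissing,ingredients,'linear','beta');

% Fit again after removing the incomplete observation manually.
keep = ~isnan(yMissing);
sDropped = regstats(heat(keep),ingredients(keep,:),'linear','beta');

betaDiff = max(abs(sMissing.beta - sDropped.beta));
```

If the observation with the missing value is omitted from the fit, betaDiff should be zero or at round-off level.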

References

[1] Belsley, D. A., E. Kuh, and R. E. Welsch. Regression Diagnostics. Hoboken, NJ: John Wiley & Sons, Inc., 1980.

[2] Chatterjee, S., and A. S. Hadi. “Influential Observations, High Leverage Points, and Outliers in Linear Regression.” Statistical Science. Vol. 1, 1986, pp. 379–416.

[3] Cook, R. D., and S. Weisberg. Residuals and Influence in Regression. New York: Chapman & Hall/CRC Press, 1983.

[4] Goodall, C. R. “Computation Using the QR Decomposition.” Handbook of Statistics. Vol. 9, Amsterdam: Elsevier/North-Holland, 1993.

Version History

Introduced before R2006a