Predictive modeling uses mathematical and computational methods to predict an event or outcome. A mathematical approach uses an equation-based model that describes the phenomenon under consideration. The model forecasts an outcome at some future state or time based on changes to the model inputs, and the model parameters help explain how those inputs influence the outcome. Examples include a time-series regression model for predicting airline traffic volume and a linear regression model of engine speed versus load for predicting fuel efficiency.
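As a minimal sketch of the equation-based approach, the following fits a straight line by ordinary least squares and reads the fitted parameters directly. The data values, the variable names `load` and `efficiency`, and the helper `fit_line` are made up for illustration and do not come from a real engine data set.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical, made-up observations: engine load (%) and fuel efficiency (km/L).
load = [20.0, 40.0, 60.0, 80.0]
efficiency = [18.0, 16.0, 14.0, 12.0]

intercept, slope = fit_line(load, efficiency)

# The slope is directly interpretable: the change in efficiency per unit of load.
predicted = intercept + slope * 50.0
print(f"efficiency = {intercept:.2f} + ({slope:.2f}) * load")
print(f"predicted efficiency at 50% load: {predicted:.2f}")
```

Because the model is a single equation, the sign and size of `slope` explain how the input influences the outcome, which is the property that sets this approach apart from black-box methods.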
The computational predictive modeling approach differs from the mathematical approach because it relies on models that are not easy to express in equation form and that often require simulation techniques to create a prediction. This approach is often called “black box” predictive modeling because the model structure does not provide insight into the factors that map model input to outcome. Examples include using neural networks to predict which winery a glass of wine came from and using bagged decision trees to predict the credit rating of a borrower.
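To illustrate why such models resist explanation, here is a small sketch of bagging (bootstrap aggregation) using one-split decision stumps as a stand-in for full decision trees. The data, the feature meaning, and all function names are hypothetical. The prediction is a majority vote over many resampled models, so no single equation summarizes it.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Pick the threshold and polarity with the fewest errors on the sample."""
    best = None
    for threshold in sorted({x for x, _ in sample}):
        for label_above in (0, 1):
            errors = sum(
                (label_above if x >= threshold else 1 - label_above) != y
                for x, y in sample
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, label_above)
    _, threshold, label_above = best
    return threshold, label_above

def predict_stump(stump, x):
    threshold, label_above = stump
    return label_above if x >= threshold else 1 - label_above

def fit_bagged(data, n_models, rng):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]
        models.append(fit_stump(sample))
    return models

def predict_bagged(models, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Made-up data: a single score per borrower and a binary rating (0 or 1).
data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
        (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]

models = fit_bagged(data, n_models=25, rng=random.Random(0))
low_rating = predict_bagged(models, 0.05)
high_rating = predict_bagged(models, 0.95)
print(low_rating, high_rating)
```

An odd number of models avoids tied votes for a two-class problem. Real bagged trees split on many features and levels, which makes the combined decision rule even harder to inspect.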
Predictive modeling is often performed using curve and surface fitting, time series regression, or machine learning approaches. Regardless of the approach, the process of creating a predictive model follows the same steps:
- Clean the data by removing outliers and treating missing data
- Identify a parametric or nonparametric predictive modeling approach to use
- Preprocess the data into a form suitable for the chosen modeling algorithm
- Specify a subset of the data to be used for training the model
- Train, or estimate, model parameters from the training data set
- Conduct model performance or goodness-of-fit tests to check model adequacy
- Validate predictive modeling accuracy on data not used for calibrating the model
- Use the model for prediction if satisfied with its performance
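The steps above can be sketched end to end on a toy problem. This is one possible illustration under stated assumptions, not a prescribed implementation: the data are synthetic, the outlier rule (five median absolute deviations), the 15/5 split, and the `max_rmse` tolerance are arbitrary choices, and the model is a simple straight line.

```python
import math
import random
from statistics import mean, median

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Synthetic data: y is roughly 2x + 1, plus one missing value and one outlier.
rng = random.Random(42)
raw = [(float(x), 2.0 * x + 1.0 + rng.uniform(-0.5, 0.5)) for x in range(1, 21)]
raw.append((21.0, None))    # missing response
raw.append((22.0, 500.0))   # gross outlier

# Clean: drop rows with missing values, then remove outliers using a
# median-absolute-deviation rule (the factor 5 is an assumed cutoff).
complete = [(x, y) for x, y in raw if y is not None]
center = median([y for _, y in complete])
mad = median([abs(y - center) for _, y in complete])
cleaned = [(x, y) for x, y in complete if abs(y - center) <= 5 * mad]

# Choose an approach: a parametric straight line. The cleaned numeric pairs
# need no further preprocessing for this model.

# Specify the training subset; the remaining rows are held out for validation.
shuffled = cleaned[:]
rng.shuffle(shuffled)
train, holdout = shuffled[:15], shuffled[15:]

# Train: estimate the parameters from the training data only.
train_x = [x for x, _ in train]
train_y = [y for _, y in train]
intercept, slope = fit_line(train_x, train_y)

def predict(x):
    return intercept + slope * x

# Goodness of fit: R-squared on the training data.
ss_res = sum((y - predict(x)) ** 2 for x, y in train)
ss_tot = sum((y - mean(train_y)) ** 2 for y in train_y)
r_squared = 1.0 - ss_res / ss_tot

# Validate: root-mean-square error on data not used for fitting.
holdout_rmse = math.sqrt(mean([(y - predict(x)) ** 2 for x, y in holdout]))

# Use the model only if validation error is within the assumed tolerance.
max_rmse = 1.0
forecast = predict(25.0) if holdout_rmse <= max_rmse else None
print(f"R^2 = {r_squared:.4f}, holdout RMSE = {holdout_rmse:.3f}, forecast = {forecast}")
```

Keeping the holdout rows out of `fit_line` is the point of the validation step: an error measured on the training data alone would overstate how well the model predicts new inputs.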