Feature Selection

What Is Feature Selection?

Feature selection is a dimensionality reduction technique that selects a subset of features (predictor variables) that provide the best predictive power in modeling a set of data.

Feature selection can be used to:

Prevent overfitting: avoid training on an excessive number of features, which makes a model more prone to rote-learning specific training examples.
Reduce model size: improve computational performance on high-dimensional data, or prepare a model for embedded deployment where memory may be limited.
Improve interpretability: use fewer features, which may help identify those that affect model behavior.

There are several common approaches to feature selection.

Iteratively change the feature set to optimize performance or loss

Stepwise regression sequentially adds or removes features until there is no improvement in prediction. It is used with linear regression or generalized linear regression algorithms. Similarly, sequential feature selection builds up a feature set until accuracy (or a custom performance measure) stops improving.
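The forward variant of sequential feature selection can be sketched as follows. This is a minimal illustration in plain Python, not MATLAB's implementation: the function names, the toy data, and the choice of leave-one-out 1-nearest-neighbor accuracy as the performance measure are all assumptions made for the example.

```python
def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier
    that only looks at the given feature columns (hypothetical criterion)."""
    correct = 0
    for i, xi in enumerate(X):
        # Nearest other sample by squared Euclidean distance on `features`.
        nearest = min(
            (k for k in range(len(X)) if k != i),
            key=lambda k: sum((xi[j] - X[k][j]) ** 2 for j in features),
        )
        correct += y[nearest] == y[i]
    return correct / len(X)


def sequential_forward_selection(n_features, score, min_gain=0.0):
    """Greedily add the feature that most improves `score`;
    stop when the best candidate improves it by no more than `min_gain`."""
    selected = []
    best = float("-inf")
    remaining = list(range(n_features))
    while remaining:
        # Score every feature set formed by adding one unused feature.
        cand_score, cand = max(
            ((score(selected + [j]), j) for j in remaining),
            key=lambda t: t[0],
        )
        if cand_score - best <= min_gain:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = cand_score
    return selected


# Toy data (assumed): column 0 separates the classes, column 1 is noise,
# column 2 is only weakly related to the class.
X = [[1, 50, 1], [2, 10, 2], [3, 30, 8], [7, 12, 7], [8, 52, 10], [9, 31, 4]]
y = [0, 0, 0, 1, 1, 1]

selected = sequential_forward_selection(3, lambda f: loo_1nn_accuracy(X, y, f))
print(selected)  # [0]
```

On this toy data the search stops after one feature, because adding either remaining column lowers the leave-one-out accuracy. Note that the result is a selected subset, not a ranking of all features.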

Rank features based on intrinsic characteristics

These methods estimate a ranking of the features, which in turn can be used to select the top few ranked features. Minimum redundancy maximum relevance (MRMR) finds features that maximize mutual information between the features and the response variable and minimize mutual information among the features themselves. Related methods rank features according to Laplacian scores, or use a statistical test of whether a single feature is independent of the response to determine feature importance.
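The MRMR idea can be illustrated with a short greedy ranking for discrete features. This is a sketch under stated assumptions, not the algorithm as implemented in any toolbox: it uses the difference form of the criterion (relevance minus mean redundancy), whereas other formulations use a quotient, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(a, b):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * log((c / n) / ((count_a[x] / n) * (count_b[z] / n)))
        for (x, z), c in count_ab.items()
    )


def mrmr_rank(columns, response):
    """Rank features greedily by relevance to the response minus
    mean redundancy with the features already ranked."""
    relevance = [mutual_information(col, response) for col in columns]
    ranked = []
    remaining = list(range(len(columns)))
    while remaining:
        def criterion(j):
            if not ranked:
                return relevance[j]
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in ranked
            ) / len(ranked)
            return relevance[j] - redundancy

        best = max(remaining, key=criterion)
        ranked.append(best)
        remaining.remove(best)
    return ranked


# Toy data (assumed): f1 is an exact copy of f0, and f2 carries
# additional information about the response that f0 does not.
f0 = [0, 0, 0, 0, 1, 1, 1, 1]
f1 = list(f0)
f2 = [0, 1, 0, 1, 0, 1, 0, 1]
response = [0, 0, 0, 0, 1, 2, 1, 2]

ranking = mrmr_rank([f0, f1, f2], response)
print(ranking)  # [0, 2, 1]
```

Although the copy f1 is as relevant as f0 on its own, it is ranked last because it is fully redundant once f0 has been chosen.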

Neighborhood Component Analysis (NCA) and ReliefF

These methods determine feature weights by maximizing prediction accuracy based on pairwise distances and penalizing predictors that lead to misclassification.
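The distance-based weighting idea can be illustrated with the basic Relief update, a simpler relative of ReliefF. The sketch below is an assumption-laden illustration, not ReliefF or NCA as implemented in any toolbox: it uses a single nearest hit and nearest miss per sample, visits every sample once, and expects at least two samples in each class. The function name and data are hypothetical.

```python
def relief_weights(X, y):
    """Basic Relief: reward features that differ at the nearest sample of
    another class (miss) and penalize features that differ at the nearest
    sample of the same class (hit)."""
    n, d = len(X), len(X[0])
    # Feature ranges, used to scale every difference into [0, 1].
    span = [
        (max(row[j] for row in X) - min(row[j] for row in X)) or 1.0
        for j in range(d)
    ]

    def diff(j, p, q):
        return abs(p[j] - q[j]) / span[j]

    def dist(p, q):
        return sum(diff(j, p, q) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        others = [k for k in range(n) if k != i]
        hit = min((k for k in others if y[k] == y[i]),
                  key=lambda k: dist(X[i], X[k]))
        miss = min((k for k in others if y[k] != y[i]),
                   key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            weights[j] += (diff(j, X[i], X[miss]) - diff(j, X[i], X[hit])) / n
    return weights


# Toy data (assumed): column 0 separates the classes, column 1 is noise.
X = [[1, 50], [2, 10], [3, 30], [7, 12], [8, 52], [9, 31]]
y = [0, 0, 0, 1, 1, 1]

weights = relief_weights(X, y)
print(weights)  # positive weight for column 0, negative for column 1
```

A feature that tends to place a sample closer to the other class than to its own class ends up with a negative weight, which is how the noisy column is penalized here.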

Learn feature importance along with the model

Some supervised machine learning algorithms estimate feature importance during the training process. Those estimates can be used to rank features after training is completed. Models with built-in feature selection include linear SVMs, decision trees and their ensembles (boosted trees and random forests), and generalized linear models. Similarly, in lasso regularization, a shrinkage estimator reduces the weights (coefficients) of redundant features to zero during training.
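How lasso drives coefficients to exactly zero can be seen in a small coordinate-descent sketch built on the soft-thresholding operator. This is an illustrative implementation under assumptions, not a library routine: it minimizes (1/(2n))·||y − Xw||² + λ·||w||₁ with no intercept, assumes every column is nonzero, and uses hypothetical names and toy data.

```python
def soft_threshold(value, lam):
    """Shrink `value` toward zero by `lam`; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Cyclic coordinate descent for the lasso objective (no intercept)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * w[k] for k in range(d) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z
    return w


# Toy data (assumed): y = 2*x0 + 0.1*x1, with orthogonal columns.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2.1, -1.9, 1.9, -2.1]

w = lasso_coordinate_descent(X, y, lam=0.5)
print(w)  # first weight close to 1.5, second weight exactly 0.0
```

The weakly contributing second feature has a residual correlation below the penalty, so its coefficient is set to zero, while the first coefficient is shrunk from 2 to about 1.5.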

MATLAB® supports the following feature selection methods:

Algorithm	Training Speed	Types of Models	Accuracy	Caveats
NCA	Moderate	Better for distance-based models	High	Needs manual tuning of regularization lambda
MRMR	Fast	Any	High	Only for classification
ReliefF	Moderate	Better for distance-based models	Medium	Unable to differentiate correlated predictors
Sequential	Slow	Any	High	Doesn’t rank all features
F test	Fast	Any	Medium	For regression. Unable to differentiate correlated predictors.
Chi-square	Fast	Any	Medium	For classification. Unable to differentiate correlated predictors.

As an alternative to feature selection, feature transformation techniques transform existing features into new features (predictor variables) with the less descriptive features dropped. Feature transformation approaches include:

Principal component analysis (PCA), used to summarize data in fewer dimensions by projection onto a unique orthogonal basis
Factor analysis, used to build explanatory models of data correlations
Nonnegative matrix factorization, used when model terms must represent nonnegative values such as physical quantities
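To make the contrast with feature selection concrete, the first of these transformations can be sketched in a few lines. The example below estimates only the first principal component using power iteration on the sample covariance matrix. It is a minimal sketch under assumptions, not a full PCA routine: the function name is hypothetical, the data must not be constant, and the all-ones starting vector must not be orthogonal to the leading component.

```python
from math import sqrt


def first_principal_component(X, iterations=100):
    """Direction of largest variance, found by power iteration."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in X]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    v = [1.0] * d
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalize to unit length.
        v = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    return v


# Toy data (assumed): points lying exactly on the line x1 = x0.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]

pc = first_principal_component(X)
print(pc)  # both entries approximately 0.7071
```

Projecting each centered sample onto this direction yields one new feature that mixes both original columns, which is what distinguishes feature transformation from selecting a subset of the original features.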

For more information on feature selection with MATLAB, including machine learning, regression, and transformation, see Statistics and Machine Learning Toolbox™.

Key Points

Automated feature selection is a part of the complete AutoML workflow that delivers optimized models in a few simple steps.
Feature selection is an advanced technique to boost model performance (especially on high-dimensional data), improve interpretability, and reduce size.
Consider one of the models with “built-in” feature selection first. Otherwise, MRMR works well for classification.

Example

Feature selection can help select a reasonable subset from hundreds of features automatically generated by applying wavelet scattering. The figure below shows the ranking of the top 50 features obtained by applying the MATLAB function fscmrmr to automatically generated wavelet features from human activity sensor data.

Examples and How To

Feature Selection with Heart Sound Data - Example
Sequentially Selecting Features for Classifying High-Dimensional Data - Example
Ridge Regression, Lasso and Elastic Net - Blog
Visualizing High-Dimensional Data Using t-SNE - Example

Software Reference

fscmrmr: Feature selection using minimum redundancy maximum relevance algorithm - Documentation
fscnca: Feature selection using neighborhood component analysis for classification - Function
Lasso and Elastic Net - Documentation
fsulaplacian: Unsupervised feature ranking - Function
stepwiselm: Create linear regression model using stepwise regression - Function
Sequential Feature Selection - Documentation
Dimensionality Reduction and Feature Selection Functions - Documentation
fscchi2: Feature selection for classification using chi-squared tests - Function
fsrftest: Feature selection for regression using statistical F-tests - Function

