Feature Extraction

Mel spectrogram, MFCC, pitch, spectral descriptors

Extract features from audio signals for use as input to machine learning or deep learning systems. Use individual functions, such as melSpectrogram, mfcc, pitch, and spectralCentroid, or use the audioFeatureExtractor object to create a feature extraction pipeline that minimizes redundant calculations. Use blocks such as Mel Spectrogram and MFCC to extract features from audio signals in Simulink®. In live scripts, use Extract Audio Features to graphically select the features to extract.
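As a rough illustration of the two approaches, the following sketch calls individual functions and then configures a single audioFeatureExtractor for the same signal. The synthetic test tone, the sample rate, and the variable names are assumptions made for this example only, and the extractor is left at its default window and overlap settings.

```matlab
% Synthetic input (assumed for illustration): 1 s of a 220 Hz tone plus light noise.
fs = 16000;
t = (0:fs-1)'/fs;
audioIn = sin(2*pi*220*t) + 0.01*randn(fs,1);

% Approach 1: call individual feature functions, one feature per call.
coeffs = mfcc(audioIn,fs);   % mel-frequency cepstral coefficients per frame
f0 = pitch(audioIn,fs);      % fundamental frequency estimate per frame

% Approach 2: configure one extractor so that shared intermediate
% computations are reused across the selected features.
afe = audioFeatureExtractor("SampleRate",fs, ...
    "mfcc",true, ...
    "pitch",true, ...
    "spectralCentroid",true);

features = extract(afe,audioIn);   % one row per analysis frame, one column per feature value
idx = info(afe);                   % struct mapping each enabled feature to its columns

% Pull a single feature out of the combined matrix by its column indices.
centroid = features(:,idx.spectralCentroid);
```

Calling info on the extractor avoids hard-coding column positions, which shift whenever you enable or disable a feature.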

Objects

`audioFeatureExtractor`	Streamline audio feature extraction
`ivectorSystem`	Create i-vector system (Since R2021a)

Live Editor Tasks

Extract Audio Features	Streamline audio feature extraction in the Live Editor

Functions

Auditory Spectrograms

`audioDelta`	Compute delta features
`designAuditoryFilterBank`	Design auditory filter bank
`melSpectrogram`	Mel spectrogram

Auditory Cepstral Coefficients

`audioDelta`	Compute delta features
`cepstralCoefficients`	Extract cepstral coefficients
`gtcc`	Extract gammatone cepstral coefficients, log-energy, delta, and delta-delta
`mfcc`	Extract MFCC, log energy, delta, and delta-delta of audio signal

Feature Embeddings

`openl3Embeddings`	Extract OpenL3 feature embeddings (Since R2022a)
`vggishEmbeddings`	Extract VGGish feature embeddings (Since R2022a)
`speakerEmbeddings`	Extract speaker embeddings from speech (Since R2024b)

Periodicity and Harmonicity

`audioDelta`	Compute delta features
`harmonicRatio`	Harmonic ratio
`pitch`	Estimate fundamental frequency of audio signal
`pitchnn`	Estimate pitch with deep learning neural network (Since R2021a)

Spectral Descriptors

`audioDelta`	Compute delta features
`spectralCentroid`	Spectral centroid for audio signals and auditory spectrograms
`spectralCrest`	Spectral crest for signals and spectrograms
`spectralDecrease`	Spectral decrease for audio signals and auditory spectrograms
`spectralEntropy`	Spectral entropy for signals and spectrograms
`spectralFlatness`	Spectral flatness for signals and spectrograms
`spectralFlux`	Spectral flux for audio signals and auditory spectrograms
`spectralKurtosis`	Spectral kurtosis for signals and spectrograms
`spectralRolloffPoint`	Spectral rolloff point for audio signals and auditory spectrograms
`spectralSkewness`	Spectral skewness for signals and spectrograms
`spectralSlope`	Spectral slope for audio signals and auditory spectrograms
`spectralSpread`	Spectral spread for audio signals and auditory spectrograms

Domain Conversion

`erb2hz`	Convert from equivalent rectangular bandwidth (ERB) scale to hertz
`bark2hz`	Convert from Bark scale to hertz
`mel2hz`	Convert from mel scale to hertz
`hz2erb`	Convert from hertz to equivalent rectangular bandwidth (ERB) scale
`hz2bark`	Convert from hertz to Bark scale
`hz2mel`	Convert from hertz to mel scale
`phon2sone`	Convert from phon to sone
`sone2phon`	Convert from sone to phon
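The frequency-scale functions come in forward and inverse pairs. A minimal sketch, assuming an arbitrary set of test frequencies chosen for illustration, converts hertz values to each scale and back to check that the round trip approximately recovers the input:

```matlab
% Test frequencies in hertz (assumed values for illustration).
hz = [100 440 1000 4000 8000];

% Convert to each perceptual scale and back to hertz.
hzFromMel  = mel2hz(hz2mel(hz));
hzFromBark = bark2hz(hz2bark(hz));
hzFromErb  = erb2hz(hz2erb(hz));

% Largest relative round-trip error on each scale.
melErr  = max(abs(hzFromMel  - hz)./hz);
barkErr = max(abs(hzFromBark - hz)./hz);
erbErr  = max(abs(hzFromErb  - hz)./hz);
fprintf("Round-trip relative error: mel %g, Bark %g, ERB %g\n",melErr,barkErr,erbErr);
```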

Blocks

Audio Delta	Compute delta features (Since R2022b)
Auditory Spectrogram	Extract mel, Bark, or ERB spectrogram from audio (Since R2022a)
Cepstral Coefficients	Extract cepstral coefficients from spectrogram (Since R2022b)
Design Auditory Filter Bank	Design frequency-domain auditory filter bank (Since R2022a)
Design Mel Filter Bank	Design frequency-domain mel filter bank (Since R2022a)
Mel Spectrogram	Extract mel spectrogram from audio (Since R2022a)
MFCC	Extract mel-frequency cepstral coefficients from audio (Since R2022b)

Topics

Feature Selection for Audio Classification
Perform audio feature selection to select a feature set for either speaker recognition or word recognition tasks.
Extract Features from Audio Data Sets
Use different methods of extracting features from an audio data set.
Spectral Descriptors
Overview and applications of spectral descriptors.
Learn Pre-Emphasis Filter Using Deep Learning
Use a convolutional deep network to learn a pre-emphasis filter for speech recognition. (Since R2021b)

Featured Examples

Train Spoken Digit Recognition Network Using Out-of-Memory Features

Train a spoken digit recognition network on out-of-memory auditory spectrograms using a transformed datastore. In this example, you extract auditory spectrograms from audio using audioDatastore and audioFeatureExtractor and write them to disk. You then use a signalDatastore to access the features during training. This workflow is useful when the training features do not fit in memory. Because you extract the features only once, it also speeds up iteration on the deep learning model design.

Sequential Feature Selection for Audio Features

Apply a typical feature selection workflow to the task of spoken digit recognition.

Pitch Tracking Using Multiple Pitch Estimations and HMM

Perform pitch tracking using multiple pitch estimations, octave and median smoothing, and a hidden Markov model (HMM).