Extract MFCC, log energy, delta, and delta-delta of audio signal

`coeffs = mfcc(___,Name,Value)` sets each property `Name` to the specified `Value`. Unspecified properties have default values.

For example, `[coeffs] = mfcc(audioIn,fs,'LogEnergy','Replace')` returns mel frequency cepstral coefficients for the audio input signal sampled at `fs` Hz. The first coefficient in the `coeffs` vector is replaced with the log energy value.

`[coeffs,delta,deltaDelta,loc] = mfcc(___)` returns the delta, delta-delta, and location of samples corresponding to each window of data.

Compute the mel frequency cepstral coefficients of a speech signal using the `mfcc` function. The function returns `delta`, the change in coefficients, and `deltaDelta`, the change in delta values. The log energy value that the function computes can either prepend the coefficients vector or replace its first element, depending on whether you set the `'LogEnergy'` argument to `'Append'` or `'Replace'`.

Read an audio signal from the `'Counting-16-44p1-mono-15secs.wav'` file using the `audioread` function. The `mfcc` function processes the entire speech data in a batch. The default `DeltaWindowLength` is 2. Therefore, `delta` is computed as the difference between the current coefficients and the previous coefficients, and `deltaDelta` is computed as the difference between the current and the previous delta values. Based on the number of input rows, the window length, and the hop length, `mfcc` partitions the speech into 1551 frames and computes the cepstral features for each frame. Each row in the `coeffs` matrix corresponds to the log-energy value followed by the 13 mel-frequency cepstral coefficients for the corresponding frame of the speech file. The function also computes `loc`, the location of the last sample in each input frame.

```
[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
[coeffs,delta,deltaDelta,loc] = mfcc(audioIn,fs);
```

Read in an audio file and convert it to a frequency representation.

```
[audioIn,fs] = audioread("Rainbow-16-8-mono-114secs.wav");
win = hann(1024,"periodic");
S = stft(audioIn,"Window",win,"OverlapLength",512,"Centered",false);
```

To extract the mel-frequency cepstral coefficients, call `mfcc` with the frequency-domain audio. Ignore the log-energy.

```
coeffs = mfcc(S,fs,"LogEnergy","Ignore");
```

In many applications, MFCC observations are converted to summary statistics for use in classification tasks. Plot probability density functions of each of the mel-frequency cepstral coefficients to observe their distributions.

```
nbins = 60;
for i = 1:size(coeffs,2)
    figure
    histogram(coeffs(:,i),nbins,"Normalization","pdf")
    title(sprintf("Coefficient %d",i-1))
end
```
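As a rough illustration of the summary statistics mentioned above, the sketch below reduces a frames-by-coefficients matrix to one fixed-length feature vector. It uses only base MATLAB functions. The random matrix is a hypothetical stand-in for an `mfcc` output, and the choice of mean and standard deviation as the statistics is an assumption, not something this page prescribes.

```matlab
% Stand-in for an mfcc output: 200 frames x 13 coefficients (hypothetical data).
rng(0);
coeffs = randn(200,13);

% Summarize each coefficient over all frames.
mu    = mean(coeffs,1);     % 1-by-13 mean per coefficient
sigma = std(coeffs,0,1);    % 1-by-13 standard deviation per coefficient

% Concatenate into one fixed-length feature vector per recording.
features = [mu sigma];      % 1-by-26
```

A vector like `features` has the same length regardless of how many frames the recording produced, which is what makes it convenient as classifier input.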

`audioIn` — Input signal

vector | matrix | 3-D array

Input signal, specified as a vector, matrix, or 3-D array.

- If `audioIn` is real, it is interpreted as a time-domain signal and must be a column vector or a matrix. Columns of the matrix are treated as independent audio channels.
- If `audioIn` is complex, it is interpreted as a frequency-domain signal. In this case, `audioIn` must be an *L*-by-*M*-by-*N* array, where *L* is the number of DFT points, *M* is the number of individual spectra, and *N* is the number of individual channels.

**Data Types:** `single` | `double`

**Complex Number Support:** Yes

`fs` — Sample rate in Hz

positive scalar

Sample rate of the input signal in Hz, specified as a positive scalar.

**Data Types:** `single` | `double`

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

```
[coeffs,delta,deltaDelta,loc] = mfcc(audioIn,fs,'LogEnergy','Replace','DeltaWindowLength',5)
```

returns mel frequency cepstral coefficients for the audio input signal sampled at `fs` Hz. The first coefficient in the `coeffs` vector is replaced with the log energy value. A set of 5 cepstral coefficients is used to compute the delta and the delta-delta values.

`'WindowLength'` — Number of samples in analysis window

`round(fs*0.03)` (default) | positive scalar integer

`'OverlapLength'` — Number of overlapping samples between adjacent windows

`round(fs*0.02)` (default) | integer

Number of samples that overlap or underlap between adjacent windows. An `'OverlapLength'` value that is:

- Positive indicates an overlap between adjacent windows.
- Negative indicates an underlap between adjacent windows.
- Zero indicates no overlap between adjacent windows.

The `'OverlapLength'` value must be less than the `'WindowLength'` value.

[Figure: overlapping frames]

[Figure: underlapping frames]

**Data Types:** `single` | `double`
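The relationship between window length, overlap, and the resulting hop can be sketched with plain arithmetic. The snippet assumes that the hop between window starts equals the window length minus the overlap length. The sample rate and the underlap value are example numbers chosen for illustration.

```matlab
fs = 44100;                         % sample rate in Hz (example value)
winLen     = round(fs*0.03);        % default 'WindowLength'  -> 1323 samples
overlapLen = round(fs*0.02);        % default 'OverlapLength' -> 882 samples

% Hop between the starts of adjacent windows (assumed: window minus overlap).
hopLen = winLen - overlapLen;       % 441 samples

% A negative overlap (underlap) leaves a gap, so the hop exceeds the window.
underlapLen = -100;                 % hypothetical underlap of 100 samples
hopUnderlap = winLen - underlapLen; % 1423 samples
```

With the defaults, each new window starts 10 ms after the previous one. With an underlap, 100 samples between consecutive windows are not analyzed at all.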

`'NumCoeffs'` — Number of coefficients returned

`13` (default) | positive scalar integer

Number of coefficients returned for each window of data, specified as an integer in the range [2, *v*], where *v* is the number of valid passbands.

The number of valid passbands is defined as `sum(BandEdges <= floor(fs/2))-2`. A passband is valid if its edges fall at or below `fs/2`, where `fs` is the sample rate of the input audio signal, specified as the second argument.

**Data Types:** `single` | `double`
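The valid-passband count can be checked by hand with the expression given above. In this sketch the band edges and sample rate are hypothetical values, chosen so that one edge lies above the Nyquist frequency.

```matlab
fs = 8000;                                        % sample rate in Hz (example value)
bandEdges = [100 200 400 800 1600 3200 6400];     % hypothetical band edges in Hz

% Count the edges at or below floor(fs/2), then subtract 2 as in the
% expression above (a filter bank has two more edges than filters).
numValidEdges = sum(bandEdges <= floor(fs/2));    % 6 edges (6400 Hz is excluded)
v = numValidEdges - 2;                            % 4 valid passbands

% 'NumCoeffs' must then lie in the range [2, v].
numCoeffsRange = [2 v];
```

For these values, requesting more than 4 coefficients would fall outside the documented range.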

`'BandEdges'` — Band edges of filter bank (Hz)

row vector

Band edges of the filter bank in Hz, specified as a nonnegative monotonically increasing row vector in the range [0, `fs`/2]. The number of band edges must be in the range [4, 160]. The `mfcc` function designs half-overlapped triangular filters based on `BandEdges`. This means that all band edges, except for the first and last, are also center frequencies of the designed bandpass filters.

By default, `BandEdges` is a 42-element vector, which results in a 40-band filter bank that spans approximately 133 Hz to 6854 Hz:

Filters | Passband Edges (Hz) |
---|---|
Filter 1 | [133 267] |
Filter 2 | [200 333] |
Filter 3 | [267 400] |
Filter 4 | [333 467] |
Filter 5 | [400 533] |
Filter 6 | [467 600] |
Filter 7 | [533 667] |
Filter 8 | [600 733] |
Filter 9 | [667 800] |
Filter 10 | [733 867] |
Filter 11 | [800 933] |
Filter 12 | [867 999] |
Filter 13 | [933 1071] |
Filter 14 | [999 1147] |
Filter 15 | [1071 1229] |
Filter 16 | [1147 1316] |
Filter 17 | [1229 1410] |
Filter 18 | [1316 1510] |
Filter 19 | [1410 1618] |
Filter 20 | [1510 1733] |
Filter 21 | [1618 1856] |
Filter 22 | [1733 1988] |
Filter 23 | [1856 2130] |
Filter 24 | [1988 2281] |
Filter 25 | [2130 2444] |
Filter 26 | [2281 2618] |
Filter 27 | [2444 2804] |
Filter 28 | [2618 3004] |
Filter 29 | [2804 3217] |
Filter 30 | [3004 3446] |
Filter 31 | [3217 3692] |
Filter 32 | [3446 3954] |
Filter 33 | [3692 4236] |
Filter 34 | [3954 4537] |
Filter 35 | [4236 4860] |
Filter 36 | [4537 5206] |
Filter 37 | [4860 5577] |
Filter 38 | [5206 5973] |
Filter 39 | [5577 6399] |
Filter 40 | [5973 6854] |

The passband edges in the table are rounded for readability. For exact edges, see the default settings of the `cepstralFeatureExtractor`.

**Data Types:** `single` | `double`
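The half-overlapped layout means that filter *k* runs from edge *k* to edge *k*+2 and is centered on edge *k*+1. The sketch below derives the passbands from a short edge vector. It uses the first five rounded default edges from the table, and the variable names are chosen here for illustration.

```matlab
% First five default band edges, rounded as in the table (Hz).
bandEdges = [133 200 267 333 400];

% Filter k spans edges k to k+2, with its center at edge k+1.
numFilters = numel(bandEdges) - 2;
passbands  = [bandEdges(1:numFilters).' bandEdges(3:end).'];
centers    = bandEdges(2:end-1).';
```

The rows of `passbands` reproduce the first three table entries, and each filter shares half of its passband with each neighbor.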

`'FFTLength'` — Number of bins for calculating DFT

`WindowLength` (default) | positive scalar integer

Number of bins used to calculate the DFT of windowed input samples. The FFT length must be greater than or equal to the `'WindowLength'` value, which specifies the number of rows in the windowed input. By default, the FFT length equals the `'WindowLength'` value.

**Data Types:** `single` | `double`
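An FFT length larger than the window length corresponds to zero-padding each windowed frame before the transform. The sketch below shows that effect with the base MATLAB `fft` function. Whether `mfcc` pads in exactly this way internally is an assumption, and the frame data and FFT length are hypothetical.

```matlab
winLen = 1323;                 % window length in samples (default at 44.1 kHz)
fftLen = 2048;                 % hypothetical 'FFTLength' >= winLen

rng(0);
frame = randn(winLen,1);       % stand-in for one windowed frame

% fft zero-pads the frame to fftLen samples before transforming.
X = fft(frame,fftLen);

% Same result as padding by hand.
Xmanual = fft([frame; zeros(fftLen-winLen,1)]);
```

Padding adds interpolated frequency bins. It does not add new information about the frame.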

`'DeltaWindowLength'` — Number of coefficients for calculating delta and delta-delta

`2` (default) | odd integer greater than 2

Number of coefficients used to calculate the delta and the delta-delta values, specified as 2 or an odd integer greater than 2.

If `'DeltaWindowLength'` is set to `2`, `delta` is the difference between the current coefficients and the previous coefficients.

If `'DeltaWindowLength'` is set to an odd integer greater than `2`, the function uses a least-squares approximation of the local slope over a region around the current time sample. The delta cepstral values are computed by fitting a straight line to the cepstral coefficients of neighboring frames (*M* frames before the current frame and *M* frames after the current frame). For details, see [1].

**Data Types:** `single` | `double`
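Written out, the two cases take the following form. This is a sketch under stated assumptions: $c_t$ denotes the coefficient row for frame $t$, $N$ is the `'DeltaWindowLength'` value, and $M = \lfloor N/2 \rfloor$ (symbols introduced here for illustration). The second expression is the textbook least-squares slope of a line through $2M+1$ equally spaced points. How the function treats the first and last frames is not covered.

```latex
\Delta_t = c_t - c_{t-1} \qquad (N = 2)

\Delta_t = \frac{\sum_{k=-M}^{M} k \, c_{t+k}}{\sum_{k=-M}^{M} k^{2}} \qquad (N \text{ odd},\ N > 2)
```

For $N = 5$, for example, $M = 2$ and the denominator is $4 + 1 + 0 + 1 + 4 = 10$.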

`'LogEnergy'` — Specify how the log energy is shown

`'Append'` (default) | `'Replace'` | `'Ignore'`

Specify how the log energy is shown in the coefficients vector output, specified as:

- `'Append'` –– The function prepends the log energy to the coefficients vector. The length of the coefficients vector is 1 + `NumCoeffs`.
- `'Replace'` –– The function replaces the first coefficient with the log energy of the signal. The length of the coefficients vector is `NumCoeffs`.
- `'Ignore'` –– The function does not calculate or return the log energy.

**Data Types:** `char` | `string`

`coeffs` — Mel frequency cepstral coefficients (MFCCs)

matrix | array

Mel frequency cepstral coefficients, returned as an *L*-by-*M* matrix or an *L*-by-*M*-by-*N* array, where:

- *L* –– Number of frames the audio signal is partitioned into. The `'WindowLength'` and `'OverlapLength'` arguments control this dimension.

  The number of audio frames, *L*, is determined by the following quantities:

  - *nRows* –– Number of input rows.
  - *winLen* –– Number of samples in the analysis window, specified by the `'WindowLength'` argument. If not specified, the window length is `round(fs*0.03)`.
  - *hopLen* –– Number of samples in the current frame before the start of the next frame. Hop length is the window length minus the overlap length.

- *M* –– Number of coefficients returned per frame. This value is determined by the `NumCoeffs` and `LogEnergy` arguments.

  When the `LogEnergy` argument is set to:

  - `'Append'` –– The function prepends the log energy value to the coefficients vector. The length of the coefficients vector is 1 + `NumCoeffs`.
  - `'Replace'` –– The function replaces the first coefficient with the log energy of the signal. The length of the coefficients vector is `NumCoeffs`.
  - `'Ignore'` –– The function does not calculate or return the log energy.

- *N* –– Number of input channels (columns).

**Data Types:** `single` | `double`
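The expected size of `coeffs` can be estimated before calling the function. The sketch below assumes the common framing rule that only windows fitting entirely inside the signal are counted, and that the hop is the window length minus the overlap length. The signal length is a hypothetical value, not a property read from any file.

```matlab
fs     = 44100;                     % sample rate in Hz (example value)
nRows  = 685056;                    % hypothetical signal length (about 15.5 s)
winLen     = round(fs*0.03);        % default window length  -> 1323
overlapLen = round(fs*0.02);        % default overlap length -> 882
hopLen     = winLen - overlapLen;   % 441

% Count only windows that fit entirely inside the signal (assumed framing rule).
L = floor((nRows - winLen)/hopLen) + 1;

% Columns per frame for the default 'NumCoeffs' of 13.
numCoeffs = 13;
M_append  = 1 + numCoeffs;          % 'LogEnergy','Append'  -> 14
M_replace = numCoeffs;              % 'LogEnergy','Replace' -> 13
```

For this hypothetical length the rule gives 1551 frames, the same count quoted for the speech file in the example, although the file's actual length is not stated on this page.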

`delta` — Change in coefficients

matrix | array

Change in coefficients from one frame of data to another, returned as an *L*-by-*M* matrix or an *L*-by-*M*-by-*N* array. The `delta` array is the same size and data type as the `coeffs` array.

If `'DeltaWindowLength'` is set to `2`, `delta` is the difference between the current coefficients and the previous coefficients.

Consider the example below, which computes the mel frequency cepstral coefficients for the entire speech file. The `'DeltaWindowLength'` value is `2`. The `mfcc` function partitions the speech into 1551 frames. Each row in the `coeffs` matrix corresponds to the log energy value followed by the 13 mel frequency cepstral coefficients for the corresponding segment of the speech file.

```
[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
[coeffs,delta,deltaDelta,loc] = mfcc(audioIn,fs);
```

The first row of the delta matrix, `delta(1,:)`, is zeros. The second row, `delta(2,:)`, equals the difference between the coefficients for the current frame, `coeffs(2,:)`, and the previous frame, `coeffs(1,:)`.

If `'DeltaWindowLength'` is set to an odd integer greater than `2`, the delta values are computed using a least-squares approximation of the local slope over a region around the current time sample. For details, see [1].

**Data Types:** `single` | `double`
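Both delta computations can be sketched in base MATLAB on a small matrix. The data below is a hypothetical stand-in for an `mfcc` output. The least-squares branch fills interior frames only, because the function's treatment of the first and last frames is not described here, and the names `delta2` and `deltaLS` are introduced for illustration.

```matlab
% Stand-in for an mfcc output: 6 frames x 3 coefficients (hypothetical data).
coeffs = [1 2 3; 2 4 6; 4 7 9; 7 8 9; 8 8 8; 9 7 6];

% 'DeltaWindowLength' of 2: first row is zeros, later rows are frame differences.
delta2 = [zeros(1,size(coeffs,2)); diff(coeffs,1,1)];

% Odd 'DeltaWindowLength' N: least-squares slope over M frames on each side.
% Only interior frames are filled; edge handling is left out of this sketch.
N = 5;
M = floor(N/2);
k = (-M:M).';                          % frame offsets
deltaLS = zeros(size(coeffs));
for t = (M+1):(size(coeffs,1)-M)
    deltaLS(t,:) = sum(k .* coeffs(t+k,:),1) / sum(k.^2);
end
```

A longer window smooths the slope estimate across more frames, which makes it less sensitive to frame-to-frame noise than the two-point difference.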

`deltaDelta` — Change in delta values

matrix | array

Change in `delta` values from one frame of data to another, returned as an *L*-by-*M* matrix or an *L*-by-*M*-by-*N* array. The `deltaDelta` array is the same size and data type as the `coeffs` and `delta` arrays.

If `'DeltaWindowLength'` is set to `2`, `deltaDelta` is the difference between the current delta values and the previous delta values.

Consider the example below, which computes the mel frequency cepstral coefficients for the entire speech file. The `'DeltaWindowLength'` value is `2`.

```
[audioIn,fs] = audioread('Counting-16-44p1-mono-15secs.wav');
[coeffs,delta,deltaDelta,loc] = mfcc(audioIn,fs);
```

The first row of the `deltaDelta` matrix, `deltaDelta(1,:)`, is zeros. The second row, `deltaDelta(2,:)`, equals the difference between the delta values for the current frame, `delta(2,:)`, and the previous frame, `delta(1,:)`.

If `'DeltaWindowLength'` is set to an odd integer greater than `2`, the `deltaDelta` values are computed from the delta values using a least-squares approximation of the local slope over a region around the current time sample. For details, see [1].

**Data Types:** `single` | `double`
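For a `'DeltaWindowLength'` of 2, the description above amounts to applying the same frame-difference step twice. The sketch below does this on hypothetical data. The helper `frameDiff` is a name introduced here, not part of the toolbox.

```matlab
% Stand-in for an mfcc output: 4 frames x 2 coefficients (hypothetical data).
coeffs = [1 10; 3 11; 6 13; 10 16];

% Frame-to-frame differences with a leading row of zeros.
frameDiff = @(x) [zeros(1,size(x,2)); diff(x,1,1)];

delta      = frameDiff(coeffs);   % [0 0; 2 1; 3 2; 4 3]
deltaDelta = frameDiff(delta);    % [0 0; 2 1; 1 1; 1 1]
```

Because `delta(1,:)` is zeros, `deltaDelta(2,:)` equals `delta(2,:)` in this scheme.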

`loc` — Location of the last sample in each input frame

vector

Location of the last sample in each input frame, returned as a vector. The `loc` vector is [*t*_{1}, *t*_{2}, *t*_{3}, …, *t*_{n}], where *n* corresponds to the number of frames the input is partitioned into, and *t*_{n} is the last sample of the last frame.

**Data Types:** `single` | `double`
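The last-sample locations follow directly from the window and hop lengths. The sketch below assumes 1-based sample indices, a first frame starting at sample 1, and a hop equal to the window length minus the overlap length. The frame count is a hypothetical value, and the orientation of the vector returned by the function is not specified here.

```matlab
winLen = 1323;                        % window length in samples (default at 44.1 kHz)
hopLen = 441;                         % hop length in samples (window minus overlap)
numFrames = 4;                        % hypothetical number of frames

% Frame k starts at sample (k-1)*hopLen + 1 and ends winLen-1 samples later.
loc = winLen + (0:numFrames-1)*hopLen;   % [1323 1764 2205 2646]
```

Dividing `loc` by the sample rate converts these sample indices to time stamps in seconds.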

The `mfcc` function splits the entire data into overlapping segments. The length of each segment is determined by the `'WindowLength'` argument. The length of overlap between segments is determined by the `'OverlapLength'` argument.

The function computes the mel frequency cepstral coefficients, log energy values, cepstral delta, and the cepstral delta-delta values for each segment as per the algorithm described in `cepstralFeatureExtractor`.

[1] Rabiner, Lawrence R., and Ronald W. Schafer. *Theory and Applications of Digital Speech Processing*. Upper Saddle River, NJ: Pearson, 2010.

[2] Auditory Toolbox. https://engineering.purdue.edu/~malcolm/interval/1998-010/AuditoryToolboxTechReport.pdf

Generate C and C++ code using MATLAB® Coder™.

Cepstral Feature Extractor | Voice Activity Detector | `audioFeatureExtractor` | `cepstralFeatureExtractor` | `pitch` | `voiceActivityDetector`
