Voice Activity Detector

Detect presence of speech in audio signal

Libraries:
Audio Toolbox / Measurements

Description

The Voice Activity Detector block detects the presence of speech in an audio signal. You can also use the Voice Activity Detector block to output an estimate of the noise variance per frequency bin.

Ports

Input

  • Matrix input: Each column of the input is treated as an independent channel.

  • 1-D vector input: The input is treated as a single channel.

This port is unnamed unless you specify additional input ports.

Data Types: single | double
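
The channel interpretation described above can be sketched in a few lines of Python. This is only an illustration of the convention, not the block's implementation; the helper name `split_channels` and the sample values are hypothetical.

```python
def split_channels(frame):
    """Return a list of channels, each a list of samples.

    A flat sequence is treated as a single channel. A sequence of rows
    (a matrix) is split so that each column becomes its own channel.
    """
    if frame and isinstance(frame[0], (list, tuple)):
        # Matrix input: transpose the rows into per-column channels.
        return [list(column) for column in zip(*frame)]
    # 1-D vector input: a single channel.
    return [list(frame)]


stereo = [[0.1, -0.2], [0.3, 0.0], [-0.1, 0.4]]  # 3 samples x 2 channels
mono = [0.1, 0.3, -0.1]                          # 3 samples, 1 channel

print(len(split_channels(stereo)))  # channel count for the matrix input
print(len(split_channels(mono)))    # channel count for the vector input
```

Each channel found this way would be processed independently, which is why the output has one column per input column.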

Dependencies

To enable this port, select Specify silence-to-speech probability from input port for the Probability of transition from a silence frame to a speech frame parameter.

Data Types: single | double

Dependencies

To enable this port, select Specify speech-to-silence probability from input port for the Probability of transition from a speech frame to a silence frame parameter.

Data Types: single | double

Output

The block outputs a scalar or row vector with the same number of columns as the input signal.

This port is unnamed until you select the Output noise variance parameter.

Data Types: single | double

The block outputs a column vector or a matrix with the same number of columns as the input signal.

Dependencies

To enable this port, select the Output noise variance parameter.

Data Types: single | double

Parameters

If a parameter is listed as tunable, then you can change its value during simulation.

The window function is designed using the algorithms of the following functions:

Tunable: No

Dependencies

To enable this parameter, set Domain of the input to Time.

Dependencies

To enable this parameter, set Domain of the input to Time and Window to Chebyshev or Kaiser.

Data Types: single | double

Tunable: No

Dependencies

To enable this parameter, set Domain of the input to Time.

Tunable: No

Dependencies

To enable this parameter, set Domain of the input to Time and clear the Inherit FFT length from input dimensions parameter.

Data Types: single | double

To specify Probability of transition from a silence frame to a speech frame from an input port, select Specify silence-to-speech probability from input port.

Tunable: Yes

Data Types: single | double

To specify Probability of transition from a speech frame to a silence frame from an input port, select Specify speech-to-silence probability from input port.

Tunable: Yes

Data Types: single | double
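
To build intuition for the two transition probabilities, it can help to treat them as a two-state Markov chain (silence and speech). The Python sketch below computes two standard quantities for such a chain: the long-run fraction of frames in the speech state, and the expected number of consecutive frames spent in a state before leaving it. The function names and the values 0.2 and 0.1 are illustrative assumptions, not taken from the block.

```python
def stationary_speech_probability(a01, a10):
    """Long-run probability of the speech state in a two-state Markov chain.

    a01: probability of moving from silence to speech between frames.
    a10: probability of moving from speech to silence between frames.
    """
    return a01 / (a01 + a10)


def expected_run_length(p_leave):
    """Expected number of consecutive frames in a state that is left
    with probability p_leave at each frame (mean of a geometric law)."""
    return 1.0 / p_leave


a01, a10 = 0.2, 0.1  # illustrative values only
print(stationary_speech_probability(a01, a10))  # long-run share of speech frames
print(expected_run_length(a10))                 # mean length of a speech run, in frames
print(expected_run_length(a01))                 # mean length of a silence run, in frames
```

Lowering the speech-to-silence probability lengthens the expected speech runs, so the detector is slower to declare silence after speech. Raising the silence-to-speech probability makes it quicker to declare speech.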

When you select this parameter, an additional output port, N, is added to the block.

  • Code generation – Simulate the model using generated C code. The first time you run a simulation, Simulink® generates C code for the block. The C code is reused for subsequent simulations, as long as the model does not change. This option requires additional startup time, but the speed of the subsequent simulations is comparable to Interpreted execution.

  • Interpreted execution – Simulate the model using the MATLAB® interpreter. This option reduces startup time, but has a slower simulation speed than Code generation. In this mode, you can debug the source code of the block.

Tunable: No

Block Characteristics

Data Types

double | single

Direct Feedthrough

no

Multidimensional Signals

no

Variable-Size Signals

no

Zero-Crossing Detection

no

Algorithms

The Voice Activity Detector implements the algorithm described in [1].

If Domain of the input is specified as Time, the block windows the input signal and converts it to the frequency domain according to the Window, Sidelobe attenuation of the window (dB), and FFT length parameters. If Domain of the input is specified as Frequency, the block assumes that the input is a windowed discrete-time Fourier transform (DTFT) of an audio signal. In both cases, the block then converts the signal to the power domain and estimates the noise variance according to [2]. It estimates the posterior and prior SNR according to the minimum mean-square error (MMSE) formula described in [3], and then applies a log-likelihood ratio test with a hidden Markov model (HMM)-based hang-over scheme, according to [1].
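
The general shape of this processing chain can be sketched in Python. This is a simplified illustration and not the block's implementation. It assumes the noise variance per bin is already known rather than estimated as in [2], and it uses a fixed, hypothetical prior SNR instead of the MMSE-based estimate of [3]. All function names, the Hann window, and the numeric values are assumptions made for the example. The likelihood ratio and the hang-over recursion follow the Gaussian model and forward recursion of [1].

```python
import cmath
import math


def power_spectrum(frame, window):
    """Window one time-domain frame and return its one-sided power spectrum."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, window)]
    # Direct DFT, kept simple for clarity; use an FFT for real frame sizes.
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def frame_likelihood_ratio(power, noise_var, prior_snr):
    """Geometric mean over bins of the speech-versus-silence likelihood ratio."""
    log_sum = 0.0
    for p, nv in zip(power, noise_var):
        post_snr = p / nv  # posterior SNR for this bin
        log_sum += post_snr * prior_snr / (1.0 + prior_snr) - math.log1p(prior_snr)
    return math.exp(log_sum / len(power))


def speech_probabilities(power_frames, noise_var, prior_snr=4.0, a01=0.2, a10=0.1):
    """Per-frame speech probability from the HMM hang-over recursion.

    a01 is the silence-to-speech transition probability and a10 is the
    speech-to-silence transition probability (illustrative defaults).
    """
    a00, a11 = 1.0 - a01, 1.0 - a10
    odds = a01 / a10  # start from the chain's stationary odds of speech
    probs = []
    for power in power_frames:
        lam = frame_likelihood_ratio(power, noise_var, prior_snr)
        # Forward recursion on the speech-to-silence odds.
        odds = lam * (a01 + a11 * odds) / (a00 + a10 * odds)
        probs.append(odds / (1.0 + odds))
    return probs


# Time-domain path: window a test tone and move to the power domain.
n = 16
hann = [0.5 - 0.5 * math.cos(2 * math.pi * t / n) for t in range(n)]
tone = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
tone_power = power_spectrum(tone, hann)

# Frequency-domain path: hand-made power spectra with unit noise variance.
noise_var = [1.0] * 5
quiet = [1.0] * 5   # power equal to the noise variance
loud = [20.0] * 5   # power well above the noise variance
probs = speech_probabilities([quiet, loud, loud, quiet, quiet], noise_var)
print([round(p, 3) for p in probs])
```

In this example, the two quiet frames that follow the loud frames receive a higher speech probability than the first quiet frame, even though all three have identical spectra. That carry-over from earlier frames is the hang-over effect that the transition probabilities control.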

References

[1] Sohn, Jongseo, Nam Soo Kim, and Wonyong Sung. "A Statistical Model-Based Voice Activity Detection." IEEE Signal Processing Letters. Vol. 6, No. 1, 1999.

[2] Martin, R. "Noise Power Spectral Density Estimation Based on Optimal Smoothing and Minimum Statistics." IEEE Transactions on Speech and Audio Processing. Vol. 9, No. 5, 2001, pp. 504–512.

[3] Ephraim, Y., and D. Malah. "Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator." IEEE Transactions on Acoustics, Speech, and Signal Processing. Vol. 32, No. 6, 1984, pp. 1109–1121.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using Simulink® Coder™.

Version History

Introduced in R2018a