stretchAudio

Time-stretch audio

Since R2019b

Syntax

audioOut = stretchAudio(audioIn,alpha)

audioOut = stretchAudio(audioIn,alpha,Name,Value)

Description

audioOut = stretchAudio(audioIn,alpha) applies time scale modification (TSM) on the input audio by the TSM factor alpha.

audioOut = stretchAudio(audioIn,alpha,Name,Value) specifies options using one or more Name,Value pair arguments.

Examples

Apply TSM

Read in an audio signal. Listen to the audio signal and plot it over time.

[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");

t = (0:size(audioIn,1)-1)/fs;
plot(t,audioIn)
xlabel('Time (s)')
ylabel('Amplitude')
title('Original Signal')
axis tight
grid on

sound(audioIn,fs)

Use stretchAudio to apply a 1.5 speedup factor. Listen to the modified audio signal and plot it over time. The sample rate remains the same, but the duration of the signal has decreased.

audioOut = stretchAudio(audioIn,1.5);

t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, Speedup Factor = 1.5')
axis tight
grid on

sound(audioOut,fs)

Slow down the original audio signal by applying a TSM factor of 0.75. Listen to the modified audio signal and plot it over time. The sample rate remains the same as the original audio, but the duration of the signal has increased.

audioOut = stretchAudio(audioIn,0.75);

t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, Speedup Factor = 0.75')
axis tight
grid on

sound(audioOut,fs)
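
The change in duration can also be checked numerically. The following is a minimal sketch, assuming Audio Toolbox and its sample file are available; the variable names durationIn, durationOut, and measuredFactor are illustrative and not part of the stretchAudio interface. The measured factor is expected to be close to alpha, not exactly equal to it, because the signal is processed in whole windows.

```matlab
% Compare input and output durations for a given TSM factor.
[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
alpha = 1.5;
audioOut = stretchAudio(audioIn,alpha);

durationIn  = size(audioIn,1)/fs;          % seconds
durationOut = size(audioOut,1)/fs;         % seconds
measuredFactor = durationIn/durationOut;   % should be close to alpha

fprintf('Input: %.2f s, output: %.2f s, measured factor: %.3f\n', ...
    durationIn,durationOut,measuredFactor)
```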

Apply TSM to Frequency-Domain Audio

stretchAudio supports TSM on frequency-domain audio when using the default vocoder method. Applying TSM to frequency-domain audio enables you to reuse your STFT computation for multiple TSM factors.

Read in an audio signal. Listen to the audio signal and plot it over time.

[audioIn,fs] = audioread('FemaleSpeech-16-8-mono-3secs.wav');

sound(audioIn,fs)

t = (0:size(audioIn,1)-1)/fs;
plot(t,audioIn)
xlabel('Time (s)')
ylabel('Amplitude')
title('Original Signal')
axis tight
grid on

Convert the audio signal to the frequency domain.

win = sqrt(hann(256,'periodic'));
ovrlp = 192;
S = stft(audioIn,'Window',win,'OverlapLength',ovrlp,'Centered',false);

Speed up the audio signal by a factor of 1.4. Specify the window and overlap length used to create the frequency-domain representation.

alpha = 1.4;
audioOut = stretchAudio(S,alpha,'Window',win,'OverlapLength',ovrlp);

sound(audioOut,fs)

t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, TSM Factor = 1.4')
axis tight
grid on

Slow down the audio signal by applying a TSM factor of 0.8. Specify the window and overlap length used to create the frequency-domain representation.

alpha = 0.8;
audioOut = stretchAudio(S,alpha,'Window',win,'OverlapLength',ovrlp);

sound(audioOut,fs)

t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, TSM Factor = 0.8')
axis tight
grid on
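
Because the STFT is computed once, the same matrix can be passed to stretchAudio in a loop over several TSM factors. The following is a minimal sketch, assuming the same sample file, window, and overlap length as in this example; the variables alphas and durations are illustrative names.

```matlab
% Compute the STFT once and reuse it for several TSM factors.
[audioIn,fs] = audioread('FemaleSpeech-16-8-mono-3secs.wav');
win = sqrt(hann(256,'periodic'));
ovrlp = 192;
S = stft(audioIn,'Window',win,'OverlapLength',ovrlp,'Centered',false);

alphas = [0.8 1.2 1.4];               % TSM factors to try (illustrative)
durations = zeros(size(alphas));      % resulting durations in seconds
for k = 1:numel(alphas)
    audioOut = stretchAudio(S,alphas(k),'Window',win,'OverlapLength',ovrlp);
    durations(k) = size(audioOut,1)/fs;
end
disp([alphas; durations])             % larger factors give shorter outputs
```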

Increase Fidelity Using Phase-Locking

The default TSM method (vocoder) enables you to additionally apply phase-locking to increase the fidelity to the original audio.

Read in an audio signal. Listen to the audio signal and plot it over time.

[audioIn,fs] = audioread("SpeechDFT-16-8-mono-5secs.wav");

sound(audioIn,fs)

t = (0:size(audioIn,1)-1)/fs;
plot(t,audioIn)
xlabel('Time (s)')
ylabel('Amplitude')
title('Original Signal')
axis tight
grid on

Phase-locking adds a nontrivial computational load to TSM and is not always required. By default, phase-locking is disabled. Apply a speedup factor of 1.8 to the input audio signal. Listen to the audio signal and plot it over time.

alpha = 1.8;

tic
audioOut = stretchAudio(audioIn,alpha);
processingTimeWithoutPhaseLocking = toc

processingTimeWithoutPhaseLocking = 0.0798

sound(audioOut,fs)

t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, alpha = 1.8, LockPhase = false')
axis tight
grid on

Apply the same 1.8 speedup factor to the input audio signal, this time enabling phase-locking. Listen to the audio signal and plot it over time.

tic
audioOut = stretchAudio(audioIn,alpha,"LockPhase",true);
processingTimeWithPhaseLocking = toc

processingTimeWithPhaseLocking = 0.1154

sound(audioOut,fs)

t = (0:size(audioOut,1)-1)/fs;
plot(t,audioOut)
xlabel('Time (s)')
ylabel('Amplitude')
title('Modified Signal, alpha = 1.8, LockPhase = true')
axis tight
grid on
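
A single tic/toc reading can vary noticeably from run to run. One possible way to get a steadier comparison is the MATLAB timeit function, which calls the function several times and returns a typical run time. The following is a sketch under that assumption; the measured times depend on your hardware and are not expected to match the values shown above.

```matlab
% Compare run time with and without phase-locking using timeit.
audioIn = audioread("SpeechDFT-16-8-mono-5secs.wav");
alpha = 1.8;

tUnlocked = timeit(@() stretchAudio(audioIn,alpha));
tLocked   = timeit(@() stretchAudio(audioIn,alpha,"LockPhase",true));

fprintf('LockPhase = false: %.4f s\nLockPhase = true:  %.4f s\n', ...
    tUnlocked,tLocked)
```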

Increase Fidelity Using WSOLA Delta

The waveform similarity overlap-add (WSOLA) TSM method enables you to specify the maximum number of samples to search for the best signal alignment. By default, WSOLA delta is the number of samples in the analysis window minus the number of samples overlapped between adjacent analysis windows. Increasing the WSOLA delta increases the computational load but might also increase fidelity.

Read in an audio signal. Listen to the first 10 seconds of the audio signal.

[audioIn,fs] = audioread('RockGuitar-16-96-stereo-72secs.flac');

sound(audioIn(1:10*fs,:),fs)

Apply a TSM factor of 0.75 to the input audio signal using the WSOLA method. Listen to the first 10 seconds of the resulting audio signal.

alpha = 0.75;
tic
audioOut = stretchAudio(audioIn,alpha,"Method","wsola");
processingTimeWithDefaultWSOLADelta = toc

processingTimeWithDefaultWSOLADelta = 19.4403

sound(audioOut(1:10*fs,:),fs)

Apply a TSM factor of 0.75 to the input audio signal, this time increasing the WSOLA delta to 1024. Listen to the first 10 seconds of the resulting audio signal.

tic
audioOut = stretchAudio(audioIn,alpha,"Method","wsola","WSOLADelta",1024);
processingTimeWithIncreasedWSOLADelta = toc

processingTimeWithIncreasedWSOLADelta = 25.5306

sound(audioOut(1:10*fs,:),fs)

Input Arguments

`audioIn` — Input signal
column vector | matrix | 3-D array

Input signal, specified as a column vector, matrix, or 3-D array. How the function interprets audioIn depends on the complexity of audioIn and the value of Method:

If audioIn is real, audioIn is interpreted as a time-domain signal. In this case, audioIn must be a column vector or matrix. Columns are interpreted as individual channels. This interpretation applies when Method is set to 'vocoder' or 'wsola'.
If audioIn is complex, audioIn is interpreted as a frequency-domain signal. In this case, audioIn must be an L-by-M-by-N array, where L is the FFT length, M is the number of individual spectra, and N is the number of channels. This interpretation applies only when Method is set to 'vocoder'.

Data Types: single | double
Complex Number Support: Yes
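
The two interpretations can be illustrated with a synthetic two-channel signal. This is a minimal sketch, assuming Audio Toolbox and Signal Processing Toolbox are available; the test tones and the variable names x, S, yTime, and yFreq are illustrative. The two outputs are not claimed to be sample-for-sample identical.

```matlab
% Real input: time-domain signal, one column per channel.
fs = 8000;
t = (0:fs-1)'/fs;
x = [sin(2*pi*220*t), sin(2*pi*330*t)];   % 8000-by-2, real

% Complex input: L-by-M-by-N array of spectra produced by stft.
win = sqrt(hann(256,'periodic'));
ovrlp = 192;
S = stft(x,'Window',win,'OverlapLength',ovrlp,'Centered',false);

fprintf('x is real: %d, S is real: %d\n',isreal(x),isreal(S))
disp(size(S))                             % third dimension is the channel count

% Both forms are accepted by the default vocoder method.
yTime = stretchAudio(x,1.25,'Window',win,'OverlapLength',ovrlp);
yFreq = stretchAudio(S,1.25,'Window',win,'OverlapLength',ovrlp);
```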

`alpha` — TSM factor
positive scalar

TSM factor, specified as a positive scalar.

Data Types: single | double

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: 'Window',kbdwin(512)
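
The two syntaxes describe the same call. The following is a minimal sketch on a synthetic tone, assuming R2021a or later so that the Name=Value form parses; on earlier releases only the quoted form is available.

```matlab
% Equivalent ways to pass a name-value argument.
fs = 16000;
t = (0:fs-1)'/fs;
audioIn = sin(2*pi*440*t);                         % 1 s test tone (illustrative)
alpha = 1.2;
win = kbdwin(512);

y1 = stretchAudio(audioIn,alpha,'Window',win);     % works in all releases
y2 = stretchAudio(audioIn,alpha,Window=win);       % R2021a and later
```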

`Method` — Method used to time-scale audio
`'vocoder'` (default) | `'wsola'`

Method used to time-scale audio, specified as the comma-separated pair consisting of 'Method' and 'vocoder' or 'wsola'. Set 'Method' to 'vocoder' to use the phase vocoder method. Set 'Method' to 'wsola' to use the WSOLA method.

If 'Method' is set to 'vocoder', audioIn can be real or complex. If 'Method' is set to 'wsola', audioIn must be real.

Data Types: char | string

`Window` — Window applied in time domain
`sqrt(hann(1024,'periodic'))` (default) | real vector

Window applied in the time domain, specified as the comma-separated pair consisting of 'Window' and a real vector. The number of elements in the vector must be in the range [1, size(audioIn,1)]. The number of elements in the vector must also be greater than OverlapLength.

Note

If using stretchAudio with frequency-domain input, you must specify Window as the same window used to transform audioIn to the frequency domain.

Data Types: single | double

`OverlapLength` — Number of samples overlapped between adjacent windows
`round(0.75*numel(Window))` (default) | integer in the range [0, `numel(Window)`)

Number of samples overlapped between adjacent windows, specified as the comma-separated pair consisting of 'OverlapLength' and an integer in the range [0, numel(Window)).

Note

If using stretchAudio with frequency-domain input, you must specify OverlapLength as the same overlap length used to transform audioIn to a time-frequency representation.

Data Types: single | double

`LockPhase` — Apply identity phase-locking
`false` (default) | `true`

Apply identity phase-locking, specified as the comma-separated pair consisting of 'LockPhase' and false or true.

Dependencies

To enable this name-value pair argument, set Method to 'vocoder'.

Data Types: logical

`WSOLADelta` — Maximum samples used to search for best signal alignment
`numel(Window)-OverlapLength` (default) | nonnegative scalar

Maximum number of samples used to search for the best signal alignment, specified as the comma-separated pair consisting of 'WSOLADelta' and a nonnegative scalar.

Dependencies

To enable this name-value pair argument, set Method to 'wsola'.

Data Types: single | double
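
As a worked example of how the defaults relate to each other, the following sketch evaluates the documented default expressions for Window, OverlapLength, and WSOLADelta. The variable names are illustrative.

```matlab
% Evaluate the documented default values.
win = sqrt(hann(1024,'periodic'));           % default Window
overlapLength = round(0.75*numel(win));      % default OverlapLength: 768
wsolaDelta = numel(win) - overlapLength;     % default WSOLADelta: 256

fprintf('OverlapLength = %d, WSOLADelta = %d\n',overlapLength,wsolaDelta)
```

With these defaults, the value 1024 used in the WSOLA example above is four times the default search range.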

Output Arguments

`audioOut` — Time-scale modified audio
column vector | matrix

Time-scale modified audio, returned as a column vector or matrix of independent channels.

Algorithms

Phase Vocoder

The phase vocoder algorithm is a frequency-domain approach to TSM [1][2]. The basic steps of the phase vocoder algorithm are:

The algorithm windows a time-domain signal at interval η, where η = numel(Window) - OverlapLength. The windows are then converted to the frequency domain.
To preserve horizontal (across time) phase coherence, the algorithm treats each bin as an independent sinusoid whose phase is computed by accumulating the estimates of its instantaneous frequency.
To preserve vertical (across an individual spectrum) phase coherence, the algorithm locks the phase advance of groups of bins to the phase advance of local peaks. This step only applies if LockPhase is set to true.
The algorithm returns the modified spectrogram to the time domain, with windows spaced at intervals of δ, where δ ≈ η/α and α is the TSM factor specified by the alpha input argument.
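
The relationship between the two hop sizes can be worked through numerically. The following sketch uses the default window and overlap lengths and an assumed 15-second input; it shows only the nominal values and makes no claim about how stretchAudio rounds δ internally.

```matlab
% Nominal analysis hop (eta) and synthesis hop (delta) for several factors.
winLength = 1024;
overlapLength = 768;
eta = winLength - overlapLength;          % analysis hop: 256 samples

alphas = [0.5 1 1.5 2];                   % TSM factors (illustrative)
delta = eta./alphas;                      % nominal synthesis hop in samples

inputDuration = 15;                       % seconds (assumed)
nominalOutputDuration = inputDuration./alphas;

disp(table(alphas',delta',nominalOutputDuration', ...
    'VariableNames',{'alpha','delta','outputSeconds'}))
```

A factor above 1 places the windows closer together than they were taken, which shortens the signal; a factor below 1 spreads them apart.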

WSOLA

The WSOLA algorithm is a time-domain approach to TSM [1][2]. WSOLA is an extension of the overlap and add (OLA) algorithm. In the OLA algorithm, a time-domain signal is windowed at interval η, where η = numel(Window) - OverlapLength. To construct the time-scale modified output audio, the windows are spaced at interval δ, where δ ≈ η/α and α is the TSM factor specified by the alpha input argument.

The OLA algorithm recreates the magnitude spectra well but can introduce phase jumps between windows. The WSOLA algorithm attempts to smooth the phase jumps by searching WSOLADelta samples around the η interval for a window that minimizes phase jumps. The algorithm searches for the best window iteratively, so that each successive window is chosen relative to the previously selected window.

If WSOLADelta is set to 0, then the algorithm reduces to OLA.
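
The plain OLA case can be sketched in a few lines of base MATLAB. This is an illustrative sketch of the OLA idea described above, not the implementation used by stretchAudio; the test tone, the rounding of δ, and the normalization by the accumulated window weight are assumptions made for the example.

```matlab
% Plain OLA time stretching of a synthetic tone (illustrative sketch only).
fs = 8000;
x = sin(2*pi*440*(0:fs-1)'/fs);              % 1 s test tone
alpha = 0.8;                                 % TSM factor (< 1 slows down)

N = 256;                                     % window length
overlapLength = 192;
win = sqrt(0.5 - 0.5*cos(2*pi*(0:N-1)'/N));  % root periodic Hann window

eta = N - overlapLength;                     % analysis hop
delta = max(1,round(eta/alpha));             % synthesis hop, delta ~ eta/alpha

numFrames = floor((numel(x) - N)/eta) + 1;
y = zeros((numFrames-1)*delta + N,1);
wsum = zeros(size(y));                       % accumulated window weight

for m = 0:numFrames-1
    frame = x(m*eta + (1:N)).*win;           % window taken every eta samples
    idx = m*delta + (1:N);                   % placed every delta samples
    y(idx) = y(idx) + frame;
    wsum(idx) = wsum(idx) + win;
end
y = y./max(wsum,eps);                        % compensate for varying overlap

fprintf('Input: %d samples, output: %d samples\n',numel(x),numel(y))
```

WSOLA differs from this sketch in that each analysis position may be shifted by up to WSOLADelta samples to find the segment that best continues the previously selected one.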

References

[1] Driedger, Jonathan, and Meinard Müller. "A Review of Time-Scale Modification of Music Signals." Applied Sciences. Vol. 6, Issue 2, 2016.

[2] Driedger, Jonathan. "Time-Scale Modification Algorithms for Music Audio Signals." Master's thesis, Saarland University, Saarbrücken, Germany, 2011.

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.

GPU Arrays
Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

Usage notes and limitations:

Method must be set to 'vocoder'.
LockPhase must be set to false.
Using gpuArray (Parallel Computing Toolbox) input with stretchAudio is only recommended for a GPU with compute capability 7.0 ("Volta") or above. Other hardware might not offer any performance advantage. To check your GPU compute capability, see ComputeCapability in the output from the gpuDevice (Parallel Computing Toolbox) function. For more information, see GPU Computing Requirements (Parallel Computing Toolbox).

For an overview of GPU usage in MATLAB®, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
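
The following is a minimal sketch of how the limitations above might be respected in practice, assuming a synthetic test tone. It uses the GPU only when one is available with compute capability 7.0 or above, and otherwise falls back to the CPU. The useGPU flag is an illustrative variable, and the default Method ('vocoder') and LockPhase (false) are left unchanged to satisfy the listed limitations.

```matlab
% Use the GPU only when a suitable device is available.
fs = 16000;
t = (0:2*fs-1)'/fs;
audioIn = sin(2*pi*440*t);                % 2 s test tone (illustrative)
alpha = 1.25;

useGPU = false;
if canUseGPU
    d = gpuDevice;
    useGPU = str2double(d.ComputeCapability) >= 7.0;
end

if useGPU
    % Process on the GPU, then bring the result back to host memory.
    audioOut = gather(stretchAudio(gpuArray(audioIn),alpha));
else
    audioOut = stretchAudio(audioIn,alpha);
end
```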

Version History

Introduced in R2019b
