augment
Augment audio data
Description
Examples
Read in an audio signal and listen to it.
[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
sound(audioIn,fs)
Create an audioDataAugmenter object that applies time stretching, volume control, and time shifting in cascade. Apply each of the augmentations with 80% probability. Set NumAugmentations to 5 to output five independently augmented signals. To skip pitch shifting and noise addition for each augmentation, set the respective probabilities to 0. Define parameter ranges for each relevant augmentation algorithm.
augmenter = audioDataAugmenter( ...
    "AugmentationMode","sequential", ...
    "NumAugmentations",5, ...
    ...
    "TimeStretchProbability",0.8, ...
    "SpeedupFactorRange",[1.3,1.4], ...
    ...
    "PitchShiftProbability",0, ...
    ...
    "VolumeControlProbability",0.8, ...
    "VolumeGainRange",[-5,5], ...
    ...
    "AddNoiseProbability",0, ...
    ...
    "TimeShiftProbability",0.8, ...
    "TimeShiftRange",[-500e-3,500e-3])
augmenter =
audioDataAugmenter with properties:
AugmentationMode: "sequential"
AugmentationParameterSource: 'random'
NumAugmentations: 5
TimeStretchProbability: 0.8000
SpeedupFactorRange: [1.3000 1.4000]
PitchShiftProbability: 0
VolumeControlProbability: 0.8000
VolumeGainRange: [-5 5]
AddNoiseProbability: 0
TimeShiftProbability: 0.8000
TimeShiftRange: [-0.5000 0.5000]
Call augment on the audio to create 5 augmentations. The augmented audio is returned in a table with variables Audio and AugmentationInfo. The number of rows in the table is defined by NumAugmentations.
data = augment(augmenter,audioIn,fs)
data=5×2 table
Audio AugmentationInfo
_________________ ________________
{685056×1 double} 1×1 struct
{685056×1 double} 1×1 struct
{505183×1 double} 1×1 struct
{685056×1 double} 1×1 struct
{490728×1 double} 1×1 struct
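The signal lengths in the table hint at which rows were time stretched. The following is a minimal sketch that works only from the lengths displayed above, assuming that the output length is roughly the input length divided by the speedup factor (time shifting does not change the length). The variable names are illustrative and not part of the augment interface.

```matlab
% Lengths copied from the table above; the input file has 685056 samples.
originalLength = 685056;
augmentedLengths = [685056; 685056; 505183; 685056; 490728];

% Rows whose length changed were time stretched.
wasStretched = augmentedLengths ~= originalLength;

% Approximate speedup factor implied by each length (assumption: the
% length scales as 1/SpeedupFactor, up to rounding inside the algorithm).
impliedSpeedup = originalLength./augmentedLengths;

disp(table(augmentedLengths,wasStretched,impliedSpeedup))
```

Under this assumption, the implied factors for the two stretched rows fall inside the specified SpeedupFactorRange of [1.3,1.4], and the unchanged rows correspond to a factor of 1.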
In the current augmentation pipeline, augmentation parameters are assigned randomly from within the specified ranges. To determine the exact parameters used for an augmentation, inspect AugmentationInfo.
augmentationToInspect = 4;
data.AugmentationInfo(augmentationToInspect)
ans = struct with fields:
SpeedupFactor: 1
VolumeGain: 4.3399
TimeShift: 0.4502
Listen to the augmentation you are inspecting. Plot the time-domain representations of the original and augmented signals.
augmentation = data.Audio{augmentationToInspect};
sound(augmentation,fs)
t = (0:(numel(audioIn)-1))/fs;
taug = (0:(numel(augmentation)-1))/fs;
plot(t,audioIn,taug,augmentation)
legend("Original Audio","Augmented Audio")
ylabel("Amplitude")
xlabel("Time (s)")
Read in an audio signal and listen to it.
[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
sound(audioIn,fs)
Create an audioDataAugmenter object that applies time stretching, pitch shifting, and noise corruption in cascade. Specify the time stretch speedup factors as 0.9, 1.1, and 1.2. Specify the pitch shifting in semitones as -2, -1, 1, and 2. Specify the noise corruption SNR as 10 dB and 15 dB.
augmenter = audioDataAugmenter( ...
    "AugmentationMode","sequential", ...
    "AugmentationParameterSource","specify", ...
    "SpeedupFactor",[0.9,1.1,1.2], ...
    "ApplyTimeStretch",true, ...
    "ApplyPitchShift",true, ...
    "SemitoneShift",[-2,-1,1,2], ...
    "SNR",[10,15], ...
    "ApplyVolumeControl",false, ...
    "ApplyTimeShift",false)
augmenter =
audioDataAugmenter with properties:
AugmentationMode: "sequential"
AugmentationParameterSource: "specify"
ApplyTimeStretch: 1
SpeedupFactor: [0.9000 1.1000 1.2000]
ApplyPitchShift: 1
SemitoneShift: [-2 -1 1 2]
ApplyVolumeControl: 0
ApplyAddNoise: 1
SNR: [10 15]
ApplyTimeShift: 0
Call augment on the audio to create 24 augmentations. The augmentations represent every combination of the specified augmentation parameters (3 speedup factors × 4 semitone shifts × 2 SNR values = 24).
data = augment(augmenter,audioIn,fs)
data=24×2 table
Audio AugmentationInfo
_________________ ________________
{761243×1 double} 1×1 struct
{622888×1 double} 1×1 struct
{571263×1 double} 1×1 struct
{761243×1 double} 1×1 struct
{622888×1 double} 1×1 struct
{571263×1 double} 1×1 struct
{761243×1 double} 1×1 struct
{622888×1 double} 1×1 struct
{571263×1 double} 1×1 struct
{761243×1 double} 1×1 struct
{622888×1 double} 1×1 struct
{571263×1 double} 1×1 struct
{761243×1 double} 1×1 struct
{622888×1 double} 1×1 struct
{571263×1 double} 1×1 struct
{761243×1 double} 1×1 struct
⋮
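The count of 24 can be checked by enumerating the parameter grid directly. This is a sketch in base MATLAB that uses ndgrid to form every combination of the three specified parameter lists. The row ordering produced here (speedup factor varying fastest) is an assumption; it appears consistent with the repeating signal lengths in the table above, but it is not a statement about how augment orders its output.

```matlab
% Parameter lists copied from the augmenter definition above.
speedupFactors = [0.9,1.1,1.2];
semitoneShifts = [-2,-1,1,2];
snrs = [10,15];

% Form the full grid; the first ndgrid input varies fastest.
[S,P,N] = ndgrid(speedupFactors,semitoneShifts,snrs);
combos = [S(:),P(:),N(:)];   % one row per combination

numCombos = size(combos,1)
firstCombo = combos(1,:)
```

Each row of combos holds one (SpeedupFactor, SemitoneShift, SNR) triple, so its height equals the product of the three list lengths.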
You can check the parameter configuration of each augmentation using the AugmentationInfo table variable.
augmentationToInspect = 1;
data.AugmentationInfo(augmentationToInspect)
ans = struct with fields:
SpeedupFactor: 0.9000
SemitoneShift: -2
SNR: 10
Listen to the augmentation you are inspecting. Plot the time-domain representation of the original and augmented signals.
augmentation = data.Audio{augmentationToInspect};
sound(augmentation,fs)
t = (0:(numel(audioIn)-1))/fs;
taug = (0:(numel(augmentation)-1))/fs;
plot(t,audioIn,taug,augmentation)
legend("Original Audio","Augmented Audio")
ylabel("Amplitude")
xlabel("Time (s)")
Read in an audio signal.
[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
Create an audioDataAugmenter object that applies noise corruption and time shifting in parallel branches. For the noise corruption branch, randomly apply noise with an SNR in the range 0 dB to 20 dB. For the time shifting branch, randomly apply time shifting in the range -300 ms to 300 ms. Apply augmentation 2 times for each branch, for 4 total augmentations.
augmenter = audioDataAugmenter( ...
    "AugmentationMode","independent", ...
    "AugmentationParameterSource","random", ...
    "NumAugmentations",2, ...
    "ApplyTimeStretch",false, ...
    "ApplyPitchShift",false, ...
    "ApplyVolumeControl",false, ...
    "SNRRange",[0,20], ...
    "TimeShiftRange",[-300e-3,300e-3])
augmenter =
audioDataAugmenter with properties:
AugmentationMode: "independent"
AugmentationParameterSource: "random"
NumAugmentations: 2
ApplyTimeStretch: 0
ApplyPitchShift: 0
ApplyVolumeControl: 0
ApplyAddNoise: 1
SNRRange: [0 20]
ApplyTimeShift: 1
TimeShiftRange: [-0.3000 0.3000]
Call augment on the audio to create 4 augmentations (2 for each of the 2 active branches).
data = augment(augmenter,audioIn,fs);
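As a quick sanity check of the expected table height, the total in independent mode with random parameters can be computed from the Apply flags shown in the property display above. This is a sketch of the arithmetic only, based on the description "2 times for each branch"; the variable names are illustrative.

```matlab
% Apply flags in display order: time stretch, pitch shift,
% volume control, add noise, time shift.
applyFlags = [false,false,false,true,true];
numAugmentationsPerBranch = 2;   % value of NumAugmentations

numBranches = nnz(applyFlags);
totalAugmentations = numAugmentationsPerBranch*numBranches
```

With two active branches and NumAugmentations set to 2, the result is 4, which is why index 4 is a valid row to inspect in the returned table.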
You can check the parameter configuration of each augmentation using the AugmentationInfo table variable.
augmentationToInspect = 4;
data.AugmentationInfo{augmentationToInspect}
ans = struct with fields:
TimeShift: 0.0016
Listen to the audio you are inspecting. Plot the time-domain representation of the original and augmented signals.
augmentation = data.Audio{augmentationToInspect};
sound(augmentation,fs)
t = (0:(numel(audioIn)-1))/fs;
taug = (0:(numel(augmentation)-1))/fs;
plot(t,audioIn,taug,augmentation)
legend("Original Audio","Augmented Audio")
ylabel("Amplitude")
xlabel("Time (s)")
Read in an audio signal.
[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
Create an audioDataAugmenter object that applies volume control, noise corruption, and time shifting in parallel branches.
augmenter = audioDataAugmenter( ...
    "AugmentationMode","independent", ...
    "AugmentationParameterSource","specify", ...
    "ApplyTimeStretch",false, ...
    "ApplyPitchShift",false, ...
    "VolumeGain",2, ...
    "SNR",0, ...
    "TimeShift",2)
augmenter =
audioDataAugmenter with properties:
AugmentationMode: "independent"
AugmentationParameterSource: "specify"
ApplyTimeStretch: 0
ApplyPitchShift: 0
ApplyVolumeControl: 1
VolumeGain: 2
ApplyAddNoise: 1
SNR: 0
ApplyTimeShift: 1
TimeShift: 2
Call augment on the audio to create 3 augmentations.
data = augment(augmenter,audioIn,fs)
data=3×2 table
Audio AugmentationInfo
_________________ ________________
{685056×1 double} {1×1 struct}
{685056×1 double} {1×1 struct}
{685056×1 double} {1×1 struct}
You can check the parameter configuration of each augmentation using the AugmentationInfo table variable.
augmentationToInspect = 3;
data.AugmentationInfo{augmentationToInspect}
ans = struct with fields:
TimeShift: 2
Listen to the audio you are inspecting. Plot the time-domain representations of the original and augmented signals.
augmentation = data.Audio{augmentationToInspect};
sound(augmentation,fs)
t = (0:(numel(audioIn)-1))/fs;
taug = (0:(numel(augmentation)-1))/fs;
plot(t,audioIn,taug,augmentation)
legend("Original Audio","Augmented Audio")
ylabel("Amplitude")
xlabel("Time (s)")
The audioDataAugmenter supports multiple workflows for augmenting your datastore, including:
Offline augmentation
Augmentation using tall arrays
Augmentation using transform datastores
In each workflow, begin by creating an audio datastore to point to your audio data. In this example, you create an audio datastore that points to audio samples included with Audio Toolbox™. Count the number of files in the dataset.
folder = fullfile(matlabroot,"toolbox","audio","samples");
ADS = audioDatastore(folder)
ADS =
audioDatastore with properties:
Files: {
' ...\matlab\toolbox\audio\samples\Ambiance-16-44p1-mono-12secs.wav';
' ...\matlab\toolbox\audio\samples\AudioArray-16-16-4channels-20secs.wav';
' ...\toolbox\audio\samples\ChurchImpulseResponse-16-44p1-mono-5secs.wav'
... and 26 more
}
AlternateFileSystemRoots: {}
OutputDataType: 'double'
Labels: {}
numFilesInDataset = numel(ADS.Files)
numFilesInDataset = 29
Create an audioDataAugmenter that applies random sequential augmentations. Set NumAugmentations to 2.
aug = audioDataAugmenter('NumAugmentations',2)
aug =
audioDataAugmenter with properties:
AugmentationMode: 'sequential'
AugmentationParameterSource: 'random'
NumAugmentations: 2
TimeStretchProbability: 0.5000
SpeedupFactorRange: [0.8000 1.2000]
PitchShiftProbability: 0.5000
SemitoneShiftRange: [-2 2]
VolumeControlProbability: 0.5000
VolumeGainRange: [-3 3]
AddNoiseProbability: 0.5000
SNRRange: [0 10]
TimeShiftProbability: 0.5000
TimeShiftRange: [-0.0050 0.0050]
Offline Augmentation
To augment the audio dataset, create two augmentations of each file and then write the augmentations as WAV files.
while hasdata(ADS)
    [audioIn,info] = read(ADS);
    data = augment(aug,audioIn,info.SampleRate);
    [~,fn] = fileparts(info.FileName);
    for i = 1:size(data,1)
        augmentedAudio = data.Audio{i};
        % If augmentation caused an audio signal to have values outside of -1 and 1,
        % normalize the audio signal to avoid clipping when writing.
        if max(abs(augmentedAudio),[],'all')>1
            augmentedAudio = augmentedAudio/max(abs(augmentedAudio),[],'all');
        end
        audiowrite(sprintf('%s_aug%d.wav',fn,i),augmentedAudio,info.SampleRate)
    end
end
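The peak normalization inside the loop can be examined in isolation. The sketch below applies the same check to a short, made-up signal (the sample values are hypothetical and chosen only so that the peak exceeds 1); it needs no audio files or toolboxes.

```matlab
% Hypothetical three-sample signal whose peak magnitude exceeds 1.
augmentedAudio = [0.5; -1.6; 0.8];

% Same test as in the loop: rescale only when a sample would clip.
peak = max(abs(augmentedAudio),[],'all');
if peak > 1
    augmentedAudio = augmentedAudio/peak;
end

newPeak = max(abs(augmentedAudio),[],'all')
```

After scaling, the largest magnitude is exactly 1 and the relative levels of the samples are preserved. Signals that already lie within -1 and 1 pass through unchanged.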
Create an audioDatastore that points to the augmented dataset and confirm that the number of files in the dataset is double the original number of files.
augmentedADS = audioDatastore(pwd)
augmentedADS =
audioDatastore with properties:
Files: {
' ...\Examples\audio-ex28074079\Ambiance-16-44p1-mono-12secs_aug1.wav';
' ...\Examples\audio-ex28074079\Ambiance-16-44p1-mono-12secs_aug2.wav';
' ...\Examples\audio-ex28074079\AudioArray-16-16-4channels-20secs_aug1.wav'
... and 55 more
}
AlternateFileSystemRoots: {}
OutputDataType: 'double'
Labels: {}
numFilesInAugmentedDataset = numel(augmentedADS.Files)
numFilesInAugmentedDataset = 58
Augment Using Tall Arrays
When augmenting a dataset using tall arrays, the input data to the augmenter should be sampled at a consistent rate. Subset the original audio dataset to only include files with a sample rate of 44.1 kHz. Most datasets are already cleaned to have a consistent sample rate.
keepFile = cellfun(@(x)contains(x,'44p1'),ADS.Files);
ads44p1 = subset(ADS,keepFile);
fs = 44.1e3;
Convert the audio datastore to a tall array. Tall arrays are evaluated only when you request them explicitly using gather. MATLAB® automatically optimizes the queued calculations by minimizing the number of passes through the data. If you have the Parallel Computing Toolbox™, you can spread the calculations across multiple machines. The audio data is represented as an M-by-1 tall cell array, where M is the number of files in the audio datastore.
adsTall = tall(ads44p1)
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 6).
adsTall =
M×1 tall cell array
{ 539648×1 double}
{ 227497×1 double}
{ 8000×1 double}
{ 685056×1 double}
{ 882688×2 double}
{1115760×2 double}
{ 505200×2 double}
{3195904×2 double}
: :
: :
Use cellfun so that augmentation is applied to each cell of the tall array. Call gather to evaluate the tall array.
augTall = cellfun(@(x)augment(aug,x,fs),adsTall,"UniformOutput",false);
augmentedDataset = gather(augTall)
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 1: Completed in 1 min 34 sec
Evaluation completed in 1 min 34 sec
augmentedDataset=12×1 cell array
{2×2 table}
{2×2 table}
{2×2 table}
{2×2 table}
{2×2 table}
{2×2 table}
{2×2 table}
{2×2 table}
{2×2 table}
{2×2 table}
{2×2 table}
{2×2 table}
The augmented dataset is returned as a numFiles-by-1 cell array, where numFiles is the number of files in the datastore. Each element of the cell array is a numAugmentationsPerFile-by-2 table, where numAugmentationsPerFile is the number of augmentations returned per file.
numFiles = numel(augmentedDataset)
numFiles = 12
numAugmentationsPerFile = size(augmentedDataset{1},1)
numAugmentationsPerFile = 2
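If a single flat table is more convenient than a cell array of per-file tables, the tables can be stacked with vertcat. The sketch below uses two small stand-in tables with made-up contents rather than real augment output, and it assumes that every AugmentationInfo struct has the same fields, which vertical concatenation of struct arrays requires.

```matlab
% Stand-in per-file tables (hypothetical contents) with the same two
% variables that augment returns.
info1 = [struct('TimeShift',0.1); struct('TimeShift',-0.2)];
info2 = [struct('TimeShift',0.3); struct('TimeShift',0)];
t1 = table({rand(4,1); rand(5,1)},info1, ...
    'VariableNames',{'Audio','AugmentationInfo'});
t2 = table({rand(6,1); rand(7,1)},info2, ...
    'VariableNames',{'Audio','AugmentationInfo'});
augmentedDataset = {t1; t2};

% Stack all per-file tables into one table.
allAugmentations = vertcat(augmentedDataset{:});
numRows = height(allAugmentations)
```

The stacked table has numFiles × numAugmentationsPerFile rows, here 2 × 2 = 4.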
Augment Using Transform Datastore
You can perform online data augmentation while you train your machine learning application using a transform datastore. Call transform to create a new datastore that applies data augmentation while reading.
transformADS = transform(ADS,@(x,info)augment(aug,x,info),'IncludeInfo',true)
transformADS =
TransformedDatastore with properties:
UnderlyingDatastore: [1×1 audioDatastore]
Transforms: {@(x,info)augment(aug,x,info)}
IncludeInfo: 1
Call read to return the augmented first file from the transform datastore.
augmentedRead = read(transformADS)
augmentedRead=2×2 table
Audio AugmentationInfo
_________________ ________________
{539648×1 double} [1×1 struct]
{586683×1 double} [1×1 struct]
Input Arguments
audioDataAugmenter object.
Audio input, specified as a column vector or matrix of independent channels (columns).
Data Types: single | double
Sample rate in Hz, specified as a positive scalar. The allowable range of fs depends on the properties of the audioDataAugmenter object.
Data Types: single | double
Output Arguments
Augmented audio and augmentation information, returned as a two-column table. The first column holds the augmented audio signal. The second column holds information about the applied augmentation methods. The number of rows in data corresponds to the number of output augmented signals. The number of output augmented signals depends on the property values of the object.
Version History
Introduced in R2019b
See Also
audioDataAugmenter | addAugmentationMethod | removeAugmentationMethod | table