オーディオ処理

オーディオ処理および音声処理アプリケーションにおける深層学習のワークフローの拡張

Deep Learning Toolbox™ を Audio Toolbox™ と共に使用して、オーディオ処理および音声処理アプリケーションに深層学習を適用します。信号処理アプリケーションについては、信号処理を参照してください。無線通信アプリケーションについては、無線通信を参照してください。

アプリ

信号ラベラー

対象となる信号の属性、領域および点へのラベル付けまたは特徴の抽出

関数

すべて展開する

データの管理と拡張

`audioDatastore`	Datastore for collection of audio files
`audioDataAugmenter`	Augment audio data (R2019b 以降)

特徴抽出

`audioFeatureExtractor`	Streamline audio feature extraction (R2019b 以降)
`openl3Embeddings`	Extract OpenL3 feature embeddings (R2022a 以降)
`pitchnn`	Estimate pitch with deep learning neural network (R2021a 以降)
`vggishEmbeddings`	Extract VGGish feature embeddings (R2022a 以降)

事前学習済みのネットワーク

`yamnet`	(Not recommended) YAMNet neural network (R2020b 以降)
`classifySound`	Classify sounds in audio signal (R2020b 以降)
`crepe`	(Not recommended) CREPE neural network (R2021a 以降)
`pitchnn`	Estimate pitch with deep learning neural network (R2021a 以降)
`vggish`	(Not recommended) VGGish neural network (R2020b 以降)
`vggishEmbeddings`	Extract VGGish feature embeddings (R2022a 以降)
`openl3`	(Not recommended) OpenL3 neural network (R2021a 以降)
`openl3Embeddings`	Extract OpenL3 feature embeddings (R2022a 以降)
`vadnet`	(Not recommended) Voice activity detection (VAD) neural network (R2023a 以降)
`detectspeechnn`	Detect boundaries of speech in audio signal using AI (R2023a 以降)
`separateSpeakers`	Separate signal by speakers (R2023b 以降)

ブロック

すべて展開する

VGGish

VGGish	VGGish embeddings extraction network (R2022a 以降)
VGGish Embeddings	Extract VGGish embeddings (R2022a 以降)

YAMNet

YAMNet	YAMNet sound classification network (R2021b 以降)
Sound Classifier	Classify sounds in audio signal (R2021b 以降)

OpenL3

OpenL3	OpenL3 embeddings extraction network (R2022b 以降)
OpenL3 Embeddings	Extract OpenL3 embeddings (R2022b 以降)

CREPE

CREPE	CREPE deep pitch estimation neural network (R2023a 以降)
Deep Pitch Estimator	Estimate pitch with CREPE deep learning neural network (R2023a 以降)

トピック

Deep Learning for Audio Applications (Audio Toolbox)
Learn common tools and workflows to apply deep learning to audio applications.
Classify Sound Using Deep Learning (Audio Toolbox)
Train, validate, and test a simple long short-term memory (LSTM) to classify sounds.
Adapt Pretrained Audio Network for New Data Using Deep Network Designer
This example shows how to interactively adapt a pretrained network to classify new audio signals using Deep Network Designer.
Audio Transfer Learning Using Experiment Manager
Configure an experiment that compares the performance of multiple pretrained networks applied to a speech command recognition task using transfer learning.
Compare Speaker Separation Models
Compare the performance, size, and speed of multiple deep learning speaker separation models.
Speaker Identification Using Custom SincNet Layer and Deep Learning
Perform speech recognition using a custom deep learning layer that implements a mel-scale filter bank.
Dereverberate Speech Using Deep Learning Networks
Train a deep learning model that removes reverberation from speech.
Speech Command Recognition in Simulink
Detect the presence of speech commands in audio using a Simulink^® model.
オーディオの特徴に関する逐次特徴選択
この例では、数字の音声認識タスクに適用される特徴選択の標準的なワークフローを説明します。
Train Spoken Digit Recognition Network Using Out-of-Memory Audio Data
This example trains a spoken digit recognition network on out-of-memory audio data using a transformed datastore. In this example, you apply a random pitch shift to audio data used to train a convolutional neural network (CNN). For each training iteration, the audio data is augmented using the audioDataAugmenter (Audio Toolbox) object and then features are extracted using the audioFeatureExtractor (Audio Toolbox) object. The workflow in this example applies to any random data augmentation used in a training loop. The workflow also applies when the underlying audio data set or training features do not fit in memory.
Train Spoken Digit Recognition Network Using Out-of-Memory Features
This example trains a spoken digit recognition network on out-of-memory auditory spectrograms using a transformed datastore. In this example, you extract auditory spectrograms from audio using audioDatastore (Audio Toolbox) and audioFeatureExtractor (Audio Toolbox), and you write them to disk. You then use a signalDatastore (Signal Processing Toolbox) to access the features during training. The workflow is useful when the training features do not fit in memory. In this workflow, you only extract features once, which speeds up your workflow if you are iterating on the deep learning model design.
Investigate Audio Classifications Using Deep Learning Interpretability Techniques
This example shows how to use interpretability techniques to investigate the predictions of a deep neural network trained to classify audio data.
Accelerate Audio Deep Learning Using GPU-Based Feature Extraction
Leverage GPUs for feature extraction to decrease the time required to train an audio deep learning model.

注目の例

Compress Machine Fault Recognition Neural Network Using Projection

Compress a pretrained acoustics-based machine fault recognition neural network using projection and principal component analysis.

オーディオ処理

アプリ

関数

データの管理と拡張

特徴抽出

事前学習済みのネットワーク

ブロック

VGGish

YAMNet

OpenL3

CREPE

トピック

関連情報

注目の例

Compress Machine Fault Recognition Neural Network Using Projection

Audio-Based Anomaly Detection for Machine Health Monitoring

3-D Speech Enhancement Using Trained Filter and Sum Network

3-D Sound Event Localization and Detection Using Trained Recurrent Convolutional Neural Network

Speaker Recognition Using x-vectors

Speaker Diarization Using x-vectors

深層学習を使用した音声コマンド認識モデルの学習

MFCC ネットワークと LSTM ネットワークを使用したノイズ内のキーワード スポッティング

深層学習ネットワークを使用した音声のノイズ除去

音声合成の敵対的生成ネットワーク (GAN) の学習

深層学習を使用したノイズに含まれる音声区間の検出

音声感情認識

後期融合を使用した音響シーン認識

End-to-End Deep Speaker Separation

Acoustics-Based Machine Fault Recognition

Audio Event Classification Using TensorFlow Lite on Raspberry Pi

Keyword Spotting in Noise Code Generation on Raspberry Pi

Intel MKL-DNN による音声コマンド認識コードの生成

Acoustics-Based Machine Fault Recognition Code Generation

Speech Command Recognition on Raspberry Pi Using Simulink

MFCC ネットワークと LSTM ネットワークを使用したノイズ内のキーワードスポッティング