
Audio Processing Using Deep Learning

Extend deep learning workflows with audio and speech processing applications

Apply deep learning to audio and speech processing applications by using Deep Learning Toolbox™ together with Audio Toolbox™.

Apps

Audio Labeler: Define and visualize ground-truth labels

Functions

audioDatastore: Datastore for a collection of audio files

Topics

Introduction to Deep Learning for Audio Applications (Audio Toolbox)

Learn common tools and workflows to apply deep learning to audio applications.

Classify Sound Using Deep Learning (Audio Toolbox)

Train, validate, and test a simple long short-term memory (LSTM) network to classify sounds.
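For reference, a standard LSTM cell updates its state at each time step t as shown below. The notation is assumed for illustration: x_t is the input feature vector, h_t the hidden state, c_t the cell state, W, R, and b the input weights, recurrent weights, and biases, sigma the logistic sigmoid, and the circled dot element-wise multiplication. Implementations can differ in details such as gate ordering and initialization.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i x_t + R_i h_{t-1} + b_i\right) && \text{input gate} \\
f_t &= \sigma\left(W_f x_t + R_f h_{t-1} + b_f\right) && \text{forget gate} \\
g_t &= \tanh\left(W_g x_t + R_g h_{t-1} + b_g\right) && \text{cell candidate} \\
o_t &= \sigma\left(W_o x_t + R_o h_{t-1} + b_o\right) && \text{output gate} \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

A bidirectional LSTM (BiLSTM), used in several of the examples below, runs one LSTM forward and one backward over the sequence and concatenates the two hidden states at each time step.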

Speech Command Recognition Using Deep Learning

This example shows how to train a simple deep learning model that detects the presence of speech commands in audio.
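Models like this one usually operate on time-frequency features, such as spectrograms, computed over short overlapping frames of each clip. The Python sketch below shows only the framing step. The function name frame_signal and the frame settings (400-sample frames with a 160-sample hop, i.e. 25 ms and 10 ms at an assumed 16 kHz sample rate) are hypothetical choices for illustration, not the parameters the example uses.

```python
def frame_signal(x, frame_length, hop_length):
    """Split a 1-D signal into overlapping frames; an incomplete tail is dropped."""
    if frame_length <= 0 or hop_length <= 0:
        raise ValueError("frame_length and hop_length must be positive")
    if len(x) < frame_length:
        return []
    num_frames = 1 + (len(x) - frame_length) // hop_length
    return [x[i * hop_length : i * hop_length + frame_length] for i in range(num_frames)]


# Illustrative settings (assumed, not taken from the example):
# a 1 s clip at 16 kHz, 25 ms frames (400 samples), 10 ms hop (160 samples).
fs = 16000
clip = [0.0] * fs
frames = frame_signal(clip, frame_length=400, hop_length=160)
print(len(frames))  # → 98
```

Each frame would then typically be windowed and transformed (for example with an FFT) to build one column of a spectrogram.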

Denoise Speech Using Deep Learning Networks

This example shows how to denoise speech signals using deep learning networks.
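Training data for speech denoising is often created by adding noise to clean speech at a controlled signal-to-noise ratio (SNR). The Python sketch below assumes SNR is defined as the ratio of average speech power to average noise power, in dB. The helper names power and mix_at_snr and the synthetic signals are hypothetical, and the example's own data preparation may differ.

```python
import math
import random


def power(x):
    """Average power (mean of squared samples)."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Return (noisy, scaled_noise) such that
    power(speech) / power(scaled_noise) == 10 ** (snr_db / 10)."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]
    p_noise = power(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    # Gain chosen so the scaled noise power equals power(speech) / 10^(snr_db / 10).
    gain = math.sqrt(power(speech) / (p_noise * 10 ** (snr_db / 10)))
    scaled_noise = [gain * v for v in noise]
    noisy = [s + v for s, v in zip(speech, scaled_noise)]
    return noisy, scaled_noise


# Synthetic stand-ins for clean speech and background noise (1 s at 8 kHz).
rng = random.Random(0)
fs = 8000
speech = [math.sin(2 * math.pi * 220 * n / fs) for n in range(fs)]
noise = [rng.gauss(0.0, 1.0) for _ in range(fs)]
noisy, scaled_noise = mix_at_snr(speech, noise, snr_db=5.0)
print(10 * math.log10(power(speech) / power(scaled_noise)))  # approximately 5.0
```

The noisy signal serves as the network input, and the clean speech (or a quantity derived from it) serves as the training target.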

Classify Gender Using LSTM Networks

This example shows how to classify the gender of a speaker using deep learning. The example uses a Bidirectional Long Short-Term Memory (BiLSTM) network and Gammatone Cepstral Coefficients (gtcc), pitch, harmonic ratio, and several spectral shape descriptors.
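Pitch is one of the features the example uses. To show what a pitch feature measures, the Python sketch below estimates the pitch of a single frame from the strongest autocorrelation peak. This textbook autocorrelation method, the function name estimate_pitch_autocorr, and the 50 to 400 Hz search range are assumptions for illustration; they are not necessarily how the example or Audio Toolbox computes pitch.

```python
import math


def estimate_pitch_autocorr(x, fs, f_min=50.0, f_max=400.0):
    """Estimate pitch (Hz) as fs / lag of the largest autocorrelation value
    among lags that correspond to frequencies in [f_min, f_max]."""
    min_lag = max(1, int(fs / f_max))
    max_lag = min(int(fs / f_min), len(x) - 1)
    if min_lag > max_lag:
        raise ValueError("signal too short for the requested pitch range")
    best_lag, best_r = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        r = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


# A synthetic 200 Hz tone (2000 samples at 8 kHz) stands in for a voiced frame.
fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(2000)]
print(estimate_pitch_autocorr(tone, fs))  # → 200.0
```

Real speech frames also need a voicing decision, because unvoiced or silent frames have no meaningful pitch.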

Voice Activity Detection in Noise Using Deep Learning

This example shows how to detect regions of speech in a low signal-to-noise environment using deep learning. The example uses the Speech Commands Dataset to train a Bidirectional Long Short-Term Memory (BiLSTM) network to detect voice activity.
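Turning a network's output into regions of speech is a post-processing step. The Python sketch below assumes the network produces one speech probability per frame and merges consecutive above-threshold frames into regions. The function name decisions_to_regions, the threshold, the minimum-duration rule, and the 10 ms hop are hypothetical, and the example may post-process its output differently.

```python
def decisions_to_regions(speech_prob, threshold=0.5, min_frames=1):
    """Merge consecutive frames whose speech probability is >= threshold into
    (start_frame, end_frame) regions, with end_frame exclusive. Runs shorter
    than min_frames are discarded."""
    regions = []
    start = None
    for i, p in enumerate(speech_prob):
        if p >= threshold and start is None:
            start = i  # a speech run begins
        elif p < threshold and start is not None:
            if i - start >= min_frames:
                regions.append((start, i))
            start = None  # the speech run ends
    # Close a run that lasts until the final frame.
    if start is not None and len(speech_prob) - start >= min_frames:
        regions.append((start, len(speech_prob)))
    return regions


# Hypothetical per-frame network outputs, with a 10 ms hop between frames.
probs = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.6, 0.4, 0.9, 0.95]
hop_s = 0.010
regions = decisions_to_regions(probs, threshold=0.5, min_frames=2)
print(regions)  # → [(2, 5), (8, 10)]
print([(s * hop_s, e * hop_s) for s, e in regions])  # region boundaries in seconds
```

The minimum-duration rule drops the isolated one-frame detection at frame 6, which is a simple way to suppress spurious short detections in noise.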

Spoken Digit Recognition with Wavelet Scattering and Deep Learning

This example shows how to classify spoken digits using both machine learning and deep learning techniques. In the example, you perform classification using wavelet time scattering with a support vector machine (SVM) and with a long short-term memory (LSTM) network. You also apply Bayesian optimization to determine suitable hyperparameters to improve the accuracy of the LSTM network. In addition, the example illustrates an approach using a deep convolutional neural network (CNN) and mel-frequency spectrograms.
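Mel-frequency spectrograms (and the MFCCs used in a later example) group spectral energy into bands that are equally spaced on the mel scale. The Python sketch below computes such band edges, assuming the common HTK-style formula mel = 2595 log10(1 + f/700). Other mel definitions exist, and the frequency range and band count here are arbitrary illustrative values.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style mel scale: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, num_bands):
    """Return num_bands + 2 frequencies in Hz, equally spaced on the mel scale.
    In a triangular filter bank, band k spans edges[k] to edges[k + 2]
    and peaks at edges[k + 1]."""
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (num_bands + 1)
    return [mel_to_hz(m_lo + i * step) for i in range(num_bands + 2)]


edges = mel_band_edges(0.0, 8000.0, num_bands=10)
print([round(e) for e in edges])
```

The printed edges are narrowly spaced at low frequencies and widen toward 8 kHz, which is what gives a mel spectrogram finer resolution where speech carries most of its information.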

Cocktail Party Source Separation Using Deep Learning Networks

This example shows how to isolate a speech signal using a deep learning network.
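Deep learning source separation is often framed as mask estimation: a network predicts, for each time-frequency bin of the mixture's STFT magnitude, the fraction that belongs to the target speaker, and an "ideal" mask computed from the known clean sources serves as the training target. The Python sketch below assumes that formulation and one common soft-mask definition, |S| / (|S| + |N|); power-based masks are also used, and the function names and toy data are hypothetical.

```python
def ideal_ratio_mask(target_mag, interferer_mag, eps=1e-12):
    """Per time-frequency bin, |S| / (|S| + |N|): a soft mask in [0, 1]."""
    return [
        [s / (s + n + eps) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(target_mag, interferer_mag)
    ]


def apply_mask(mixture_mag, mask):
    """Element-wise product of the mixture magnitude and the mask."""
    return [
        [m * w for m, w in zip(m_row, w_row)]
        for m_row, w_row in zip(mixture_mag, mask)
    ]


# Toy 2 x 3 magnitude "spectrograms" (rows: frequency bins, columns: frames).
target = [[3.0, 0.0, 1.0], [2.0, 2.0, 0.5]]
interferer = [[1.0, 4.0, 1.0], [0.0, 2.0, 1.5]]
# Simplification: the mixture magnitude is approximated as the sum of the
# source magnitudes, which ignores phase interactions between the sources.
mixture = [[s + n for s, n in zip(sr, nr)] for sr, nr in zip(target, interferer)]

mask = ideal_ratio_mask(target, interferer)
estimate = apply_mask(mixture, mask)
print([[round(v, 3) for v in row] for row in mask])  # → [[0.75, 0.0, 0.5], [1.0, 0.5, 0.25]]
```

To get a time-domain signal, the masked magnitude is typically combined with the mixture's phase and passed through an inverse STFT.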

Speech Emotion Recognition

This example illustrates a simple speech emotion recognition (SER) system using a BiLSTM network. You begin by downloading the data set and then testing the trained network on individual files. The network was trained on a small German-language database.

Keyword Spotting in Noise Using MFCC and LSTM Networks

This example shows how to identify a keyword in noisy speech using a deep learning network. In particular, the example uses a Bidirectional Long Short-Term Memory (BiLSTM) network and mel-frequency cepstral coefficients (MFCC).
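MFCCs are commonly computed by taking the logarithm of mel filter-bank energies and applying a discrete cosine transform (DCT), keeping the first few coefficients. The Python sketch below shows those two final steps for one frame, assuming an unnormalized DCT-II and 13 coefficients. Implementations differ in DCT normalization, the log floor, and whether coefficient 0 is replaced by log energy, so this is not the toolbox's exact computation; the function names and energy values are hypothetical.

```python
import math


def dct_ii(x, num_coeffs):
    """Unnormalized DCT-II: X_k = sum_n x_n * cos(pi * k * (2n + 1) / (2N))."""
    n_len = len(x)
    return [
        sum(x[n] * math.cos(math.pi * k * (2 * n + 1) / (2 * n_len)) for n in range(n_len))
        for k in range(num_coeffs)
    ]


def cepstral_coeffs(mel_energies, num_coeffs=13, floor=1e-10):
    """Log-compress mel band energies (floored to avoid log(0)), then apply a DCT-II."""
    log_e = [math.log(max(e, floor)) for e in mel_energies]
    return dct_ii(log_e, num_coeffs)


# Hypothetical energies for 20 mel bands of one frame.
energies = [1.0 / (1 + b) for b in range(20)]
print([round(c, 3) for c in cepstral_coeffs(energies)])
```

A spectrally flat frame (equal energy in every band) yields nonzero values only in coefficient 0, which illustrates how the higher coefficients encode the shape of the spectral envelope.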

Acoustic Scene Recognition Using Late Fusion

This example shows how to create a multi-model late fusion system for acoustic scene recognition. The example trains a convolutional neural network (CNN) using mel spectrograms and an ensemble classifier using wavelet scattering. The example uses the TUT dataset for training and evaluation [1].
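In late fusion, each model classifies the input independently and only their outputs are combined. The Python sketch below assumes the combination is a (weighted) average of per-class probabilities, which is one common rule. The example's exact fusion rule is not stated here, and the function name, class labels, and probability values are hypothetical.

```python
def late_fusion(model_probs, weights=None):
    """Fuse per-class probability vectors from several models by weighted
    averaging. Returns (fused_probs, index_of_predicted_class)."""
    num_models = len(model_probs)
    if weights is None:
        weights = [1.0 / num_models] * num_models  # equal weighting by default
    num_classes = len(model_probs[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, model_probs))
        for c in range(num_classes)
    ]
    predicted = max(range(num_classes), key=lambda c: fused[c])
    return fused, predicted


# Hypothetical class probabilities for one recording from two models,
# for example a CNN on mel spectrograms and an ensemble on wavelet scattering features.
classes = ["bus", "park", "tram"]
cnn_probs = [0.6, 0.3, 0.1]
ensemble_probs = [0.1, 0.7, 0.2]
fused, idx = late_fusion([cnn_probs, ensemble_probs])
print(classes[idx])  # → park
```

Here the CNN alone would predict "bus", but the ensemble's stronger preference for "park" outweighs it after averaging. Unequal weights let a more reliable model dominate.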

Featured Examples