
Implementing speech emotion recognition using CNN

Hamza on 12 October 2023
Answered: young on 8 April 2024
Hello everyone, I'm working on speech emotion recognition. I have a training matrix of size 60,000 x 39 and a label matrix of size 60,000 x 1. I would like to implement a Convolutional Neural Network (CNN). I've tried the code below, but it didn't work well. I believe I may have missed something. Could anyone please help me?
%% CNN
layers = [
    imageInputLayer([39 1 1])
    convolution2dLayer(3,16,'Padding','same')
    reluLayer
    fullyConnectedLayer(384) % 384 neurons in first FC hidden layer
    fullyConnectedLayer(384) % 384 neurons in second FC hidden layer
    fullyConnectedLayer(7)   % 7 neurons in output layer (number of classes)
    softmaxLayer
    classificationLayer];
options = trainingOptions('sgdm', ...
    'MaxEpochs',500, ...
    'Verbose',false, ...
    'Plots','training-progress');
XTrain = reshape(mfcc_matrix_app, [39,1,1,60575]);
test   = reshape(mfcc_matrix_app, [39,1,1,30000]);
targetD = categorical(CA);
net = trainNetwork(XTrain,targetD,layers,options);
predictedLabels = classify(net,test);

Accepted Answer

Neha on 18 October 2023
Hi Hamza,
I understand that you want to implement speech emotion recognition using CNN. Here are a few suggestions to improve the performance of the model:
As part of audio pre-processing, you can apply feature extraction and audio augmentation before passing the data to the CNN model.
You can first extract features such as the zero-crossing rate (ZCR), mel-frequency cepstral coefficients (MFCC), and the mel spectrum using the "audioFeatureExtractor" object; see its documentation for more information.
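A minimal sketch of such an extractor is shown below. The variables audioIn and fs are placeholders for your own signal and sample rate, and the window/overlap sizes are illustrative choices, not tuned values:

```matlab
% Sketch: extract ZCR, MFCC, and mel spectrum from one signal.
% audioIn (column vector) and fs (sample rate) are placeholders.
afe = audioFeatureExtractor( ...
    SampleRate=fs, ...
    Window=hamming(round(0.03*fs),"periodic"), ...
    OverlapLength=round(0.02*fs), ...
    zerocrossrate=true, ...
    mfcc=true, ...
    melSpectrum=true);

features = extract(afe,audioIn);  % one row of features per analysis window
idx = info(afe);                  % struct mapping each feature to its columns
```

The info call tells you which columns of the feature matrix belong to which feature, which is useful when you later want to select only the MFCCs.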
After feature extraction, you can augment the audio by adding noise, time-stretching, and pitch-shifting with the "audioDataAugmenter" object; see its documentation for more information.
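One possible configuration is sketched below; the probabilities, SNR range, and shift ranges are illustrative values you would tune for your dataset (audioIn and fs are again placeholders):

```matlab
% Sketch: generate augmented copies of a signal with noise,
% time stretching, and pitch shifting applied independently.
aug = audioDataAugmenter( ...
    AugmentationMode="independent", ...
    NumAugmentations=3, ...
    AddNoiseProbability=0.5, ...
    SNRRange=[10 20], ...          % dB
    TimeStretchProbability=0.5, ...
    SpeedupFactorRange=[0.9 1.1], ...
    PitchShiftProbability=0.5, ...
    SemitoneShiftRange=[-2 2]);

augmented = augment(aug,audioIn,fs);  % table: one augmented signal per row
```

Each row of the returned table contains an augmented signal plus a record of which augmentations were applied, so you can extract features from the augmented audio just as you do for the originals.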
You can also encode the data labels as one-hot vectors using the "onehotencode" function.
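For example, with a small categorical label vector (the emotion names here are only illustrative):

```matlab
% Sketch: one-hot encode a categorical label vector.
labels = categorical(["happy"; "sad"; "angry"; "happy"]);

% Expand along dimension 2: one row per label, one column per class.
onehot = onehotencode(labels,2);   % 4x3 matrix of 0s and 1s
```

Note that trainNetwork with a classificationLayer accepts categorical labels directly, so explicit one-hot encoding is mainly needed for custom training loops or custom loss functions.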
In the network architecture, you can also include batch normalization, dropout, and max pooling layers for faster convergence and regularization. You can refer to the following code snippet for reference:
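A sketch of your 39x1 network with those layers added is below; the filter counts, pool size, and dropout rate are illustrative choices, not tuned values:

```matlab
% Sketch: the original 39x1 MFCC architecture with batch normalization,
% max pooling, and dropout added for regularization.
layers = [
    imageInputLayer([39 1 1])
    convolution2dLayer([3 1],16,Padding="same")
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer([2 1],Stride=[2 1])   % halves the feature dimension
    convolution2dLayer([3 1],32,Padding="same")
    batchNormalizationLayer
    reluLayer
    dropoutLayer(0.3)                        % drop 30% of activations
    fullyConnectedLayer(384)
    reluLayer
    fullyConnectedLayer(7)                   % 7 output classes
    softmaxLayer
    classificationLayer];
```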
Also, your maximum number of epochs is set to 500, which might be too high and could lead to overfitting. Consider using early stopping to prevent this: the "ValidationData", "ValidationFrequency", and "ValidationPatience" training options enable validation-based stopping. See the trainingOptions documentation for more information on specifying training options.
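A sketch of such options is below; XVal and YVal are placeholders for a held-out validation set, and the frequency/patience values are illustrative:

```matlab
% Sketch: validation-based early stopping.
% XVal/YVal are placeholders for a held-out validation set.
options = trainingOptions('sgdm', ...
    MaxEpochs=100, ...
    ValidationData={XVal,YVal}, ...
    ValidationFrequency=30, ...   % validate every 30 iterations
    ValidationPatience=5, ...     % stop after 5 validations with no improvement
    Verbose=false, ...
    Plots="training-progress");
```

With ValidationPatience set, training halts automatically once validation loss stops improving, so a large MaxEpochs acts only as an upper bound.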
Hope this helps!

More Answers (1)

young on 8 April 2024
Hi Hamza,
May I ask if you have solved this problem? I am working on a similar SER project. I extract features such as MFCC, mel spectrum, pitch, and intensity. This gives me a training matrix that is a 936x1 cell array (a 200x1 double in every cell) and a label matrix that is a 1x936 categorical array.
The .mat file is attached.
My 1-D CNN code is below.
load('audio_features.mat', 'X_train', 'X_test', 'y_train', 'y_test');

% Wrap each observation in a cell and transpose to numFeatures-by-1 sequences.
x_traincnn = num2cell(X_train, 2);
y_traincnn = categorical(y_train.');
x_testcnn = num2cell(X_test, 2);
y_testcnn = categorical(y_test.');
x_traincnn = cellfun(@(x) x', x_traincnn, 'UniformOutput', false);
x_testcnn = cellfun(@(x) x', x_testcnn, 'UniformOutput', false);
disp(size(x_traincnn));
disp(size(x_testcnn));
disp(size(y_traincnn));
disp(size(y_testcnn));

numFeatures = 200;
numClasses = numel(categories(y_train));
filterSize = 5;
numFilters = 32;
rng('default');

layers = [ ...
    sequenceInputLayer(numFeatures)
    convolution1dLayer(filterSize,numFilters,Padding="causal")
    reluLayer
    layerNormalizationLayer
    convolution1dLayer(filterSize,2*numFilters,Padding="causal")
    reluLayer
    layerNormalizationLayer
    globalAveragePooling1dLayer
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];

miniBatchSize = 27;
options = trainingOptions("adam", ...
    MaxEpochs=200, ...
    InitialLearnRate=0.01, ...
    SequencePaddingDirection="left", ...
    ValidationData={x_testcnn,y_testcnn}, ...
    Plots="training-progress", ...
    Verbose=0);

net = trainNetwork(x_traincnn, y_traincnn, layers, options);
YPred = classify(net,x_testcnn, ...
    SequencePaddingDirection="left");
acc = mean(YPred == y_testcnn);
disp(["Accuracy: ", acc]);
confMat = confusionmat(y_testcnn, YPred);
disp(confMat);
figure;
confusionchart(y_testcnn,YPred);
The output is:

Release: R2023b
