Creating a dataset for neural network training (Speech Recognition)

Question

Anand Nayanar, 21 April 2014

Direct link to this question: https://jp.mathworks.com/matlabcentral/answers/126543-creating-a-dataset-for-neural-network-training-speech-recognition

Answered: Foresight india, 11 August 2017

Hi,

Firstly, I'd like to apologise for asking a question that I'm aware has been asked plenty of times before. However, despite having gone through the solutions to a number of resolved questions on this subject on this forum, I'm still unclear on how to proceed with my particular problem, which I will try to explain as well as I can below.

I am training a neural network to perform consonant recognition using MFCCs. Here are a few numbers that might come in handy to get an idea of my problem:

1) I have recorded voice samples from 16 people, with 227 voice samples per person (3632 samples in all). The samples are unevenly distributed across the output classes (for example, of the 227 samples per person, 13 correspond to consonant 'b', 12 to consonant 'd' and 5 to consonant 'q').

2) Each voice sample generates an MFCC feature matrix of dimensions 13x15, which I want to train the network with. That is, every voice sample is divided into 15 frames, each of which has 13 MFCC values.

3) There are 20 output classes (for 20 different basic consonant sounds).

My questions are:

1) How do I format the input and target matrices? I have a feeling the input may be of dimensions 195x3632 (13*15=195), where each column corresponds to one voice sample and contains the 13 MFCC values for each of its 15 frames, i.e. 195 values in all. The target matrix is probably of dimensions 20x3632 (where every column has 19 zeros and a single one indicating which consonant that particular voice sample corresponds to).

Could you confirm whether this is correct?

2) If this is correct, during the testing of the network with unknown inputs, would each input have to be arranged in a matrix of dimensions 195x1 (i.e. 13 MFCC values for each of the 15 frames of the input voice sample)?

3) If my premise is correct, could you tell me how to create a dataset file? Do I even have to create a separate dataset .m file, or can I simply save my input and target matrices to the workspace and load them in the NN toolbox GUI?

4) How would I separate the data into training, validation and test sets? Do I simply load the input and target matrices into the GUI, select 'matrix columns' as samples and specify the appropriate percentages?

5) Lastly, would a 195x3632 matrix be too large a training set, so that I would have to trim it down?
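The layout described in questions 1 and 2 can be sketched as follows. This is a minimal, language-neutral illustration in plain Python rather than toolbox code, and the helper names (`flatten_mfcc`, `one_hot`, `build_dataset`) and the toy data are hypothetical. It assumes each MFCC matrix is stored with 13 rows (coefficients) and 15 columns (frames) and is stacked frame by frame, which is the same column-major order that MATLAB's `M(:)` produces.

```python
N_COEFFS, N_FRAMES, N_CLASSES = 13, 15, 20


def flatten_mfcc(mfcc):
    """Stack a 13x15 MFCC matrix frame by frame into a 195-element vector."""
    assert len(mfcc) == N_COEFFS
    assert all(len(row) == N_FRAMES for row in mfcc)
    # Outer loop over frames, inner loop over coefficients (column-major).
    return [mfcc[c][f] for f in range(N_FRAMES) for c in range(N_COEFFS)]


def one_hot(class_index, n_classes=N_CLASSES):
    """Return a target column with a single 1 at the class position."""
    target = [0] * n_classes
    target[class_index] = 1
    return target


def build_dataset(samples):
    """samples: list of (mfcc_matrix, class_index) pairs.

    Returns (inputs, targets) as lists of rows, sized 195 x N and 20 x N,
    so that column j of each matrix belongs to sample j.
    """
    input_columns = [flatten_mfcc(mfcc) for mfcc, _ in samples]
    target_columns = [one_hot(k) for _, k in samples]
    # Transpose so that every sample becomes one column.
    inputs = [list(row) for row in zip(*input_columns)]
    targets = [list(row) for row in zip(*target_columns)]
    return inputs, targets


# Hypothetical toy data: 4 samples instead of 3632, with recognisable values.
samples = [
    ([[float(1000 * s + 100 * c + f) for f in range(N_FRAMES)]
      for c in range(N_COEFFS)], s % N_CLASSES)
    for s in range(4)
]
inputs, targets = build_dataset(samples)
print(len(inputs), len(inputs[0]))    # 195 4
print(len(targets), len(targets[0]))  # 20 4
```

With the full data set the same construction gives a 195x3632 input matrix and a 20x3632 target matrix. A single unknown sample at test time is one 195x1 column, and it must be flattened in the same order as the training columns.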

I have tried to explain my problem in as much detail as possible. I'd greatly appreciate an answer that is specific to my question rather than a generic answer on how to create input and target matrices as has been presented as solutions to previous questions, as I had trouble understanding them.

Once again, I apologise for asking a question that has been asked before.

Cheers,

Anand


Answer 1

Greg Heath, 23 April 2014

Direct link to this answer: https://jp.mathworks.com/matlabcentral/answers/126543-creating-a-dataset-for-neural-network-training-speech-recognition#answer_134256

It looks like you know how to do it. So ... go do it.

Yes, you might have to use input dimensionality reduction. However, try without it first.
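The column-wise split into training, validation and test sets raised in question 4 can be sketched as below. This is only an illustration in plain Python, not the toolbox's implementation: `split_indices` and `take_columns` are hypothetical helpers, and the 70/15/15 ratios are an assumption chosen to mirror what the toolbox's random division (`dividerand`) is understood to use by default.

```python
import random


def split_indices(n_samples, train=0.70, val=0.15, seed=0):
    """Randomly partition sample (column) indices into three disjoint sets."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(train * n_samples)
    n_val = round(val * n_samples)
    train_idx = indices[:n_train]
    val_idx = indices[n_train:n_train + n_val]
    test_idx = indices[n_train + n_val:]   # whatever remains is the test set
    return train_idx, val_idx, test_idx


def take_columns(matrix, columns):
    """Select the given columns from a matrix stored as a list of rows."""
    return [[row[j] for j in columns] for row in matrix]


train_idx, val_idx, test_idx = split_indices(3632)
print(len(train_idx), len(val_idx), len(test_idx))  # 2542 545 545
```

The same index lists are applied to both the input and the target matrix (for example `take_columns(inputs, train_idx)` and `take_columns(targets, train_idx)`), so that every input column stays paired with its target column.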

3 comments (1 older comment not shown)

kamalvir, 28 January 2015

My MFCC feature values are both positive and negative and vary over a wide range. I have heard that some preprocessing is required before feeding them to a neural network, so that the values range from 0 to 1. How do I do that? Please reply; I need this urgently.

Greg Heath, 28 January 2015

Preprocessing is applied by default.
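For readers who want to see what such scaling does, here is a minimal sketch of per-feature min-max normalisation in plain Python. It is not the toolbox's code: `fit_minmax` and `apply_minmax` are hypothetical names, and the [0, 1] target range follows the comment above (the toolbox's own default, `mapminmax`, is understood to scale each input row to [-1, 1] instead). The sketch assumes one row per feature and one column per sample.

```python
def fit_minmax(matrix):
    """Record (min, max) for every feature row of the training inputs."""
    return [(min(row), max(row)) for row in matrix]


def apply_minmax(matrix, ranges, lo=0.0, hi=1.0):
    """Scale each row to [lo, hi] using previously fitted (min, max) pairs."""
    scaled = []
    for row, (row_min, row_max) in zip(matrix, ranges):
        span = row_max - row_min
        if span == 0:
            # Constant feature: no spread to scale, so map every value to lo.
            scaled.append([lo for _ in row])
        else:
            scaled.append([lo + (hi - lo) * (x - row_min) / span for x in row])
    return scaled


# Hypothetical toy inputs: 2 features (rows) x 3 samples (columns).
features = [[-12.0, 0.0, 4.0],
            [3.0, 3.5, 5.0]]
ranges = fit_minmax(features)
scaled = apply_minmax(features, ranges)

# A new, unseen sample is scaled with the ranges fitted on the training data.
new_sample = apply_minmax([[-4.0], [4.0]], ranges)
```

The ranges are fitted on the training inputs only and then reused for validation, test and unknown inputs, so that all data pass through the same mapping.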


Answer 2

Foresight india, 11 August 2017

Direct link to this answer: https://jp.mathworks.com/matlabcentral/answers/126543-creating-a-dataset-for-neural-network-training-speech-recognition#answer_277649

We are starting to develop an automatic voice response (AVR) system using neural network technology.

The challenge here is to replicate the human voice to offer better communication, and for this we have to use a multi-layered neural network application to handle AVR. We invite people already working in this domain to review this or share their suggestions.
