
Creating a dataset for neural network training (Speech Recognition)

Anand Nayanar, 21 April 2014
Answered: Foresight india, 11 August 2017
Hi,
Firstly, I apologise for asking a question that I know has been asked many times before. However, despite going through the solutions to a number of resolved questions on this subject in this forum, I'm still unclear about how to proceed with my particular problem, which I will try to explain as well as I can below.
I am training a neural network to perform consonant recognition using MFCCs. Here are a few numbers that might come in handy to get an idea of my problem:
1) I have recorded voice samples from 16 people, with 227 samples per person (3632 samples in all). The samples are not spread evenly across the output classes: for example, of the 227 samples per person, 13 correspond to the consonant 'b', 12 to 'd' and 5 to 'q'.
2) Each voice sample produces a 13x15 MFCC matrix (feature vector) that I want to train the network with: every voice sample is divided into 15 frames, each of which has 13 MFCC values.
3) There are 20 output classes (for 20 different basic consonant sounds).
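A minimal Python sketch of the input layout described in points 1) to 3), assuming each sample's MFCCs are held as a 13x15 nested list `mfcc[c][f]` (coefficient `c`, frame `f`). The helper names are hypothetical. The flattening order shown matches MATLAB's column-major `X(:)` on a 13x15 matrix, which stacks frame 1's 13 coefficients, then frame 2's, and so on; whichever order is used, it has to be identical for training and test samples.

```python
N_COEFFS = 13   # MFCC values per frame
N_FRAMES = 15   # frames per voice sample

def flatten_column_major(mfcc):
    """Flatten a 13x15 matrix frame by frame into a 195-element column.

    Assumes mfcc[c][f] = coefficient c of frame f; mirrors MATLAB's X(:).
    """
    assert len(mfcc) == N_COEFFS and all(len(row) == N_FRAMES for row in mfcc)
    return [mfcc[c][f] for f in range(N_FRAMES) for c in range(N_COEFFS)]

def build_input_matrix(samples):
    """Return a 195 x N matrix (list of rows) with one column per voice sample."""
    columns = [flatten_column_major(m) for m in samples]
    n_features = N_COEFFS * N_FRAMES
    return [[col[i] for col in columns] for i in range(n_features)]

# Two fake samples whose entries encode (sample, coefficient, frame)
# as s*1000 + c*100 + f, so the ordering is easy to check.
fake = [[[s * 1000 + c * 100 + f for f in range(N_FRAMES)]
         for c in range(N_COEFFS)]
        for s in range(2)]
X = build_input_matrix(fake)
print(len(X), len(X[0]))  # rows (features) and columns (samples)
```

With all 3632 samples the same call yields a 195x3632 input matrix, and a single unknown sample passed through `flatten_column_major` gives a 195x1 column.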
My questions are:
1) How do I format the input and target matrices? I have a feeling the input matrix may be 195x3632 (13*15=195), where each column corresponds to one voice sample and contains the 13 MFCC values for each of the 15 frames of that sample, i.e. 195 values in all. The target matrix is probably 20x3632, where every column has 19 zeros and a single one indicating which consonant that voice sample corresponds to.
Could you confirm whether this is correct?
2) If this is correct, when testing the network with unknown inputs, would each input have to be arranged as a 195x1 matrix (i.e. 13 MFCC values for each of the 15 frames of the input voice sample)?
3) If my premise is correct, could you tell me how to create a dataset file? Do I even need a separate dataset .m file if I simply save my input and target matrices to the workspace and load them in the NN Toolbox GUI?
4) How would I divide the data into training, validation and test sets? Do I simply load the input and target matrices into the GUI, select 'matrix columns' as samples and specify the appropriate percentages?
5) Lastly, would a 195x3632 matrix be too large a training set, or would I have to trim it down?
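The target layout from question 1 and the random division from question 4 can be sketched in Python as below. Class labels are assumed to be 0-based integers (0 to 19), the `labels` list is made up, and `one_hot_targets` and `split_indices` are hypothetical helpers. The 70/15/15 ratios mirror what I understand to be the toolbox's default random division (`dividerand`). The split is over column indices, so the same index lists select matching columns of the input and target matrices.

```python
import random

N_CLASSES = 20  # one row per consonant class

def one_hot_targets(labels, n_classes=N_CLASSES):
    """Return an n_classes x N matrix (list of rows): column j has a 1 in row labels[j]."""
    T = [[0] * len(labels) for _ in range(n_classes)]
    for j, k in enumerate(labels):
        T[k][j] = 1
    return T

def split_indices(n, train=0.70, val=0.15, seed=0):
    """Randomly partition column indices 0..n-1 into train/val/test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(train * n)
    n_val = round(val * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

labels = [0, 3, 19, 3]                  # made-up class indices for 4 samples
T = one_hot_targets(labels)
print(len(T), len(T[0]))                # rows x columns of the target matrix
print([sum(col) for col in zip(*T)])    # every column contains exactly one 1

tr, va, te = split_indices(3632)
print(len(tr), len(va), len(te))
```

A purely random split mixes every speaker into all three subsets; keeping all of one person's samples in the same subset is an alternative if the aim is to measure performance on unseen speakers.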
I have tried to explain my problem in as much detail as possible. I'd greatly appreciate an answer specific to my question rather than the generic explanations of how to create input and target matrices given as solutions to previous questions, which I had trouble understanding.
Once again, I apologise for asking a question that has been asked before.
Cheers,
Anand

Accepted Answer

Greg Heath, 23 April 2014
It looks like you know how to do it. So ... go do it.
Yes, you might have to use input dimensionality reduction. However, try without it first.
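For question 5 and the remark about dimensionality reduction, one rough sanity check is to compare the number of training equations (training columns × output nodes) with the number of unknown weights in a single-hidden-layer network. This heuristic is an illustration, not something stated in the thread. The hidden-layer sizes are illustrative (10 is the toolbox's default for `patternnet`), and 2542 is simply 70% of 3632, rounded.

```python
def count_weights(n_in, n_hidden, n_out):
    """Weights plus biases of a fully connected net with one hidden layer."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

def count_equations(n_train, n_out):
    """Each training column supplies one target value per output node."""
    return n_train * n_out

N_IN, N_OUT = 195, 20     # 13*15 MFCC inputs, 20 consonant classes
N_TRAIN = 2542            # about 70% of the 3632 columns

for hidden in (10, 50, 100):          # illustrative hidden-layer sizes
    nw = count_weights(N_IN, hidden, N_OUT)
    neq = count_equations(N_TRAIN, N_OUT)
    print(f"H={hidden}: {nw} weights, {neq} equations, ratio {neq / nw:.1f}")
```

If the ratio of equations to weights becomes small, reducing the 195 inputs (PCA is one option among several) shrinks the first term, `(n_in + 1) * n_hidden`, of the weight count.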
kamalvir, 28 January 2015
My MFCC feature values are both positive and negative and vary over a wide range. I have heard that some preprocessing is required before feeding them to a neural network, to make the values range from 0 to 1. How do I do it? Please reply; I need this desperately.
Greg Heath, 28 January 2015
Preprocessing is applied by default, so the toolbox normalizes the inputs for you.
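In the MATLAB toolbox, the default input processing for networks such as `patternnet` includes `removeconstantrows` and `mapminmax`, and `mapminmax` rescales each row (feature) linearly to [-1, 1]. A minimal Python sketch of that per-row rescaling follows; the helper names are hypothetical, and mapping constant rows to `ymin` is this sketch's own choice. The important point is that each row's minimum and maximum are computed on the training data and then reused unchanged for validation, test and new inputs. Passing `ymin=0.0, ymax=1.0` gives the 0 to 1 range asked about.

```python
def fit_minmax(X, ymin=-1.0, ymax=1.0):
    """Record per-row (per-feature) min and max of the training matrix X.

    X is a list of rows; each row holds one feature across all samples.
    """
    return [(min(row), max(row)) for row in X], ymin, ymax

def apply_minmax(X, settings):
    """Map each row linearly so the training min -> ymin and training max -> ymax."""
    stats, ymin, ymax = settings
    out = []
    for row, (xmin, xmax) in zip(X, stats):
        if xmax == xmin:
            # Constant feature: this sketch maps it to ymin to avoid dividing by zero.
            out.append([ymin] * len(row))
        else:
            gain = (ymax - ymin) / (xmax - xmin)
            out.append([(x - xmin) * gain + ymin for x in row])
    return out

# Two made-up features (rows) over three training samples (columns).
X_train = [[-20.0, 0.0, 20.0],
           [1.0, 2.0, 5.0]]
settings = fit_minmax(X_train)            # default range [-1, 1]
Y_train = apply_minmax(X_train, settings)

# A new sample is scaled with the training settings, not refitted.
x_new = [[0.0], [3.0]]
y_new = apply_minmax(x_new, settings)

# The same training data mapped to [0, 1] instead.
Y01 = apply_minmax(X_train, fit_minmax(X_train, 0.0, 1.0))
print(Y_train, y_new, Y01)
```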

More Answers (1)

Foresight india, 11 August 2017
We are starting to develop an automatic voice response (AVR) system using neural technology.
The challenge here is to replicate the human voice to offer better communication, and for this we have to use a multi-layered neural network. We are inviting people already working in this domain to review our work or share their suggestions.
