Speech recognition (Problem reducing isolated word recognition)

Question

Chan on 30 Oct 2011

https://jp.mathworks.com/matlabcentral/answers/19805-speech-recognition-problem-reducing-isolated-word-recognition

Hi, I'm currently working on a project that recognizes the isolated words one, two, three, four, five, six, seven, eight, nine and zero. I downloaded a sample from MathWorks, and it works fine for me, although some recognitions are not that accurate. The problem occurs when I try to reduce the vocabulary to one, two, three, four and five only. I removed everything related to six, seven, eight, nine and zero, but I then received some errors. Can anyone tell me what is wrong with the code and what the errors mean?

The code is shown below:

function  trainmodels(speech,model)
% This accepts input training speech for each digit, and estimates a
% Gaussian Mixture Model from the training vectors for each digit.
%
% Accepts input speech of repeated utterances of a single digit.  Input
% speech must be sampled at 8000 Hz.  For each frame of 160 samples (with 
% 80 sample overlap) this function then detects isolated digit utterances 
% using the algorithm described in 'speechdetect.m'.  For each detected
% word, this function then ...
%
% 1) Windows each frame of speech with a Hamming window
% 2) Applies a pre-emphasis filter to each frame
% 3) Calculates 13 MFCC, 13 delta MFCC, and 13 delta-delta MFCC 
%    coefficients for each frame
% 4) Estimates an 8 mixture Gaussian Mixture Model for the 13 dimensional
%    training vectors (with diagonal covariance matrix).
% 5) The GMM parameters for each digit are saved to a structure called
%    'model' within a file called 'MODELS.mat'.  This .mat file is loaded
%    from within 'digitrecgui.m' to perform the classification.
Fs = 8000;                          % Sampling Frequency
seglength = 160;                    % Length of frames
overlap = seglength/2;              % # of samples to overlap
stepsize = seglength - overlap;     % Frame stepsize
nframes = length(speech)/stepsize-1;
std_energy = 0.75;           % Energy STD gain factor for Voice Activity (VA)
std_zxings = 0.75;           % Zero xing STD gain factor for VA
noiseframes = 50;           % # of frames used to estimate background noise
bufferlength = 10;          % Min # of non-VA frames to signify a break in 
                            % speech (silence between words)
% Initialise Variables
samp1 = 1; samp2 = seglength;           %Initialise frame start and end
energy_thresh_buf = zeros(noiseframes,1);
zxings_thresh_buf = zeros(noiseframes,1);
VAbuff = zeros(bufferlength,1);
VA = 0;             % "Voice Activity" flag
DETECT = 0;         % "VA indicator" flag
WORD = 0;           % "Word has been detected" flag
WORDbuff = zeros(seglength,200);
ALLdata = [];
for i = 1:nframes
    % Remove mean from analysis frame
    frame = speech(samp1:samp2)-mean(speech(samp1:samp2));
      % Calculate energy and zero xings in current frame
      % These are used as voice activity indicators
      frame_energy = log(sum(frame.*frame)+eps);
      frame_zxings = zerocross;
      % Simple estimation of low energy threshold and zero crossings 
      % threshold. (Assumes no speech activity for the first 'noiseframes' 
      % overlapped frames)
      if i < noiseframes
          energy_thresh_buf(i) = frame_energy;
          zxings_thresh_buf(i) = frame_zxings;
      elseif i == noiseframes
          energy_thresh = mean(energy_thresh_buf) + ...
              std_energy*std(energy_thresh_buf);
          % Requires a minimum threshold of 25 zero crossings
          xing_thresh = max(mean(zxings_thresh_buf) + ...
              std_zxings*std(zxings_thresh_buf),25);
      else
          % Initial indicator of Voice Activity
          if frame_energy >= energy_thresh || frame_zxings >= xing_thresh
              DETECT = 1;
          else
              DETECT = 0;
          end
          % Now need to decide if we really do have voice activity based on
          % the length of time that "DETECT" == 1.   
          if VA   % We may have voice activity
              VAframes = VAframes + 1;        % Increment VAframe counter
              WORDbuff(:,VAframes) = frame;   % Save in buffer
              % Circular shift buffer and save current frame indicator
              VAbuff = circshift(VAbuff,1); 
              if DETECT 
                  VAbuff(1) = 1;
              else
                  VAbuff(1) = 0; 
              end
              % Look at buffer of frames where DETECT = 1
              if VAbuff(1) 
                  % Reset buffer 
                  VAbuff = [1; zeros(bufferlength-1,1)];
              elseif VAbuff(end)
                  % There was no voice activity for duration of buffer.  Turn
                  % off VA flag and calculate number of contiguous frames 
                  % with voice activity.     
                  VA = 0;
                  VAframes = VAframes - bufferlength - 1;
                  % Disregard any contiguous frames less than 0.25s
                  if VAframes > 25
                      WORD = 1;
                      WORDdata = WORDbuff(:,1:VAframes);
                      WORDbuff = zeros(seglength,200); 
                  else
                      WORD = 0;
                      WORDbuff = zeros(seglength,200);
                  end
              end                                         
          else    % No voice detected yet
              % Do indicators suggest VA?
              if DETECT
                  VA = 1;                 % Set flag to say we may have VA
                  VAframes = 1;           % Re-Initialise VA frame number
                  WORDbuff(:,1) = frame;  % Save in buffer
                  % Initialise buffer to record the previous frames where
                  % DETECT = 1.  This is used to determine contiguous frames
                  % of voice or non-voice activity
                  VAbuff = [1; zeros(bufferlength-1,1)];
              end            
          end
          % Combine all speech frames in one big matrix
          if WORD
              ALLdata = [ALLdata WORDdata];
              WORD = 0;
          end
      end
      % Step up to next frame of speech
      samp1 = samp1 + stepsize;
      samp2 = samp2 + stepsize;
end
    % Nested function for zero crossing calculation
    function numcross = zerocross
        currsum = 0;
        prevsign = 0;
        for kk = 1:seglength
            currsign = sign(frame(kk));
            if (currsign * prevsign) == -1
                currsum = currsum + 1;
            end
            if currsign ~= 0
                prevsign = currsign;
            end
        end
        numcross = currsum;
    end
% Calculate MFCC coefficients from overlapped speech frames
mfccdata = mfcc(ALLdata,Fs,1);
% Calculate a GMM fit for the training data and save to MODELS
if exist('MODELS.mat','file')
    load MODELS
end
modelidx = getmodelidx;
models(modelidx).word = model;
options = statset('MaxIter',500,'Display','final');
disp(['Starting GMM Training for: ' model]);
models(modelidx).gmm = gmdistribution.fit(mfccdata',8,'CovType',...
    'diagonal','Options',options);
save MODELS models
      % Nested function to get index for model structure
      function idx = getmodelidx
          switch model
              case 'one', idx = 1;
              case 'two', idx = 2;
              case 'three', idx = 3;
              case 'four', idx = 4;
              case 'five', idx = 5;
              case 'six', idx = 6;
              case 'seven', idx = 7;
              case 'eight', idx = 8;
              case 'nine', idx = 9;
              case 'zero', idx = 10;
              otherwise, error('Invalid Word for training');
          end
      end
% Also save training vectors to TRAINDATA (maybe for future use)
%if exist('TRAINDATA.mat','file')
%    load TRAINDATA
%end
%traindata.(model).trainvectors=mfccdata;
%save TRAINDATA traindata
end
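
One way to shrink the vocabulary without hand-editing the `switch` in `getmodelidx` is to look the word up in a list. The following is only a minimal sketch and not part of the original sample: the `words` list is hypothetical and assumes the five-word vocabulary described in the question, and `model` is set by hand here instead of being passed in as a function argument.

```matlab
% Assumed reduced vocabulary (hypothetical; not from the original sample)
words = {'one','two','three','four','five'};
model = 'three';                        % example training word

% Position of the word in the list; empty if the word is not in the vocabulary
idx = find(strcmp(model, words), 1);
if isempty(idx)
    error('Invalid Word for training: %s', model);
end
```

With a list like this, the same `words` variable could also drive the lookup on the recognition side, so both places change together when the vocabulary changes.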

The errors below appear when I remove case six through case zero.

Error in ==> digitrecgui>getword at 313
switch nll_IDX

??? Output argument "word" (and maybe others) not assigned during call to "C:\Users\ck\Documents\Academic\FYP\MATLAB\Sample\Isolated_Digit_Recognition\digitrecgui.m (getword)".

Error in ==> digitrecgui>startbutton_Callback at 249
digit = getword(nll_IDX);

Error in ==> gui_mainfcn at 96
feval(varargin{:});

Error in ==> digitrecgui at 43
gui_mainfcn(gui_State, varargin{:});

Error in ==> guidemfile>@(hObject,eventdata)digitrecgui('startbutton_Callback',hObject,eventdata,guidata(hObject))

??? Error while evaluating uicontrol Callback

Does anyone have an idea what is going on?
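
The message `Output argument "word" ... not assigned` generally means that no branch of the `switch` in `getword` matched `nll_IDX`, so the function returned without setting `word`. One possible cause, judging from the training code in the question: `trainmodels` loads an existing `MODELS.mat` and only overwrites `models(modelidx)`, so entries 6 to 10 from earlier training remain in the file unless it is deleted. The sketch below is an assumption about how the lookup could be guarded. `digitrecgui.m` is not shown, so everything except the names `nll_IDX` and `word` is hypothetical.

```matlab
% Hypothetical stand-in for the lookup inside digitrecgui>getword
words = {'one','two','three','four','five'};   % assumed reduced vocabulary
nll_IDX = 7;                                   % e.g. best score came from a stale model

if nll_IDX >= 1 && nll_IDX <= numel(words)
    word = words{nll_IDX};
else
    word = 'unknown';                          % the output is always assigned
end
```

Deleting `MODELS.mat` and retraining only the five remaining words would also remove any stale entries.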

4 Comments

misc llaenous on 16 Mar 2012

Does this code compare speech signals (0 to 9)? If so, how did you feed in the speech signals in MATLAB that the user's input is compared with?

Daniel Shub on 16 Mar 2012

@misc I suggest that you have a look at

http://www.mathworks.com/matlabcentral/answers/8626-how-do-i-get-help-on-homework-questions-on-matlab-answers

http://www.mathworks.com/matlabcentral/answers/728-how-do-i-write-a-good-question-for-matlab-answers

http://www.mathworks.com/matlabcentral/answers/6200-tutorial-how-to-ask-a-question-on-answers-and-get-a-fast-answer

Instead of digging up an unanswered (and unresponded to) question from 4 months ago.

Answers (0)