Speech recognition (Problem reducing isolated word recognition)

Chan on 30 Oct 2011
Hi, I'm currently working on a project that recognizes the isolated words one, two, three, four, five, six, seven, eight, nine and zero. I downloaded a sample from MathWorks, and it works fine for me, although some recognitions are not that accurate. The problem occurs when I try to reduce the set of isolated words to one, two, three, four and five only. I removed everything relevant to six, seven, eight, nine and zero, but I received an error. Can anyone tell me what's wrong with the code and what the error means?
The code is shown below:
function trainmodels(speech,model)
% This accepts input training speech for each digit, and estimates a
% Gaussian Mixture Model from the training vectors for each digit.
%
% Accepts input speech of repeated utterances of a single digit. Input
% speech must be sampled at 8000 Hz. For each frame of 160 samples (with
% 80 sample overlap) this function then detects isolated digit utterances
% using the algorithm described in 'speechdetect.m'. For each detected
% word, this function then ...
%
% 1) Windows each frame of speech with a Hamming window
% 2) Applies a pre-emphasis filter to each frame
% 3) Calculates 13 MFCC, 13 delta MFCC, and 13 delta-delta MFCC
% coefficients for each frame
% 4) Estimates an 8 mixture Gaussian Mixture Model for the 13 dimensional
% training vectors (with diagonal covariance matrix).
% 5) The GMM parameters for each digit are saved to a structure called
% 'model' within a file called 'MODELS.mat'. This .mat file is loaded
% from within 'digitrecgui.m' to perform the classification.
Fs = 8000; % Sampling Frequency
seglength = 160; % Length of frames
overlap = seglength/2; % # of samples to overlap
stepsize = seglength - overlap; % Frame stepsize
nframes = length(speech)/stepsize-1;
std_energy = 0.75; % Energy STD gain factor for Voice Activity (VA)
std_zxings = 0.75; % Zero xing STD gain factor for VA
noiseframes = 50; % # of frames used to estimate background noise
bufferlength = 10; % Min # of non-VA frames to signify a break in
% speech (silence between words)
% Initialise Variables
samp1 = 1; samp2 = seglength; %Initialise frame start and end
energy_thresh_buf = zeros(noiseframes,1);
zxings_thresh_buf = zeros(noiseframes,1);
VAbuff = zeros(bufferlength,1);
VA = 0; % "Voice Activity" flag
DETECT = 0; % "VA indicator" flag
WORD = 0; % "Word has been detected" flag
WORDbuff = zeros(seglength,200);
ALLdata = [];
for i = 1:nframes
    % Remove mean from analysis frame
    frame = speech(samp1:samp2)-mean(speech(samp1:samp2));
    % Calculate energy and zero xings in current frame
    % These are used as voice activity indicators
    frame_energy = log(sum(frame.*frame)+eps);
    frame_zxings = zerocross;
    % Simple estimation of low energy threshold and zero crossings
    % threshold. (Assumes no speech activity for the first 'noiseframes'
    % overlapped frames)
    if i < noiseframes
        energy_thresh_buf(i) = frame_energy;
        zxings_thresh_buf(i) = frame_zxings;
    elseif i == noiseframes
        energy_thresh = mean(energy_thresh_buf) + ...
            std_energy*std(energy_thresh_buf);
        % Requires a minimum threshold of 25 zero crossings
        xing_thresh = max(mean(zxings_thresh_buf) + ...
            std_zxings*std(zxings_thresh_buf),25);
    else
        % Initial indicator of Voice Activity
        if frame_energy >= energy_thresh || frame_zxings >= xing_thresh
            DETECT = 1;
        else
            DETECT = 0;
        end
        % Now need to decide if we really do have voice activity based on
        % the length of time that "DETECT" == 1.
        if VA % We may have voice activity
            VAframes = VAframes + 1; % Increment VAframe counter
            WORDbuff(:,VAframes) = frame; % Save in buffer
            % Circular shift buffer and save current frame indicator
            VAbuff = circshift(VAbuff,1);
            if DETECT
                VAbuff(1) = 1;
            else
                VAbuff(1) = 0;
            end
            % Look at buffer of frames where DETECT = 1
            if VAbuff(1)
                % Reset buffer
                VAbuff = [1; zeros(bufferlength-1,1)];
            elseif VAbuff(end)
                % There was no voice activity for duration of buffer. Turn
                % off VA flag and calculate number of contiguous frames
                % with voice activity.
                VA = 0;
                VAframes = VAframes - bufferlength - 1;
                % Disregard any contiguous frames less than 0.25s
                if VAframes > 25
                    WORD = 1;
                    WORDdata = WORDbuff(:,1:VAframes);
                    WORDbuff = zeros(seglength,200);
                else
                    WORD = 0;
                    WORDbuff = zeros(seglength,200);
                end
            end
        else % No voice detected yet
            % Do indicators suggest VA?
            if DETECT
                VA = 1; % Set flag to say we may have VA
                VAframes = 1; % Re-Initialise VA frame number
                WORDbuff(:,1) = frame; % Save in buffer
                % Initialise buffer to record the previous frames where
                % DETECT = 1. This is used to determine contiguous frames
                % of voice or non-voice activity
                VAbuff = [1; zeros(bufferlength-1,1)];
            end
        end
        % Combine all speech frames in one big matrix
        if WORD
            ALLdata = [ALLdata WORDdata];
            WORD = 0;
        end
    end
    % Step up to next frame of speech
    samp1 = samp1 + stepsize;
    samp2 = samp2 + stepsize;
end
%Nested function for zero crossing calculation
function numcross = zerocross
    currsum = 0;
    prevsign = 0;
    for kk = 1:seglength
        currsign = sign(frame(kk));
        if (currsign * prevsign) == -1
            currsum = currsum + 1;
        end
        if currsign ~= 0
            prevsign = currsign;
        end
    end
    numcross = currsum;
end
% Calculate MFCC coefficients from overlapped speech frames
mfccdata = mfcc(ALLdata,Fs,1);
% Calculate a GMM fit for the training data and save to MODELS
if exist('MODELS.mat','file')
    load MODELS
end
modelidx = getmodelidx;
models(modelidx).word = model;
options = statset('MaxIter',500,'Display','final');
disp(['Starting GMM Training for: ' model]);
models(modelidx).gmm = gmdistribution.fit(mfccdata',8,'CovType',...
    'diagonal','Options',options);
save MODELS models
% Nested function to get index for model structure
function idx = getmodelidx
    switch model
        case 'one', idx = 1;
        case 'two', idx = 2;
        case 'three', idx = 3;
        case 'four', idx = 4;
        case 'five', idx = 5;
        case 'six', idx = 6;
        case 'seven', idx = 7;
        case 'eight', idx = 8;
        case 'nine', idx = 9;
        case 'zero', idx = 10;
        otherwise, error('Invalid Word for training');
    end
end
% Also save training vectors to TRAINDATA (maybe for future use)
%if exist('TRAINDATA.mat','file')
% load TRAINDATA
%end
%traindata.(model).trainvectors=mfccdata;
%save TRAINDATA traindata
end
The error message below appears when I remove case six through case zero.
Error in ==> digitrecgui>getword at 313
switch nll_IDX
??? Output argument "word" (and maybe others) not assigned during call to "C:\Users\ck\Documents\Academic\FYP\MATLAB\Sample\Isolated_Digit_Recognition\digitrecgui.m (getword)".
Error in ==> digitrecgui>startbutton_Callback at 249
digit = getword(nll_IDX);
Error in ==> gui_mainfcn at 96
feval(varargin{:});
Error in ==> digitrecgui at 43
gui_mainfcn(gui_State, varargin{:});
Error in ==> guidemfile>@(hObject,eventdata)digitrecgui('startbutton_Callback',hObject,eventdata,guidata(hObject))
??? Error while evaluating uicontrol Callback
Does anyone have an idea what is going on?
4 Comments
misc llaenous on 16 Mar 2012
Does this code compare speech signals (0 to 9)? If so, how did you feed in the speech signals in MATLAB that the user can compare with?
Daniel Shub on 16 Mar 2012
@misc I suggest that you have a look at
http://www.mathworks.com/matlabcentral/answers/8626-how-do-i-get-help-on-homework-questions-on-matlab-answers
http://www.mathworks.com/matlabcentral/answers/728-how-do-i-write-a-good-question-for-matlab-answers
http://www.mathworks.com/matlabcentral/answers/6200-tutorial-how-to-ask-a-question-on-answers-and-get-a-fast-answer
instead of digging up an unanswered (and unresponded to) question from 4 months ago.


Answers (0)
