Speech recognition: separating words

Leon Ellis, 4 September 2021
Answered: Izak Adendorff, 5 September 2021
Good day, my task is to create a program that lets the user record their voice saying a few words with pauses in between. I then have to create an algorithm that separates the words in that single audio file and saves each one as its own audio file. This is what I have so far:
Help would be VERY much appreciated. Thanks in advance.
r1 = audiorecorder(22050, 24, 1);        % 22.05 kHz, 24-bit, mono
recDuration = 30;                         % seconds (assumed; the original 3 s cannot fit 20 words with pauses)
disp('Press Enter, then say 20 words that can be used to make multiple sentences');
input('', 's');                           % wait for Enter
recordblocking(r1, recDuration);          % speak into the microphone and say the words
disp('Press Enter to listen to the recording');
input('', 's');
p = play(r1);                             % listen to the words
disp('Press Enter to save the recording');
input('', 's');
mySpeech1 = getaudiodata(r1, 'double');   % samples as a double column vector in [-1, 1]
% Save the audio file (change the location for testing)
filename = 'C:\Users\leone\OneDrive\Desktop\Year 2\Semester 2\EERI 222\Practical1\Recording.wav';
audiowrite(filename, mySpeech1, 22050);
yn = audioread('Noise.wav');              % recording of background silence
yn = mean(yn);                            % average noise level (DC offset)
[y, fs] = audioread(filename);            % read back the file that was just saved
y = y - yn;                               % remove the average noise
t = (0:size(y,1)-1)' / fs;                % time of each sample in seconds (column vector)
disp('Press Enter to plot the recording');
input('', 's');
plot(t, y);                               % plot time vs amplitude of the sound
xlabel('Time (s)'); ylabel('Amplitude');
grid on;
%g = y(abs(y) > 0.001);  % attempt to keep only samples louder than 0.001 (the spoken words)
%plot(g, '*');           % trying to plot only the words said
2 comments
Leon Ellis, 4 September 2021
Thank you very much!



Izak Adendorff, 5 September 2021
Oh well, I will give a few pointers, since I'm sure a lot of the other students will have the same question. First, you will want to set up your recording device correctly:
Fs = 8000;                                % sampling frequency in Hz
recObj = audiorecorder(Fs, 16, 1, -1);    % 16-bit, mono, default input device (ID -1)
It will look something like this. If you take an FFT of a recording, you will see that most of the energy in a human voice lies below roughly 500 Hz (give or take), as in the FFT below:
[Figure: FFT of a recorded voice]
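To check the claim about the voice spectrum yourself, here is a minimal sketch of a one-sided FFT power spectrum. It uses a synthetic signal (a 150 Hz tone plus a weaker 1200 Hz tone) as a stand-in for a real recording, so the frequencies and amplitudes are assumptions; with real data, replace x with the output of getaudiodata. The names peakHz and fracBelow500 are hypothetical.

```matlab
% Synthetic stand-in for a recording (assumed values): a strong 150 Hz
% component and a weaker 1200 Hz component, sampled at 8 kHz for 1 s.
Fs = 8000;
t  = (0:Fs-1)' / Fs;                       % sample times, column vector
x  = sin(2*pi*150*t) + 0.2*sin(2*pi*1200*t);

N = numel(x);
X = fft(x);
P = abs(X(1:floor(N/2)+1)).^2;             % one-sided power spectrum (DC up to Nyquist)
f = (0:floor(N/2))' * Fs / N;              % frequency of each bin in Hz

[~, iPeak]   = max(P);
peakHz       = f(iPeak);                   % strongest frequency component
fracBelow500 = sum(P(f < 500)) / sum(P);   % share of the energy below 500 Hz

% To view the spectrum: plot(f, P); xlabel('Frequency (Hz)'); ylabel('Power');
```

For this test signal about 96% of the energy falls below 500 Hz, because the 1200 Hz tone has one fifth of the amplitude and therefore 1/25 of the power.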
So choosing a sampling frequency of 8000 Hz (a standard value) should suffice, and it saves a bit of computation time. It still captures components up to 4000 Hz (the Nyquist frequency, half the sampling rate), well above that range. The bit depth doesn't matter that much, but 16 to 24 bits should give good results. Next, you want to actually record a voice and get the data. The following code should suffice:
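recordblocking(recObj, 5);   % record 5 seconds of speech (duration assumed)
x = getaudiodata(recObj);    % samples as a double column vector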
This should give a waveform like the one below.
[Figure: waveform of the recorded words]
You can filter the data, but that is outside the scope of your practical. Now you need a method to separate the words. A very basic method is to look for samples whose values fall within a set of bounds, e.g. [0, a], or you can use other techniques. You can enter 'a' manually or calculate it by calibrating your mic while it records silence.
I would use the Image Processing Toolbox to improve the speed and accuracy of processing this signal; remember that signal processing techniques can be applied to both images and sound recordings. Have a look at https://www.mathworks.com/help/images/ref/imdilate.html. Applying imdilate() to the absolute values of your signal generates a new array (an envelope) in which the words are easy to recognize. Plotting this array gives the figure below.
[Figure: dilated envelope of the recording]
Now you can choose a threshold by looking at the peaks of this signal. In this case the threshold should be just below 0.1; in my case, 0.09.
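The envelope-and-threshold idea can be sketched in base MATLAB. Dilating a 1-D signal with a flat, line-shaped structuring element is the same as taking a moving maximum, so this sketch uses movmax instead of imdilate (swap imdilate back in if you have the Image Processing Toolbox). The synthetic signal, the window length winLen, the silent stretch used for calibration, and the 2x margin are assumptions chosen for illustration, not values from the answer above.

```matlab
% Synthetic recording (assumed): 2 s of low-level noise at 8 kHz with two
% 0.3 s tones standing in for spoken words.
Fs = 8000;
rng(0);                                    % reproducible noise
x = 0.01 * randn(2*Fs, 1);                 % background noise
n = (1:2400)';                             % 0.3 s worth of samples
x(3201:5600)  = x(3201:5600)  + 0.5*sin(2*pi*200*n/Fs);   % "word" 1
x(9601:12000) = x(9601:12000) + 0.5*sin(2*pi*300*n/Fs);   % "word" 2

% Envelope: moving maximum of |x|, equivalent to flat 1-D dilation.
winLen   = 401;                            % hypothetical window (~50 ms), odd so it is centred
envelope = movmax(abs(x), winLen);

% Calibrate the threshold from a stretch known to be silent (here the
% first 2000 samples; with a real mic, record silence separately).
silence   = envelope(1:2000);
threshold = 2 * max(silence);              % hypothetical safety margin
isWord    = envelope > threshold;          % true while a word is being spoken
```

The window length trades off smoothing against timing precision: a longer window bridges short dips inside a word but widens each detected segment by about half the window on each side.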
Now you can separate the words from the quiet parts of this signal with a comparison operator. The envelope variable holds the data plotted above.
quietParts = envelope > 0.09;   % true where the envelope exceeds the threshold, i.e. during words
beginning = strfind(...)
beginning = 1×4
1462 7003 13347 20074
ending = strfind(...)
ending = 1×4
2962 10493 16830 24065
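The strfind arguments are elided above, so here is one way to locate the edges, sketched with diff and find instead of strfind (a swapped-in technique, not the answer's exact code). The short mask is hypothetical; with real data you would use the quietParts array computed from the envelope.

```matlab
% Hypothetical logical mask standing in for quietParts (true = speech).
quietParts = logical([0 0 1 1 1 0 0 0 1 1 0 0 1]');

% Pad with zeros at both ends so a word touching the first or last sample
% still produces a rising (+1) and a falling (-1) edge.
d = diff([0; double(quietParts(:)); 0]);
beginning = find(d == 1);        % index of the first sample of each word
ending    = find(d == -1) - 1;   % index of the last sample of each word
```

For this mask the words span samples 3 to 5, 9 to 10, and 13 to 13.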
Using the quietParts logical array, you can then find where each word begins and ends with strfind(), by looking for where 0 changes to 1 and where 1 changes to 0. After calculating these edges, it would be wise to widen each segment by a fixed number of samples to make sure you capture the whole word. Now that you have the beginning and ending positions of each word, you can create a new variable for each range and use it to index into your original sound array. You can play each of these words with the sound() function. Below is a high-level sketch of the code I would have used.
a = numel(..)
b = zeros(..)
e = zeros(..)
Word() = zeros(..);
b(..) = beginning() - samples;
e(..) = ending() + samples;
Word(..) = x(b(..):e(..));
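A runnable version of the high-level sketch above might look like the following, which also writes each word to its own WAV file as the question requires. It stores the words in a cell array because they have different lengths, and clamps the padded ranges to the bounds of x. The signal, the boundary values, the padding of 400 samples, the output folder (tempdir), and the file names word_01.wav and so on are all assumptions for illustration.

```matlab
% Illustrative inputs (hypothetical): 1 s of signal at 8 kHz and two word
% boundaries such as the edge-detection step would produce.
Fs = 8000;
x  = 0.01 * randn(Fs, 1);
beginning = [1500; 5000];
ending    = [3000; 7990];
samples   = 400;                               % padding: 50 ms on each side (assumed value)

numWords = numel(beginning);
words    = cell(numWords, 1);                  % words differ in length, so use a cell array
outDir   = tempdir;                            % assumed output folder; change as needed
for k = 1:numWords
    b = max(beginning(k) - samples, 1);        % clamp to the start of x
    e = min(ending(k) + samples, numel(x));    % clamp to the end of x
    words{k} = x(b:e);
    audiowrite(fullfile(outDir, sprintf('word_%02d.wav', k)), words{k}, Fs);
    % sound(words{k}, Fs);                     % uncomment to listen to each word
end
```

With real data, pass in the recording x and the beginning and ending vectors from the edge-detection step; the clamping matters when a word starts or ends within `samples` of either end of the recording.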
I hope this helps!
Best of luck!
