Speech recognition separating words.

Question

Leon Ellis on 4 Sep 2021

Direct link to this question: https://jp.mathworks.com/matlabcentral/answers/1446479-speech-recognition-separating-words

Answered: Izak Adendorff on 5 Sep 2021

Good day. My task is to create a program that lets the user record his or her voice while saying a few words with pauses in between. I then have to create an algorithm that separates the words in that single recording and saves each one to its own audio file. Help would be very much appreciated. Thanks in advance.

This is what I have so far:

clear

r1 = audiorecorder(22050, 24, 1);   % 22050 Hz, 24-bit, mono
disp('Press Enter, then say 20 words that can be used to make multiple sentences');
pause;
recordblocking(r1,3);   % speak into microphone & say the words (records for 3 seconds)
disp('Press Enter to listen to the recording');
pause;
p = play(r1);   % listen to words
disp('Press Enter to save recording');
pause;
mySpeech1 = getaudiodata(r1, 'double');   % get data as a double array
disp('Press Enter to save file');   % save the audio file (change location for testing)
pause;
filename = 'C:\Users\leone\OneDrive\Desktop\Year 2\Semester 2\EERI 222\Practical1\Recording.wav';
audiowrite(filename, mySpeech1, 22050);

[yn, fs] = audioread('Noise.wav');
yn = mean(yn);   % get the average noise
[y, fs] = audioread('Recording.wav');
t = (0:size(y,1)-1)/fs;   % sample times in seconds
tTrans = transpose(t);   % time as x-values
disp('Press Enter to plot the recording');
pause;
y = y - yn;   % remove the average noise
plot(tTrans, y);   % plot amplitude of sound vs time
grid on;   % must follow plot, which resets the axes
hold off;
pause;
%g = y(abs(y)>0.001); Attempt to keep only the samples where |y| is greater than 0.001 (only the parts where words are said)
%hold off;
%plotv(g,'*'); Trying to plot only the words said.

2 Comments

Izak Adendorff on 4 Sep 2021

I would suggest that you do your OWN EERI222 work and not copy the first answer that someone posts here. Best of luck!

Leon Ellis on 4 Sep 2021

Thank you very much!


Answer 1

Izak Adendorff on 5 Sep 2021

Direct link to this answer: https://jp.mathworks.com/matlabcentral/answers/1446479-speech-recognition-separating-words#answer_781059

Oh well, I will give a few pointers, since I'm sure a lot of the other students will have the same question. Firstly, you will want to set up your recording device correctly.

Fs = 8000;                            % sampling frequency in Hz
recObj = audiorecorder(Fs,16,1,-1);   % 16-bit, mono, default input device (ID -1)

The setup will look something like the lines above. Remember that if you take an FFT of a voice recording, you will see that most of the energy in a human voice lies below roughly 500 Hz (give or take).

[Figure: FFT of the recorded voice]

So choosing a sampling frequency of 8000 Hz (which is a standard value) should suffice: it captures content up to the 4000 Hz Nyquist limit and saves a bit of computation time. The bit depth doesn't matter that much, but a value of 16 to 24 bits should yield good results. Next you want to actually record a voice and get the data. The following code should suffice for that:
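To check where the energy of a recording sits before settling on a sampling frequency, a single-sided spectrum can be computed with fft(). The following is only a minimal sketch: the two-tone test signal and the names peakHz and lowShare are assumptions made for illustration, and x would be replaced by your own recording.

```matlab
% Sketch: single-sided amplitude spectrum of a signal sampled at Fs.
% The test signal is synthetic (an assumption); replace x with your recording.
Fs = 8000;                          % sampling frequency in Hz
t  = (0:Fs-1).' / Fs;               % one second of sample times (column vector)
x  = 0.8*sin(2*pi*200*t) + 0.2*sin(2*pi*1500*t);   % stand-in for a voice

N  = numel(x);                      % number of samples (even here)
X  = fft(x);
P  = abs(X(1:N/2+1)) / N;           % keep bins 0 ... Fs/2
P(2:end-1) = 2*P(2:end-1);          % fold in the negative-frequency half
f  = (0:N/2).' * Fs / N;            % frequency axis in Hz

[~, iPeak] = max(P);
peakHz   = f(iPeak);                            % strongest component
lowShare = sum(P(f <= 500).^2) / sum(P.^2);     % share of energy at or below 500 Hz
```

Calling plot(f, P) afterwards shows the spectrum. For a real voice, lowShare gives a rough idea of how much of the signal a given cutoff would retain.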

recordblocking(recObj,3);
x = getaudiodata(recObj);
plot(x)

This plots the recorded waveform. You can use filtering techniques to clean up the data, but that is outside the scope of your practical.

Now you need a method to separate the words. You can search for samples whose values fall within a set of bounds, e.g. [0 a], or use other techniques. You can enter 'a' manually or calculate it by calibrating your mic while it records silence. That is a very basic method.

I would use the Image Processing Toolbox to improve the speed and accuracy of this step. Remember that signal processing techniques can be used on both images and sound recordings. Have a look at https://www.mathworks.com/help/images/ref/imdilate.html. Applying the imdilate() function to the absolute values of your signal generates a new array, an envelope of the signal, in which the words are easy to recognize.

[Figure: plot of the dilated signal]

Now you can choose a threshold variable by looking at the peaks of this envelope. For this case the minimum for that variable should be just below 0.1, or in my case 0.09.
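The calibration and envelope steps can be sketched as follows. This is a hedged illustration, not the code used for the original figures: the signal is synthetic, the margin of 5 and the window length win are arbitrary assumptions, and movmax() (base MATLAB) is swapped in for imdilate() so that no toolbox is needed. With the Image Processing Toolbox, imdilate(abs(x), ones(win,1)) should give the same running maximum.

```matlab
% Sketch: threshold from a silent stretch, envelope from a running maximum.
Fs = 8000;
n  = (1:2*Fs).';                                   % two seconds of sample indices
x  = 0.005*sin(2*pi*50*n/Fs);                      % low-level hum standing in for mic noise
x(3001:6000)   = x(3001:6000)   + 0.5*sin(2*pi*220*(3001:6000).'/Fs);    % "word" 1
x(10001:13000) = x(10001:13000) + 0.4*sin(2*pi*330*(10001:13000).'/Fs);  % "word" 2

silence  = x(1:2000);                  % stretch assumed to contain no speech
a        = 5 * max(abs(silence));      % threshold; the factor 5 is an arbitrary margin

win      = 401;                        % odd window, about 50 ms at 8 kHz (assumption)
envelope = movmax(abs(x), win);        % running maximum of |x| over the window

isSpeech = envelope > a;               % true where the envelope exceeds the threshold
```

A longer window bridges short dips inside a word but also blurs the gap between neighbouring words, so win is worth tuning against the length of your pauses.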

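Now you can separate the words from the quiet parts of this signal using a logical operator. The envelope variable contains the dilated signal described above. Note that, despite its name, quietParts is true where the envelope exceeds the threshold, i.e. where a word is being spoken.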

quietParts = envelope > 0.09
beginning = strfind(...)

beginning = 1×4
    1462    7003   13347   20074

ending = strfind(...)

ending = 1×4
    2962   10493   16830   24065
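The 0-to-1 and 1-to-0 transitions can also be located with diff() and find(), which is a swapped-in alternative to the strfind() calls above, not the answer's own code. In this sketch the toy mask and the padding value pad are hypothetical; the clamping with max() and min() keeps the widened ranges inside the array.

```matlab
% Sketch: find where runs of 1s start and stop in a row vector of 0s and 1s.
mask = logical([0 0 1 1 1 0 0 0 1 1 0 0]);   % toy above-threshold mask (hypothetical)

d = diff([0 mask 0]);            % zero-pad so runs touching either end are caught
beginning = find(d == 1);        % index of the first 1 in each run
ending    = find(d == -1) - 1;   % index of the last 1 in each run

pad = 1;                                 % extra samples kept on each side (assumption)
b = max(beginning - pad, 1);             % widened start, clamped to the first sample
e = min(ending + pad, numel(mask));      % widened end, clamped to the last sample
```

For a column vector such as the envelope of a recording, pass mask(:).' so that the concatenation with the padding zeros works.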

Using the quietParts logical array, you can determine the beginning and ending of each word with strfind(), looking for where a 0 changes to a 1 and where a 1 changes to a 0. After calculating these edges, it would be wise to widen each range by a set number of samples, to make sure that you capture the whole word. Now that you have the beginning and ending positions of each word, you can simply use these ranges to index back into your original sound array and store each slice in a new variable. Using the sound() function you can play each of these words. Below is a high-level implementation of the code I would have used.

a = numel(..)
b = zeros(..)
e = zeros(..)
for
    
    Word() = zeros(..);   
    b(..) = beginning() - samples;
    e(..) = ending() + samples;
    Word(..) = x(b(..):e(..));
        
end
sound(Word{1,1})
plot(Word{1,1})
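One possible way to flesh out the outline above is shown below. It is a sketch under stated assumptions rather than the code behind the answer: the toy signal, the beginning and ending values, the padding of 20 samples, and the word_NN.wav file names in tempdir are all hypothetical. It stores each word in a cell array and writes it to its own file with audiowrite(), which is what the question asks for.

```matlab
% Sketch: cut each word out of x and save it to its own WAV file.
Fs = 8000;
x  = [zeros(100,1); 0.5*ones(50,1); zeros(100,1); 0.3*ones(80,1); zeros(100,1)];  % toy signal
beginning = [101 251];      % first sample of each word (hypothetical values)
ending    = [150 330];      % last sample of each word (hypothetical values)
samples   = 20;             % padding added on each side of a word

a    = numel(beginning);    % number of words found
Word = cell(a, 1);          % one cell per word, since lengths differ
for k = 1:a
    b = max(beginning(k) - samples, 1);          % padded start, clamped to the array
    e = min(ending(k) + samples, numel(x));      % padded end, clamped to the array
    Word{k} = x(b:e);
    audiowrite(fullfile(tempdir, sprintf('word_%02d.wav', k)), Word{k}, Fs);
end
```

sound(Word{1}, Fs) then plays the first word at the correct rate. A cell array is used because the words generally have different lengths and so cannot share one matrix without padding.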

I hope this helps!

Best of luck!
