Silence removal for speaker recognition using the NTIMIT database

Shaikha Hajri, 17 February 2011
Answered: Amish, 3 September 2024
Hi all, I need robust silence-removal code for a speaker recognition system. I am using MFCC for feature extraction and GMM for modeling.

Answers (1)

Amish, 3 September 2024
Hi Shaikha,
To implement silence removal for a speaker recognition system in MATLAB and Simulink, you can follow these steps. The goal is to preprocess the audio by removing silent segments, so that the MFCC features used to train and test the GMM come from speech rather than from silence or background noise.
  1. Load the audio signal using the 'audioread' function.
  2. Pre-process the signal: normalize it and apply a pre-emphasis filter to boost high frequencies.
  3. Frame the signal by dividing it into short overlapping frames (for example, 25 ms frames every 10 ms).
  4. Compute the short-time energy (STE) and zero-crossing rate (ZCR) of each frame.
  5. Set thresholds on STE and ZCR to classify each frame as silence or speech.
  6. Join the non-silent frames into a silence-free signal, keeping each sample only once where frames overlap.
  7. Extract MFCCs from the processed signal.
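Steps 3 to 5 can also be checked outside MATLAB. The following is a minimal Python sketch (standard library only) of the same framing, STE, ZCR, and thresholding logic, assuming a mono signal stored as a list of floats. The function names `frame_features` and `classify_silence` are hypothetical, the 200 Hz tone is only a stand-in for speech, and the 0.1 ratio is the same fixed threshold used in the MATLAB example, not a tuned value.

```python
import math


def frame_features(x, frame_length, frame_step):
    """Return per-frame short-time energy (STE) and zero-crossing count (ZCR)."""
    num_frames = (len(x) - frame_length) // frame_step + 1
    ste, zcr = [], []
    for i in range(num_frames):
        frame = x[i * frame_step : i * frame_step + frame_length]
        # STE: sum of squared samples in the frame
        ste.append(sum(s * s for s in frame))
        # ZCR: count changes of the "sample > 0" flag between neighbours,
        # like sum(abs(diff(frame > 0))) in MATLAB
        positive = [s > 0 for s in frame]
        zcr.append(sum(a != b for a, b in zip(positive, positive[1:])))
    return ste, zcr


def classify_silence(ste, zcr, ratio=0.1):
    """Mark a frame as silence when both STE and ZCR are below ratio * max."""
    energy_threshold = ratio * max(ste)
    zcr_threshold = ratio * max(zcr)
    return [e < energy_threshold and z < zcr_threshold for e, z in zip(ste, zcr)]


# Synthetic signal at 8 kHz: 0.1 s of zeros, 0.2 s of a 200 Hz tone,
# then 0.1 s of zeros (a stand-in for silence / speech / silence)
fs = 8000
x = ([0.0] * 800
     + [math.sin(2 * math.pi * 200 * n / fs) for n in range(1600)]
     + [0.0] * 800)
frame_length = round(0.025 * fs)  # 200 samples (25 ms)
frame_step = round(0.010 * fs)    # 80 samples (10 ms)

ste, zcr = frame_features(x, frame_length, frame_step)
silence = classify_silence(ste, zcr)
print(len(silence), sum(silence))
```

Here only the frames lying entirely in the zero-valued stretches are flagged as silence; any frame that contains part of the tone has enough energy to be kept.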
The following code is a generic example of these steps:
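[audio, fs] = audioread('your_audio_file.wav');
% Convert to mono (average the channels of a stereo file)
audio = mean(audio, 2);
% Normalize to unit peak amplitude
audio = audio / max(abs(audio));
% Pre-emphasis filter: y(n) = x(n) - 0.97*x(n-1) boosts high frequencies
pre_emphasis = 0.97;
audio = filter([1 -pre_emphasis], 1, audio);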
% Framing
frame_size = 0.025; % 25ms
frame_stride = 0.01; % 10ms
frame_length = round(frame_size * fs);
frame_step = round(frame_stride * fs);
% Number of frames
num_frames = floor((length(audio) - frame_length) / frame_step) + 1;
% Short-Time Energy and Zero-Crossing Rate
STE = zeros(num_frames, 1);
ZCR = zeros(num_frames, 1);
for i = 1:num_frames
    start_idx = (i-1) * frame_step + 1;
    end_idx = min(length(audio), start_idx + frame_length - 1);
    frame = audio(start_idx:end_idx);
    % Compute STE: sum of squared samples in the frame
    STE(i) = sum(frame .^ 2);
    % Compute ZCR: number of sign changes between consecutive samples
    ZCR(i) = sum(abs(diff(frame > 0)));
end
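% Thresholding: fixed fractions of the maximum; tune these for your recordings
energy_threshold = 0.1 * max(STE);
zcr_threshold = 0.1 * max(ZCR);
% Silence removal: a frame is silence only if both its energy and its ZCR are low
silence_frames = (STE < energy_threshold) & (ZCR < zcr_threshold);
speech_frames = find(~silence_frames);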
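% Reconstruct the signal. Frames overlap (25 ms frames every 10 ms), so
% concatenating whole frames would repeat samples. Instead, mark every sample
% covered by a speech frame and keep each marked sample once.
keep = false(size(audio));
for i = 1:length(speech_frames)
    start_idx = (speech_frames(i)-1) * frame_step + 1;
    end_idx = min(length(audio), start_idx + frame_length - 1);
    keep(start_idx:end_idx) = true;
end
speech_signal = audio(keep);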
% Feature extraction (MFCC); the mfcc function requires Audio Toolbox
coeffs = mfcc(speech_signal, fs);
% Plotting
figure;
subplot(3,1,1); plot(audio); title('Original Signal');
subplot(3,1,2); plot(STE); title('Short-Time Energy');
subplot(3,1,3); plot(speech_signal); title('Silence Removed Signal');
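Because consecutive frames overlap (25 ms frames taken every 10 ms), appending each speech frame in full repeats most samples, so the result is roughly 2.5 times longer than the speech it contains. The minimal Python sketch below (standard library only) contrasts the two ways of joining frames. The helper names `concat_frames` and `mask_frames` are hypothetical, 16 kHz is an assumed sample rate, and a ramp `0, 1, 2, ...` stands in for real samples so that repeats are easy to spot.

```python
def concat_frames(x, frames, frame_length, frame_step):
    """Append every selected frame in full; overlapping samples are repeated."""
    out = []
    for i in frames:
        start = i * frame_step
        out.extend(x[start:start + frame_length])
    return out


def mask_frames(x, frames, frame_length, frame_step):
    """Keep each sample covered by at least one selected frame exactly once."""
    keep = [False] * len(x)
    for i in frames:
        start = i * frame_step
        for n in range(start, min(len(x), start + frame_length)):
            keep[n] = True
    return [s for s, k in zip(x, keep) if k]


fs = 16000
frame_length = round(0.025 * fs)  # 400 samples
frame_step = round(0.010 * fs)    # 160 samples
x = list(range(fs))               # 1 s ramp: each sample equals its index
num_frames = (len(x) - frame_length) // frame_step + 1
all_frames = range(num_frames)    # treat every frame as speech

concatenated = concat_frames(x, all_frames, frame_length, frame_step)
masked = mask_frames(x, all_frames, frame_length, frame_step)
print(len(x), len(concatenated), len(masked))
```

The same idea carries over to MATLAB: build a logical mask the length of `audio`, set it to `true` over each speech frame, and index `audio` with it.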
This logic can then be integrated into a Simulink model by placing it in a MATLAB Function block. A 'To Audio Device' block can then play the processed signal, or you can pass the signal on for feature extraction.
Documentation for the 'filter' function is available at: https://www.mathworks.com/help/signal/ug/the-filter-function.html
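In the example, `filter([1 -pre_emphasis], 1, audio)` implements the first-order pre-emphasis difference equation y(n) = x(n) - 0.97·x(n-1) with zero initial conditions. The minimal Python sketch below (standard library only; `pre_emphasis` is a hypothetical helper name) applies the same recurrence to show its effect: a constant (0 Hz) input is almost cancelled, while an input that flips sign every sample (the highest representable frequency) is nearly doubled.

```python
def pre_emphasis(x, alpha=0.97):
    """Return y[n] = x[n] - alpha * x[n-1], assuming the sample before x[0] is 0."""
    y = []
    previous = 0.0
    for sample in x:
        y.append(sample - alpha * previous)
        previous = sample
    return y


constant = pre_emphasis([1.0, 1.0, 1.0, 1.0])       # DC input
alternating = pre_emphasis([1.0, -1.0, 1.0, -1.0])  # sign flips every sample
print([round(v, 2) for v in constant])
print([round(v, 2) for v in alternating])
```

Only the first output sample passes through unchanged, because the filter starts from a zero state.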
Hope this helps!
