- Load the audio signal using the 'audioread' command
- Pre-process the signal by normalizing it and applying a pre-emphasis filter to enhance high frequencies
- Divide the signal into overlapping frames
- Compute the short-time energy (STE) and zero-crossing rate (ZCR) of each frame to detect silence
- Set thresholds for STE and ZCR to classify frames as silence or speech
- Concatenate the non-silent frames to form a silence-free signal
- Extract MFCCs from the processed signal
Silence removal for Speaker Recognition using NTIMIT database
Hi all, I need robust silence-removal code for a speaker recognition system. I am using MFCC for feature extraction and GMM for modeling.
Answers (1)
Amish
September 3, 2024
Hi Shaikha,
To implement a robust silence removal technique for a speaker recognition system using MATLAB and Simulink, you can follow these steps. The goal is to preprocess the audio by removing silence segments to improve the performance of your speaker recognition system.
The following code is a generic example of this approach:
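% Load the audio signal
[audio, fs] = audioread('your_audio_file.wav');
audio = mean(audio, 2); % mix down to mono if the file has several channels
% Normalize to a peak amplitude of 1 (eps guards against an all-zero file)
audio = audio / (max(abs(audio)) + eps);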
% Pre-emphasis filter
pre_emphasis = 0.97;
audio = filter([1 -pre_emphasis], 1, audio);
% Framing
frame_size = 0.025; % 25ms
frame_stride = 0.01; % 10ms
frame_length = round(frame_size * fs);
frame_step = round(frame_stride * fs);
% Number of frames
num_frames = floor((length(audio) - frame_length) / frame_step) + 1;
% Short-Time Energy and Zero-Crossing Rate
STE = zeros(num_frames, 1);
ZCR = zeros(num_frames, 1);
for i = 1:num_frames
    start_idx = (i-1) * frame_step + 1;
    end_idx = min(length(audio), start_idx + frame_length - 1);
    frame = audio(start_idx:end_idx);
    % Compute STE
    STE(i) = sum(frame .^ 2);
    % Compute ZCR
    ZCR(i) = sum(abs(diff(frame > 0)));
end
% Thresholding
energy_threshold = 0.1 * max(STE);
zcr_threshold = 0.1 * max(ZCR);
% Silence Removal
silence_frames = (STE < energy_threshold) & (ZCR < zcr_threshold);
speech_frames = find(~silence_frames);
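% Reconstruct the signal
% Frames overlap (25 ms length, 10 ms step), so concatenating them directly
% would duplicate samples. Mark the samples to keep, then index once.
keep = false(size(audio));
for i = 1:length(speech_frames)
    start_idx = (speech_frames(i)-1) * frame_step + 1;
    end_idx = min(length(audio), start_idx + frame_length - 1);
    keep(start_idx:end_idx) = true;
end
speech_signal = audio(keep);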
% Feature Extraction (MFCC)
coeffs = mfcc(speech_signal, fs);
% Plotting
figure;
subplot(3,1,1); plot(audio); title('Original Signal');
subplot(3,1,2); plot(STE); title('Short-Time Energy');
subplot(3,1,3); plot(speech_signal); title('Silence Removed Signal');
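One caveat, offered as an assumption and not as part of the answer above: in noisy recordings such as telephone-quality speech, silent frames can have a high ZCR, so a rule that requires both low STE and low ZCR may keep them. The sketch below shows one possible variation that uses energy only and sets the threshold from an estimated noise floor. It runs on a synthetic signal (noise, tone, noise), so no audio file is needed. The 100 ms noise-only lead-in, the factor of 10, and all variable names other than those in the code above are illustrative assumptions, not tuned values.

```matlab
% Synthetic stand-in for a recording: noise, then a tone, then noise.
% All values here are illustrative assumptions, not tuned settings.
fs = 16000;
rng(0);                                       % reproducible noise
n_seg = round(0.5 * fs);
noise_a = 0.001 * randn(n_seg, 1);
noise_b = 0.001 * randn(n_seg, 1);
tone = 0.5 * sin(2 * pi * 440 * (0:n_seg - 1)' / fs);
audio = [noise_a; tone; noise_b];

frame_length = round(0.025 * fs);
frame_step = round(0.010 * fs);
num_frames = floor((length(audio) - frame_length) / frame_step) + 1;

% Short-time energy per frame
STE = zeros(num_frames, 1);
for i = 1:num_frames
    start_idx = (i-1) * frame_step + 1;
    frame = audio(start_idx:start_idx + frame_length - 1);
    STE(i) = sum(frame .^ 2);
end

% Assumption: the first 10 frames (roughly 100 ms) contain no speech
num_noise_frames = 10;
noise_floor = mean(STE(1:num_noise_frames));
energy_threshold = 10 * noise_floor;          % hypothetical 10 dB margin

speech_frames = find(STE >= energy_threshold);

% Per-sample mask so overlapping frames are not duplicated
keep = false(size(audio));
for i = 1:length(speech_frames)
    start_idx = (speech_frames(i)-1) * frame_step + 1;
    keep(start_idx:start_idx + frame_length - 1) = true;
end
speech_signal = audio(keep);
```

On real recordings the margin and the length of the noise-only lead-in would need to be checked against the data, for example by plotting STE with the threshold overlaid.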
This can then be integrated into a Simulink model by creating a MATLAB Function block that uses the logic above. A 'To Audio Device' block can then be used to play the processed signal, or the signal can be passed on for feature extraction.
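As a rough sketch of what the body of such a MATLAB Function block might look like, the function below classifies a single frame. The function name classifyFrame and its inputs are hypothetical. It assumes the frames are produced upstream (for example by a Buffer block with overlap) and that the thresholds are supplied as fixed values, because max(STE) over the whole recording is not available when frames arrive one at a time.

```matlab
function is_speech = classifyFrame(frame, energy_threshold, zcr_threshold)
%#codegen
% Hypothetical helper: returns true if the frame is classified as speech.
% Uses the same rule as the script: a frame is silence only if both its
% short-time energy and its zero-crossing count are below the thresholds.
STE = sum(frame .^ 2);
ZCR = sum(abs(diff(double(frame > 0))));
is_silence = (STE < energy_threshold) && (ZCR < zcr_threshold);
is_speech = ~is_silence;
end
```

The thresholds could be estimated offline with the script above and then entered as block parameters or constants.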
Documentation for filtering can be found at: https://www.mathworks.com/help/signal/ug/the-filter-function.html
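To make the pre-emphasis step concrete, the filter call in the code corresponds to the difference equation y[n] = x[n] - 0.97*x[n-1]. The short sketch below, using a made-up four-sample input, compares the filter call with the equation written out by hand.

```matlab
% Pre-emphasis written out as a difference equation: y[n] = x[n] - a*x[n-1]
a = 0.97;
x = [1; 2; 3; 4];                     % small illustrative input
y_filter = filter([1 -a], 1, x);      % form used in the code above
y_manual = x - a * [0; x(1:end-1)];   % same operation, written explicitly
max_difference = max(abs(y_filter - y_manual));
```

Here max_difference should be zero up to floating-point rounding.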
Hope this helps!