Silence removal for Speaker Recognition using NTIMIT database

Question

Shaikha Hajri, 17 February 2011


https://jp.mathworks.com/matlabcentral/answers/1520-sielence-removal-for-speaker-recognition-using-ntimit-database


Hi all, I need robust silence-removal code for a speaker recognition system. I am using MFCC for feature extraction and GMM for modeling.


Answer 1

Amish, 3 September 2024


https://jp.mathworks.com/matlabcentral/answers/1520-sielence-removal-for-speaker-recognition-using-ntimit-database#answer_1509704


Hi Shaikha,

To implement a robust silence removal technique for a speaker recognition system using MATLAB and Simulink, you can follow these steps. The goal is to preprocess the audio by removing silence segments to improve the performance of your speaker recognition system.

1. Load the audio signal with the 'audioread' function.
2. Pre-process the signal by normalizing it and applying a pre-emphasis filter to enhance high frequencies.
3. Divide the signal into overlapping frames.
4. Compute the short-time energy (STE) and zero-crossing rate (ZCR) of each frame to detect silence.
5. Set thresholds on STE and ZCR to classify each frame as silence or speech.
6. Concatenate the non-silent frames to form a silence-free signal.
7. Extract MFCC features from the processed signal.

The following code is a generic example of these steps:

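[audio, fs] = audioread('your_audio_file.wav');
% Convert to mono and normalize the peak amplitude to 1
audio = mean(audio, 2);
audio = audio / (max(abs(audio)) + eps);
% Pre-emphasis filter
pre_emphasis = 0.97;
audio = filter([1 -pre_emphasis], 1, audio);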
% Framing
frame_size = 0.025; % 25ms
frame_stride = 0.01; % 10ms
frame_length = round(frame_size * fs);
frame_step = round(frame_stride * fs);
% Number of frames
num_frames = floor((length(audio) - frame_length) / frame_step) + 1;
% Short-Time Energy and Zero-Crossing Rate
STE = zeros(num_frames, 1);
ZCR = zeros(num_frames, 1);
for i = 1:num_frames
    start_idx = (i-1) * frame_step + 1;
    end_idx = min(length(audio), start_idx + frame_length - 1);
    frame = audio(start_idx:end_idx);
    
    % Compute STE
    STE(i) = sum(frame .^ 2);
    
    % Compute ZCR
    ZCR(i) = sum(abs(diff(frame > 0)));
end
% Thresholding (10% of the maximum is only a starting point; tune on your data)
energy_threshold = 0.1 * max(STE);
zcr_threshold = 0.1 * max(ZCR);
% Silence Removal
silence_frames = (STE < energy_threshold) & (ZCR < zcr_threshold);
speech_frames = find(~silence_frames);
% Reconstruct the signal from a sample-level mask, so that samples shared
% by overlapping speech frames are kept only once
keep = false(size(audio));
for i = 1:length(speech_frames)
    start_idx = (speech_frames(i)-1) * frame_step + 1;
    end_idx = min(length(audio), start_idx + frame_length - 1);
    keep(start_idx:end_idx) = true;
end
speech_signal = audio(keep);
% Feature Extraction (MFCC); the mfcc function requires Audio Toolbox
coeffs = mfcc(speech_signal, fs);
% Plotting
figure;
subplot(3,1,1); plot(audio); title('Original Signal');
subplot(3,1,2); plot(STE); title('Short-Time Energy');
subplot(3,1,3); plot(speech_signal); title('Silence Removed Signal');
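
Before running this on NTIMIT recordings, it may help to check the framing and thresholding on a signal whose silent regions are known. The following is a minimal sketch under stated assumptions: the test signal (0.5 s of near-silence, 1 s of a 440 Hz tone, 0.5 s of near-silence), the noise level, and the energy-only decision rule are illustrative choices and not part of the code above. It uses only base MATLAB functions.

```matlab
% Synthetic test signal (assumed values): silence, tone, silence
fs = 16000;
t = (0:fs-1)' / fs;
tone = 0.5 * sin(2*pi*440*t);
rng(0);                                   % reproducible background noise
audio = [zeros(fs/2,1); tone; zeros(fs/2,1)] + 0.001 * randn(2*fs,1);

% Same framing parameters as above: 25 ms frames, 10 ms step
frame_length = round(0.025 * fs);
frame_step   = round(0.010 * fs);
num_frames   = floor((length(audio) - frame_length) / frame_step) + 1;

% Short-time energy of each frame
STE = zeros(num_frames, 1);
for i = 1:num_frames
    idx = (i-1) * frame_step + (1:frame_length);
    STE(i) = sum(audio(idx) .^ 2);
end

% Energy-only decision (a simplification of the STE + ZCR rule)
is_speech = STE >= 0.1 * max(STE);

% Mark every sample that belongs to at least one speech frame
keep = false(size(audio));
for i = find(is_speech)'
    idx = (i-1) * frame_step + (1:frame_length);
    keep(idx) = true;
end
speech_signal = audio(keep);

fprintf('Kept %.2f s of %.2f s\n', numel(speech_signal)/fs, numel(audio)/fs);
```

The kept duration should come out close to the 1 s tone, slightly longer because frames that straddle the tone boundaries are kept whole. If it differs substantially, the thresholds or framing need adjusting before moving to real recordings.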

This can then be integrated into a Simulink model by placing the above logic in a MATLAB Function block. A 'To Audio Device' block can then play the processed signal, or the signal can be passed on for feature extraction.
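
As a rough sketch of what the body of such a MATLAB Function block might look like, the function below classifies a single frame. The function name detect_speech_frame and its arguments are hypothetical, and it assumes the block receives one frame at a time (for example from a Buffer block) together with fixed thresholds. Because a streaming model cannot see the whole recording, the thresholds here are absolute values chosen in advance rather than fractions of the global maximum.

```matlab
function [is_speech, ste, zcr] = detect_speech_frame(frame, energy_threshold, zcr_threshold)
%#codegen
% frame: column vector holding one frame of audio samples
% energy_threshold, zcr_threshold: fixed thresholds (assumed to be tuned offline)

% Short-time energy of the frame
ste = sum(frame .^ 2);

% Number of sign changes in the frame
zcr = sum(abs(diff(double(frame > 0))));

% Same rule as the script: silence means low energy and low ZCR
is_speech = ~((ste < energy_threshold) && (zcr < zcr_threshold));
end
```

For example, detect_speech_frame(zeros(400,1), 1, 10) returns false, since an all-zero frame has zero energy and no zero crossings.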

Documentation for filtering can be found at: https://www.mathworks.com/help/signal/ug/the-filter-function.html

Hope this helps!
