In this code, why did they choose the numbers that they chose?

Axel Blaze on 4 May 2022
Answered: Steven Lord on 4 May 2022
This code is from a project that separates the "human voice" from the "background voice" in an input audio signal. Using the FFT, it pulls out the data points that contribute the most to the signal (the idea being that a person's voice has the largest amplitude and so contributes more to the signal than the background noise).
My question is: why did they choose these particular index ranges for the zero matrices f2 and f3, namely 1:400000, 648576:end, 472288:514288, and 534288:576288? I get that the first two ranges are for the background noise and the last two are for the human voice. But why these numbers specifically? Why do they skip the centre, the values near 0, for the human voice? I tried changing some of these numbers by equal amounts (on both sides of the assignment), and the resulting FFT looked less symmetric.
I heard them say that they chose these numbers based on the FFT of the input signal. But even then, why exactly these numbers?
I would love to hear any explanation.
clc;
clear all;
close all;
% Reads data from the file, and returns sampled data, a, and a sample rate for that data, fs.
[a,fs]=audioread("C:\Users\benpa\AudioSample\starset.wav");
% fs=48000;
% setting length of matrix in order of 2^n (n=20)
b=a(1:1048576,:);
Length_audio=length(b); %Calculating length of b
df=fs/Length_audio; %Frequency resolution: spacing in Hz between adjacent FFT bins
%e.g. if fs = 48000: df = 48000/1048576 = 0.0458
frequency_audio=-fs/2:df:fs/2-df; %Frequency axis from -fs/2 up to fs/2-df, matching the fftshift ordering
figure %creates a new figure window
% time domain plot of input signal
plot((0:Length_audio-1)/fs,b) % x-axis in seconds, matching the xlabel
title(' Input Audio');
xlabel('Time(s)');
ylabel('Amplitude');
%sound(a,fs);
%%
% Taking fft of b and then shifting zero frequency component to the centre
%of the array using fftshift
FFT_audio_in=fftshift(fft(b))/length(fft(b));
f4=FFT_audio_in;
figure
% frequency domain plot of input signal
plot(frequency_audio,abs(FFT_audio_in));
title('FFT of Input Audio');
xlabel('Frequency(Hz)');
ylabel('Amplitude');
%%
% Initializing zero matrices of same size
%f2 matrix is for background music and f3 contains human voice
f3=zeros(1048576,2);
f2=zeros(1048576,2);
%selecting a particular band that dominates our signal i.e. has contributed
%maximum to our signal(decided by looking at amplitude in frequency domain)
%and adding it to matrix f3. Adding others to matrix f2.
f2(1:400000, :) = FFT_audio_in(1:400000, :);
f2(648576:end, :) = FFT_audio_in(648576:end, :);
f3(472288:514288, :) = FFT_audio_in(472288:514288, :);
f3(534288:576288, :) = FFT_audio_in(534288:576288, :);
%f2 is for background music and f3 (which has the dominating part)is for voice of singer
%for converting fft of human voice to audio file
f1=(f3);
l1=length(f1);
sign=(ifft(ifftshift((f1)*length(b))));
de=fs/l1;
fa=-fs/2:de:fs/2-de;
figure
plot(fa,abs(f1))
title('FFT of Human Voice Audio');
xlabel('Frequency(Hz)');
ylabel('Amplitude');
% we want the real part of our signal, so we extract it using
% Re(z)=(z+conj(z))/2, which is equivalent to real(z)
outh=(sign+conj(sign))*0.5;
audiowrite('human.wav',outh,fs); % Writes a matrix of audio data, outh, with sample rate fs to a file called human.wav
sound(outh,fs); %gives output audio signal of human voice
figure
%plot of output
plot((0:length(outh)-1)/fs,outh); % x-axis in seconds
title('Human Voice Audio');
xlabel('time');
ylabel('Amplitude');
%%
f1=(f2);
l1=length(f1);
sign=(ifft(ifftshift((f1)*length(b))));
de=fs/l1;
fa=-fs/2:de:fs/2-de;
figure
plot(fa,abs(f1))
title('FFT of Background Audio');
xlabel('Frequency(Hz)');
ylabel('Amplitude');
outb=(sign+conj(sign))*0.5;
audiowrite('back.wav',outb,fs); % Writes a matrix of audio data, outb, with sample rate fs to a file called back.wav
sound(outb,fs); %gives output audio signal of background audio
figure
%plot of output background audio
plot((0:length(outb)-1)/fs,outb); % x-axis in seconds
title('Background Audio');
xlabel('Time');
ylabel('Amplitude');
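One way to read the hard-coded ranges is to convert each index into the frequency it holds after fftshift. The following is only a sketch under stated assumptions: fs = 48000 Hz is taken from the commented-out line in the script (the real value is whatever audioread returns), and the helper name idx2freq is made up for illustration.

```matlab
% Assumption: fs comes from the commented-out "fs=48000" line; the actual
% value is whatever audioread returns for the file.
fs = 48000;
N  = 1048576;          % 2^20 samples, as in b = a(1:1048576,:)
df = fs/N;             % frequency resolution, about 0.0458 Hz per bin

% After fftshift with even N, index k holds frequency (k-1-N/2)*df,
% so the DC (0 Hz) bin sits at index N/2+1 = 524289.
idx2freq = @(k) (k - 1 - N/2) * df;   % hypothetical helper name

voiceIdx = [472288 514288 534288 576288];   % range edges used for f3
backIdx  = [1 400000 648576 N];             % range edges used for f2

voiceHz = idx2freq(voiceIdx);
backHz  = idx2freq(backIdx);

fprintf('f3 edges (Hz): %9.1f %9.1f %9.1f %9.1f\n', voiceHz);
fprintf('f2 edges (Hz): %9.1f %9.1f %9.1f %9.1f\n', backHz);
```

Under that assumption the f3 ranges cover roughly 458 Hz to 2380 Hz on each side of DC, and the f2 ranges cover everything above about 5.69 kHz. Bins below about 458 Hz and between about 2.38 kHz and 5.69 kHz end up in neither matrix. The edges are round offsets from the middle of the array (524288 ± 10000, 524288 ± 52000, and about 400000 bins in from each end), so each pair of edges sums to N. The exact mirror of index k about the DC bin is N+2-k, so the ranges are mirrored only approximately, and moving the edges without keeping that pairing is one plausible reason a modified spectrum looks less symmetric.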
1 Comment
dpb on 4 May 2022
Going by the comments, and without having the signal, it appears the numbers were selected arbitrarily to work for that specific signal. I wouldn't expect them to be of any real use for another recording (or even for another section of the same recording).
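To illustrate the point about hard-coded indices, here is a minimal sketch that states the band edges in Hz instead, so the mask no longer depends on the signal length or sample rate. The band itself would still have to be picked per recording. The synthetic two-tone signal and the 500 Hz to 2000 Hz band are assumptions chosen for the example, not values from the original project.

```matlab
% Synthetic stand-in for the recording (assumption: the real script would
% use the samples returned by audioread instead).
fs = 8000;                       % sample rate in Hz
N  = 4096;                       % even number of samples
t  = (0:N-1).'/fs;
x  = sin(2*pi*1000*t) + 0.5*sin(2*pi*3000*t);   % 1 kHz "voice", 3 kHz "background"

X = fftshift(fft(x))/N;          % same scaling as the original script
f = (-N/2:N/2-1).'*(fs/N);       % frequency of each shifted bin, in Hz

% Hypothetical band edges, expressed in Hz rather than as array indices.
fLo = 500;
fHi = 2000;
mask = abs(f) >= fLo & abs(f) <= fHi;   % picks matching +f and -f bins

Xvoice = X .* mask;                      % zero everything outside the band
voiceC = ifft(ifftshift(Xvoice*N));      % undo the scaling and the shift

% Because the mask is mirror-symmetric about 0 Hz, the imaginary part
% of the result is only round-off.
fprintf('max |imag| = %g\n', max(abs(imag(voiceC))));
voice = real(voiceC);
```

Building the mask from abs(f) keeps positive and negative frequencies paired automatically, which is what makes the inverse transform real apart from round-off.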


Answers (1)

Steven Lord on 4 May 2022
That doesn't look like a MathWorks example, so you may need to ask the author or the person you got it from. If I had to guess, though, the comment "(decided by looking at amplitude in frequency domain)" suggests the numbers were chosen manually to select "interesting" pieces of the signal (for some definition of "interesting").
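As a rough illustration of what "looking at amplitude in frequency domain" could mean in code, the sketch below finds the strongest bin of a one-sided spectrum and the range of bins within a threshold of it. This is not the original author's procedure: the synthetic signal and the 50% threshold are arbitrary assumptions for the example.

```matlab
% Synthetic stand-in for the recording (assumption).
fs = 8000;
N  = 4096;
t  = (0:N-1).'/fs;
x  = sin(2*pi*1000*t) + 0.3*sin(2*pi*3000*t);

X     = fft(x)/N;
half  = (1:N/2+1).';             % bins covering 0 Hz up to fs/2
fHalf = (half-1)*(fs/N);         % frequency of each of those bins, in Hz
A     = abs(X(half));            % one-sided amplitude

[peakAmp, iPeak] = max(A);
peakHz = fHalf(iPeak);

% Hypothetical rule: keep every bin with at least 50% of the peak amplitude.
strong = A >= 0.5*peakAmp;
bandHz = [min(fHalf(strong)) max(fHalf(strong))];

fprintf('peak at %g Hz, band %g to %g Hz\n', peakHz, bandHz(1), bandHz(2));
```

On a real recording the spectrum is far less clean than two pure tones, so any such rule would need tuning by eye, which is consistent with the guess that the original numbers were picked manually.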

Release: R2020a
