In this code, why did they choose the numbers that they chose?

Question

Axel Blaze on 4 May 2022


Answered: Steven Lord on 4 May 2022

This code is from a project that separates the "human voice" from the "background" in an input audio signal. Using the FFT, it takes out the data points that contribute the most to the signal (the voice of a person will have the maximum amplitude and hence will contribute more to the signal than the background noise).

My question is: why did they choose these particular index ranges for the zero matrices f2 and f3, namely 1:400000, 648576:end, 472288:514288, and 534288:576288? I get that the first two ranges are for the background noise and the last two are for the human voice. But why these numbers specifically? Why do they skip the centre values near 0 for the human voice? I also tried changing some of these numbers by equal amounts on both sides, and the FFT ended up looking less symmetric.

I heard them say that they chose these numbers based on the FFT of the input signal. But even then, why these exact numbers?

Would love to hear any explanation.

clc;

clear all;

close all;

% Reads data from the file, and returns sampled data, a, and a sample rate for that data, fs.

[a,fs]=audioread("C:\Users\benpa\AudioSample\starset.wav");

% fs=48000;

% Keep the first 2^20 = 1048576 samples of each channel (power-of-two length for the FFT)

b=a(1:1048576,:);

Length_audio=size(b,1); % Number of samples per channel

df=fs/Length_audio; % Frequency resolution: spacing in Hz between adjacent FFT bins

% If fs = 48000, df = 48000/1048576 = 0.0458 Hz

frequency_audio=-fs/2:df:fs/2-df; % Frequency axis from -fs/2 up to fs/2-df, matching the fftshift ordering

figure %creates a new figure window

% time domain plot of input signal (x axis is the sample index)

plot(b)

title('Input Audio');

xlabel('Sample index');

ylabel('Amplitude');

%sound(a,fs);

%%

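% Take the FFT of each channel, scale by the number of samples, and use

% fftshift to move the zero-frequency (DC) component to the centre (index N/2+1).

% Note: with no dimension argument, fftshift also swaps the two channel columns;

% the ifftshift calls further down swap them back.

FFT_audio_in=fftshift(fft(b))/Length_audio;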

f4=FFT_audio_in;

figure

% frequency domain plot of input signal

plot(frequency_audio,abs(FFT_audio_in));

title('FFT of Input Audio');

xlabel('Frequency(Hz)');

ylabel('Amplitude');

%%

% Initialise zero matrices of the same size as the spectrum

% f2 is for the background music and f3 is for the human voice

f3=zeros(size(FFT_audio_in));

f2=zeros(size(FFT_audio_in));

% f3 receives the band that dominates the signal, chosen by looking at the

% amplitude in the frequency-domain plot. f2 receives the outermost bins

% (highest frequencies). Bins that fall in neither set of ranges stay zero

% and are dropped from both outputs.

f2(1:400000, :) = FFT_audio_in(1:400000, :);

f2(648576:end, :) = FFT_audio_in(648576:end, :);

f3(472288:514288, :) = FFT_audio_in(472288:514288, :);

f3(534288:576288, :) = FFT_audio_in(534288:576288, :);

% f2 is for the background music and f3 (which has the dominating part) is for the singer's voice

% Convert the spectrum of the human voice back to an audio signal

f1=f3;

l1=size(f1,1);

voice_time=ifft(ifftshift(f1*Length_audio)); % undo the shift and the 1/N scaling, then invert

de=fs/l1;

fa=-fs/2:de:fs/2-de;

figure

plot(fa,abs(f1))

title('FFT of Human Voice Audio');

xlabel('Frequency(Hz)');

ylabel('Amplitude');

% The kept bands are not exact mirror images about DC, so the inverse FFT is

% slightly complex. Keep only the real part, Re(z) = (z + conj(z))/2.

outh=real(voice_time);

audiowrite('human.wav',outh,fs); % Writes the audio data outh, with sample rate fs, to human.wav

sound(outh,fs); % plays the extracted human voice

figure

% plot of the output (x axis is the sample index)

plot(outh);

title('Human Voice Audio');

xlabel('Sample index');

ylabel('Amplitude');

%%

% Repeat the same steps for the background spectrum f2

f1=f2;

l1=size(f1,1);

back_time=ifft(ifftshift(f1*Length_audio)); % undo the shift and the 1/N scaling, then invert

de=fs/l1;

fa=-fs/2:de:fs/2-de;

figure

plot(fa,abs(f1))

title('FFT of Background Audio');

xlabel('Frequency(Hz)');

ylabel('Amplitude');

outb=real(back_time); % keep only the real part, as for the voice

audiowrite('back.wav',outb,fs); % Writes the audio data outb, with sample rate fs, to back.wav

sound(outb,fs); % plays the extracted background audio

figure

% plot of the output background audio (x axis is the sample index)

plot(outb);

title('Background Audio');

xlabel('Sample index');

ylabel('Amplitude');
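On the observation that changing the numbers makes the FFT look less symmetric: the spectrum of a real signal is conjugate-symmetric about the DC bin, so any band kept on the negative-frequency side has to be mirrored on the positive side for the inverse FFT to come out real. The sketch below is only an illustration with made-up helper names (`mirror`, `keepNeg`, `keepPos`) and a small synthetic signal, not the song from the question. It assumes an even FFT length N with the DC bin at shifted index N/2+1.

```matlab
N = 1048576;                         % FFT length used in the question
mirror = @(i) N + 2 - i;             % mirror of shifted index i about the DC bin N/2 + 1

negBand = [472288 514288];           % negative-frequency voice range from the question
posBand = sort(mirror(negBand));     % its exact mirror on the positive side
fprintf('mirror of %d:%d is %d:%d\n', negBand(1), negBand(2), posBand(1), posBand(2));
% prints: mirror of 472288:514288 is 534290:576290
% The code in the question uses 534288:576288, i.e. it is off by 2 bins.

% Small demonstration on a synthetic real signal (hypothetical data).
n = 64;
x = sin((1:n)'.^1.5);                % deterministic, broadband test signal
X = fftshift(fft(x));

keepNeg = 10:20;                     % some negative-frequency bins
keepPos = n + 2 - keepNeg;           % exactly mirrored positive-frequency bins

Y = zeros(n, 1);
Y([keepNeg keepPos]) = X([keepNeg keepPos]);
y = ifft(ifftshift(Y));              % mirrored bands: result is real up to rounding

keepPosOff = keepPos - 2;            % same band shifted by 2 bins, as in the question
Yoff = zeros(n, 1);
Yoff([keepNeg keepPosOff]) = X([keepNeg keepPosOff]);
yOff = ifft(ifftshift(Yoff));        % not mirrored: result has an imaginary part

fprintf('max |imag| mirrored: %.2e, offset: %.2e\n', ...
        max(abs(imag(y))), max(abs(imag(yOff))));
```

Taking the real part, as the original code does, hides this mismatch rather than removing it, which is probably why shifting the ranges independently makes the plotted spectrum look lopsided.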

1 Comment

dpb on 4 May 2022

Judging from the comments, and without having the signal, it would appear the numbers were chosen arbitrarily to work for this specific signal. I wouldn't expect them to be of any real use for another recording (or even for another section of the same recording).
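To see what the index ranges mean physically, they can be converted to frequencies. The snippet below is a rough sketch that assumes fs = 48000 Hz (taken from the commented-out line in the question, not confirmed for the actual file) and uses illustrative names (`idx2hz`, `ranges`, `edgesHz`). After fftshift on an even-length FFT, the 0 Hz bin sits at index N/2+1.

```matlab
% Assumptions: fs = 48000 Hz and N = 1048576 samples, as in b = a(1:1048576,:).
fs = 48000;
N  = 1048576;
df = fs / N;                          % frequency resolution in Hz per bin

dcIndex = N/2 + 1;                    % shifted index of the 0 Hz (DC) bin
idx2hz  = @(k) (k - dcIndex) * df;    % 1-based shifted index -> frequency in Hz

% The four index ranges from the question, one row per range.
ranges = [1 400000; 648576 N; 472288 514288; 534288 576288];
labels = {'f2 (background), negative side', 'f2 (background), positive side', ...
          'f3 (voice), negative side',      'f3 (voice), positive side'};

edgesHz = idx2hz(ranges);             % same shape as ranges, now in Hz
for r = 1:size(ranges, 1)
    fprintf('%-32s %8d:%-8d -> %10.1f Hz to %10.1f Hz\n', ...
            labels{r}, ranges(r, 1), ranges(r, 2), edgesHz(r, 1), edgesHz(r, 2));
end
```

Under that assumption the voice ranges cover roughly 458 Hz to 2380 Hz on each side of zero, and the background ranges cover everything above roughly 5.69 kHz. The numbers themselves are N/2 = 524288 plus or minus 10000 and 52000 bins, and 400000 bins in from each end, which suggests round bin counts picked by eye rather than anything derived.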


Answer 1

Steven Lord on 4 May 2022


That doesn't look like a MathWorks example, so you might need to ask the author or the person from whom you obtained it. If I had to guess, based on the comment "(decided by looking at amplitude in frequency domain)", I suspect they were chosen manually to select "interesting" pieces of the signal (for some definition of "interesting").
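If the bands really were picked by eye from the spectrum plot, one way to make that choice less opaque is to write the band edges in Hz and derive the bins from the frequency axis instead of hard-coding indices. The following is a minimal sketch on a synthetic two-tone signal, not the original audio. The band edges in `bandHz` and the sample rate are assumptions for illustration only.

```matlab
fs = 48000;                           % assumed sample rate
N  = 4800;                            % short synthetic signal, so df = 10 Hz
t  = (0:N-1)' / fs;
x  = sin(2*pi*1000*t) + 0.5*sin(2*pi*8000*t);   % in-band tone + out-of-band tone

X  = fftshift(fft(x));
f  = (-N/2 : N/2-1)' * (fs/N);        % frequency of each shifted bin in Hz

bandHz = [458 2380];                  % band edges read off a spectrum plot (assumed)
mask   = abs(f) >= bandHz(1) & abs(f) <= bandHz(2);   % symmetric about 0 Hz by construction

y = real(ifft(ifftshift(X .* mask))); % band-limited signal; only the 1000 Hz tone survives

idx = find(mask & f > 0);             % the equivalent hard-coded index range
fprintf('positive-side bins kept: %d:%d\n', idx(1), idx(end));
```

Because the mask is built from abs(f), the negative and positive sides always match, and the same band edges carry over to a recording with a different length or sample rate. Whether 458 Hz to 2380 Hz is a good band for another voice is a separate question that this sketch does not answer.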