Unable to perform assignment because the size of the left side is 100-by-198 and the size of the right side is 100-by-98. Error in backgroundSpectrograms (line 50) Xbkg(:,:,:,ind) = filterBank * spec;
I am trying to compute the background spectrograms, using the same recordings as in https://www.mathworks.com/help/deeplearning/examples/deep-learning-speech-recognition.html
and I get this warning and error:
Warning:
The FFT length is too small to compute the specified number of
bands. Decrease the number of bands or increase the FFT length.
> In designAuditoryFilterBank (line 104)
In backgroundSpectrograms (line 20)
Unable to perform assignment because the size of the left side is
100-by-198 and the size of the right side is 100-by-98.
Error in backgroundSpectrograms (line 50)
Xbkg(:,:,:,ind) = filterBank * spec;
I don't know how to fix it. The background files are the same as in the example, so I don't understand what the error is about.
Please help me fix it. These are my values:
ads = 1x1 audioDatastore
numBkgClips = 4000
volumeRange = [1e-4,1]
segmentDuration= 2
hopDuration = 0.010
numBands = 100
frameDuration = 0.025
FFT length = 512 (in backgroundSpectrograms)
Please help me choose the values.
If I set the FFT length to 1000, the warning goes away but the error remains.
I have to use these values for hopDuration, numBands, frameDuration, and segmentDuration because of my own WAV files.
When I run
adsBkg = subset(ads0,ads0.Labels=="_background_noise_");
numBkgClips = 4000;
volumeRange = [1e-4,1];
XBkg = backgroundSpectrograms(adsBkg,numBkgClips,volumeRange,segmentDuration,frameDuration,hopDuration,numBands);
XBkg = log10(XBkg + epsil);
it gives me the error above.
backgroundSpectrograms.m
% backgroundSpectrograms(ads,numBkgClips,volumeRange,segmentDuration,frameDuration,hopDuration,numBands)
% calculates numBkgClips spectrograms of background clips taken from the
% audio files in the |ads| datastore. Approximately the same number of
% clips is taken from each audio file. Before calculating spectrograms, the
% function rescales each audio clip with a factor sampled from a
% log-uniform distribution in the range given by volumeRange.
% segmentDuration is the total duration of the speech clips (in seconds),
% frameDuration the duration of each spectrogram frame, hopDuration the
% time shift between each spectrogram frame, and numBands the number of
% frequency bands.
function Xbkg = backgroundSpectrograms(ads,numBkgClips,volumeRange,segmentDuration,frameDuration,hopDuration,numBands)
disp("Computing background spectrograms...");
fs = 16e3;
FFTLength = 512;
persistent filterBank
if isempty(filterBank)
    filterBank = designAuditoryFilterBank(fs,'FrequencyScale','bark',...
        'FFTLength',FFTLength,...
        'NumBands',numBands,...
        'FrequencyRange',[50,7000]);
end
logVolumeRange = log10(volumeRange);
numBkgFiles = numel(ads.Files);
numClipsPerFile = histcounts(1:numBkgClips,linspace(1,numBkgClips,numBkgFiles+1));
numHops = segmentDuration/hopDuration - 2;
Xbkg = zeros(numBands,numHops,1,numBkgClips,'single');
ind = 1;
for count = 1:numBkgFiles
    wave = read(ads);
    frameLength = frameDuration*fs;
    hopLength = hopDuration*fs;
    for j = 1:numClipsPerFile(count)
        indStart = randi(numel(wave)-fs);
        logVolume = logVolumeRange(1) + diff(logVolumeRange)*rand;
        volume = 10^logVolume;
        x = wave(indStart:indStart+fs-1)*volume;
        x = max(min(x,1),-1);
        [~,~,~,spec] = spectrogram(x,hann(frameLength,'periodic'),frameLength - hopLength,FFTLength,'onesided');
        Xbkg(:,:,:,ind) = filterBank * spec;
        if mod(ind,1000)==0
            disp("Processed " + string(ind) + " background clips out of " + string(numBkgClips))
        end
        ind = ind + 1;
    end
end
disp("...done");
end
2 Comments
imtiaz waheed
6 Feb 2020
numBkgClips = 4000;
volumeRange = [1e-4,1];
segmentDuration= 2;
hopDuration = 0.010;
numBands = 100;
frameDuration = 0.025;
FFTlength = 1024;
adsBkg = subset(ads,ads.Labels=='_background_noise_');
% ads is your datastore
XBkg = backgroundSpectrograms(adsBkg,numBkgClips,volumeRange,segmentDuration,frameDuration,hopDuration,numBands,FFTlength);
disp('Computing background spectrograms...');
logVolumeRange = log10(volumeRange);
numBkgFiles = numel(ads.Files);
numClipsPerFile = histcounts(1:numBkgClips,linspace(1,numBkgClips,numBkgFiles+1));
numHops = segmentDuration/hopDuration - 2;
Xbkg = zeros(numBands,numHops,1,numBkgClips,'single');
ind = 1;
for count = 1:numBkgFiles
    [wave,info] = read(ads);
    fs = info.SampleRate;
    frameLength = frameDuration*fs;
    hopLength = hopDuration*fs;
    for j = 1:numClipsPerFile(count)
        indStart = randi(numel(wave)-fs);
        logVolume = logVolumeRange(1) + diff(logVolumeRange)*rand;
        volume = 10^logVolume;
        x = wave(indStart:indStart+fs-1)*volume;
        x = max(min(x,1),-1);
        Xbkg(:,:,:,ind) = melSpectrogram(x,fs, ...
            'WindowLength',frameLength, ...
            'OverlapLength',frameLength - hopLength, ...
            'FFTLength',512, ...
            'NumBands',numBands, ...
            'FrequencyRange',[50,7000]);
        if mod(ind,1000)==0
            disp('Processed ' + string(ind) + ' background clips out of ' + string(numBkgClips))
        end
        ind = ind + 1;
    end
end
disp('...done');
Answers (2)
jibrahim
7 Jan 2020
Hi Barb,
There are two problems:
1) Since you asked for 100 bands in the auditory filter bank, the hard-coded FFT length (512) is too small. 1024 should work.
2) The code hard-codes the expected segment duration to 1 second (by using fs here: x = wave(indStart:indStart+fs-1)*volume;), so each spectrogram has 98 frames, while Xbkg is preallocated with segmentDuration/hopDuration - 2 = 198 columns.
I modified and attached the code. This should run now:
numBkgClips = 4000;
volumeRange = [1e-4,1];
segmentDuration= 2;
hopDuration = 0.010;
numBands = 100;
frameDuration = 0.025;
FFTlength = 1024;
adsBkg = subset(ads,ads.Labels=="_background_noise_");
% ads is your datastore
XBkg = backgroundSpectrograms(adsBkg,numBkgClips,volumeRange,segmentDuration,frameDuration,hopDuration,numBands,FFTlength);
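The 198 versus 98 in the error message follows from the frame arithmetic. Here is a minimal sketch of that arithmetic and of a clip extraction that respects segmentDuration. It assumes the frame count that spectrogram documents for a vector input, fix((N - noverlap)/(windowLength - noverlap)), and uses random noise as a stand-in for a background file. The names numFrames and segmentLength are introduced here for illustration only and are not necessarily what the attached file uses.

```matlab
% Parameters from the question
fs = 16e3;
segmentDuration = 2;
frameDuration = 0.025;
hopDuration = 0.010;

frameLength = round(frameDuration*fs);   % 400 samples
hopLength = round(hopDuration*fs);       % 160 samples

% Number of columns preallocated in Xbkg
numHopsExpected = segmentDuration/hopDuration - 2;   % 198

% Number of frames produced for a clip of n samples
% (equivalent to fix((n - noverlap)/(frameLength - noverlap)))
numFrames = @(n) floor((n - frameLength)/hopLength) + 1;

framesOneSecond = numFrames(fs);                               % 98, the right side in the error
framesFullSegment = numFrames(round(segmentDuration*fs));      % 198, matches the preallocation

% Clip extraction that uses the requested segment duration
% (segmentLength is a hypothetical name; wave is a noise stand-in)
segmentLength = round(segmentDuration*fs);
wave = randn(5*fs,1);
indStart = randi(numel(wave) - segmentLength);
x = wave(indStart:indStart+segmentLength-1);
```

With a 1-second clip the spectrogram has 98 frames, so it cannot be assigned into a slice with 198 columns. Taking segmentLength samples instead of fs samples makes the two sizes agree.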
5 Comments
jibrahim
16 Jan 2020
Make sure that the argument to the fullyConnectedLayer that precedes the softmaxLayer is equal to the number of classes you are trying to classify. It seems like you have 4 classes, but you are using fullyConnectedLayer(3). If you do indeed have 3 classes, then the categorical validation array you are supplying may have an unused category. You can remove it using removecats:
YValidation = removecats(YValidation);
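To see what removecats does in this situation, here is a small self-contained sketch. The label names are made up for illustration and are not taken from your data.

```matlab
% Four categories are defined, but only three actually occur in the data
YValidation = categorical(["yes";"no";"yes";"unknown"], ...
    ["yes","no","unknown","background"]);
numBefore = numel(categories(YValidation));   % 4, includes the unused "background"

% Drop categories that have no elements
YValidation = removecats(YValidation);
numAfter = numel(categories(YValidation));    % 3

% numAfter is the value the fully connected layer should receive,
% for example fullyConnectedLayer(numAfter)
```

Counting the categories after removecats is a quick way to check that the number of classes matches the layer size.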
Barb
22 Jan 2020
1 Comment
jibrahim
23 Jan 2020
Make sure the size of the image going into your network matches the image size you used in training:
[YPredicted,probs] = classify(trainedNet,spec,'ExecutionEnvironment','cpu');
It looks like the size of spec is not [100 98 1].
I remember you were generating spectrograms based on 2-second segments. Make sure waveBuffer does indeed hold 2 seconds. I think the original demo uses one second, so you might have to slightly change those three lines of code:
x = audioIn();
waveBuffer(1:end-numel(x)) = waveBuffer(numel(x)+1:end);
waveBuffer(end-numel(x)+1:end) = x;
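As a rough sketch of that change, assuming your script allocates waveBuffer from fs alone (one second), the buffer can be sized from segmentDuration instead. Here x is a constant stand-in for the output of audioIn(), and the frame size of 512 samples is an arbitrary choice for illustration.

```matlab
fs = 16e3;
segmentDuration = 2;

% Buffer that holds segmentDuration seconds instead of one second
waveBuffer = zeros(round(segmentDuration*fs),1);

% Stand-in for x = audioIn(); the 512-sample frame size is an assumption
x = ones(512,1);

% Same three update lines as above: shift the old samples left, append the new frame
waveBuffer(1:end-numel(x)) = waveBuffer(numel(x)+1:end);
waveBuffer(end-numel(x)+1:end) = x;
```

The update lines themselves work for any buffer length, so only the allocation needs to follow the segment duration used in training.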