Unable to perform assignment because the size of the left side is 100-by-198 and the size of the right side is 100-by-98. Error in backgroundSpectrograms (line 50) Xbkg(:,:,:,ind) = filterBank * spec;

I am trying to compute the background spectrograms. The recordings are the same as in https://www.mathworks.com/help/deeplearning/examples/deep-learning-speech-recognition.html
and it gives me this error:
Warning:
The FFT length is too small to compute the specified number of
bands. Decrease the number of bands or increase the FFT length.
> In designAuditoryFilterBank (line 104)
In backgroundSpectrograms (line 20)
Unable to perform assignment because the size of the left side is
100-by-198 and the size of the right side is 100-by-98.
Error in backgroundSpectrograms (line 50)
Xbkg(:,:,:,ind) = filterBank * spec;
I don't know how to fix it. The background files are the same as in the example, so I don't know what the error is about.
Help me fix it. These are my values:
ads = 1x1 audioDatastore
numBkgClips = 4000
volumeRange = [1e-4,1]
segmentDuration= 2
hopDuration = 0.010
numBands = 100
frameDuration = 0.025
FFT length = 512 for backgroundSpectrograms
Please help me with the values.
If I set the FFT length to 1000, the warning goes away but the error stays.
I must use these hopDuration, numBands, frameDuration, and segmentDuration values because of my own wav files.
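For reference, the size mismatch in the error can be reproduced with simple arithmetic (sketched in plain Python here, only because it is integer math; the variable names mirror the MATLAB function). The preallocated array expects segmentDuration/hopDuration - 2 = 198 spectrogram columns, but the function extracts only 1 second of audio per clip, which yields 98 columns:

```python
# Frame-count arithmetic behind the 100-by-198 vs 100-by-98 mismatch.
fs = 16000                 # sample rate hard-coded in backgroundSpectrograms
segmentDuration = 2        # seconds (the setting above)
frameDuration = 0.025      # 25 ms window
hopDuration = 0.010        # 10 ms hop

frameLength = round(frameDuration * fs)   # 400 samples
hopLength = round(hopDuration * fs)       # 160 samples

# Columns the function preallocates: numHops = segmentDuration/hopDuration - 2
numHops = round(segmentDuration / hopDuration) - 2        # 198

# Columns spectrogram() actually produces, because the code extracts only
# fs samples (1 second) regardless of segmentDuration:
clipLength = fs
numFrames = (clipLength - frameLength) // hopLength + 1   # 98

print(numHops, numFrames)  # 198 98
```

So the FFT length is unrelated to this particular error: the left side is sized from segmentDuration, while the right side is sized by the 1-second clip that is actually extracted.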
When I run:
adsBkg = subset(ads0,ads0.Labels=="_background_noise_");
numBkgClips = 4000;
volumeRange = [1e-4,1];
XBkg = backgroundSpectrograms(adsBkg,numBkgClips,volumeRange,segmentDuration,frameDuration,hopDuration,numBands);
XBkg = log10(XBkg + epsil);
it gives me above error.
backgroundSpectrograms.m
% backgroundSpectrograms(ads,numBkgClips,volumeRange,segmentDuration,frameDuration,hopDuration,numBands)
% calculates numBkgClips spectrograms of background clips taken from the
% audio files in the |ads| datastore. Approximately the same number of
% clips is taken from each audio file. Before calculating spectrograms, the
% function rescales each audio clip with a factor sampled from a
% log-uniform distribution in the range given by volumeRange.
% segmentDuration is the total duration of the speech clips (in seconds),
% frameDuration the duration of each spectrogram frame, hopDuration the
% time shift between each spectrogram frame, and numBands the number of
% frequency bands.
function Xbkg = backgroundSpectrograms(ads,numBkgClips,volumeRange,segmentDuration,frameDuration,hopDuration,numBands)

disp("Computing background spectrograms...");

fs = 16e3;
FFTLength = 512;

persistent filterBank
if isempty(filterBank)
    filterBank = designAuditoryFilterBank(fs,'FrequencyScale','bark', ...
        'FFTLength',FFTLength, ...
        'NumBands',numBands, ...
        'FrequencyRange',[50,7000]);
end

logVolumeRange = log10(volumeRange);
numBkgFiles = numel(ads.Files);
numClipsPerFile = histcounts(1:numBkgClips,linspace(1,numBkgClips,numBkgFiles+1));

numHops = segmentDuration/hopDuration - 2;
Xbkg = zeros(numBands,numHops,1,numBkgClips,'single');

ind = 1;
for count = 1:numBkgFiles
    wave = read(ads);
    frameLength = frameDuration*fs;
    hopLength = hopDuration*fs;
    for j = 1:numClipsPerFile(count)
        indStart = randi(numel(wave)-fs);
        logVolume = logVolumeRange(1) + diff(logVolumeRange)*rand;
        volume = 10^logVolume;
        x = wave(indStart:indStart+fs-1)*volume;
        x = max(min(x,1),-1);
        [~,~,~,spec] = spectrogram(x,hann(frameLength,'periodic'),frameLength - hopLength,FFTLength,'onesided');
        Xbkg(:,:,:,ind) = filterBank * spec;
        if mod(ind,1000)==0
            disp("Processed " + string(ind) + " background clips out of " + string(numBkgClips))
        end
        ind = ind + 1;
    end
end
disp("...done");

end
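The FFT-length warning has a similar arithmetic root. Here is a rough check, using my own approximation of the Bark scale (Traunmüller's formula), not the toolbox's internal algorithm: with 100 bands between 50 Hz and 7000 Hz, the narrowest band at the low end is only about 15-16 Hz wide, while a 512-point FFT at 16 kHz spaces its bins 31.25 Hz apart, so some bands would contain no FFT bin at all.

```python
# Rough, hypothetical check of why FFTLength = 512 triggers the
# designAuditoryFilterBank warning for 100 Bark bands. Uses Traunmueller's
# Bark approximation, not the toolbox's exact internals.
fs = 16000
fmin, fmax, numBands = 50, 7000, 100

def hz_to_bark(f):
    # Traunmueller (1990) approximation of the Bark scale
    return 26.81 * f / (1960.0 + f) - 0.53

bark_per_band = (hz_to_bark(fmax) - hz_to_bark(fmin)) / numBands

# Approximate Hz width of the lowest (narrowest) band via the derivative at fmin
dbark_df = 26.81 * 1960.0 / (1960.0 + fmin) ** 2
narrowest_band_hz = bark_per_band / dbark_df   # roughly 15.6 Hz

bin_spacing_512 = fs / 512      # 31.25 Hz: wider than the narrowest band
bin_spacing_1024 = fs / 1024    # 15.625 Hz: about one bin per narrow band
print(narrowest_band_hz, bin_spacing_512, bin_spacing_1024)
```

This is why increasing the FFT length silences the warning but does not touch the 198-vs-98 assignment error, which comes from the clip length instead.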
2 Comments
imtiaz waheed on 6 Feb 2020
numBkgClips = 4000;
volumeRange = [1e-4,1];
segmentDuration = 2;
hopDuration = 0.010;
numBands = 100;
frameDuration = 0.025;
FFTlength = 1024;
adsBkg = subset(ads,ads.Labels=='_background_noise_');
% ads is your datastore
XBkg = backgroundSpectrograms(adsBkg,numBkgClips,volumeRange,segmentDuration,frameDuration,hopDuration,numBands,FFTlength);
disp('Computing background spectrograms...');
logVolumeRange = log10(volumeRange);
numBkgFiles = numel(ads.Files);
numClipsPerFile = histcounts(1:numBkgClips,linspace(1,numBkgClips,numBkgFiles+1));
numHops = segmentDuration/hopDuration - 2;
Xbkg = zeros(numBands,numHops,1,numBkgClips,'single');
ind = 1;
for count = 1:numBkgFiles
    [wave,info] = read(ads);
    fs = info.SampleRate;
    frameLength = frameDuration*fs;
    hopLength = hopDuration*fs;
    for j = 1:numClipsPerFile(count)
        indStart = randi(numel(wave)-fs);
        logVolume = logVolumeRange(1) + diff(logVolumeRange)*rand;
        volume = 10^logVolume;
        x = wave(indStart:indStart+fs-1)*volume;
        x = max(min(x,1),-1);
        Xbkg(:,:,:,ind) = melSpectrogram(x,fs, ...
            'WindowLength',frameLength, ...
            'OverlapLength',frameLength - hopLength, ...
            'FFTLength',512, ...
            'NumBands',numBands, ...
            'FrequencyRange',[50,7000]);
        if mod(ind,1000)==0
            disp('Processed ' + string(ind) + ' background clips out of ' + string(numBkgClips))
        end
        ind = ind + 1;
    end
end
disp('...done');
imtiaz waheed on 6 Feb 2020
Can anyone help me with this, please?


Answers (2)

jibrahim on 7 Jan 2020
Hi Barb,
There are two problems:
1) Since you asked for 100 bands in the auditory filter bank, the hard-coded FFT length (512) is too small. 1024 should work.
2) The code hard-codes the expected segment duration to 1 second (by using fs here: x = wave(indStart:indStart+fs-1)*volume;)
I modified and attached the code. This should run now:
numBkgClips = 4000;
volumeRange = [1e-4,1];
segmentDuration= 2;
hopDuration = 0.010;
numBands = 100;
frameDuration = 0.025;
FFTlength = 1024;
adsBkg = subset(ads,ads.Labels=="_background_noise_");
% ads is your datastore
XBkg = backgroundSpectrograms(adsBkg,numBkgClips,volumeRange,segmentDuration,frameDuration,hopDuration,numBands,FFTlength);
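As a quick sanity check of point 2 (plain Python arithmetic, mirroring the MATLAB variable names): once the function extracts segmentDuration*fs samples instead of a hard-coded fs samples, the spectrogram column count matches the 198 columns that were preallocated.

```python
fs = 16000
segmentDuration = 2
frameLength = round(0.025 * fs)   # 400-sample window
hopLength = round(0.010 * fs)     # 160-sample hop

clipLength = segmentDuration * fs                        # 32000 samples (2 s)
numFrames = (clipLength - frameLength) // hopLength + 1
print(numFrames)  # 198, matching numHops = segmentDuration/hopDuration - 2
```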
5 Comments
Barb on 16 Jan 2020
If I use your backgroundSpectrograms, it shows me an error when I try this part of the code:
doTraining = true;
if doTraining
trainedNet = trainNetwork(augimdsTrain,layers,options);
else
load('commandNet.mat','trainedNet');
end
The error:
Error using trainNetwork (line 170)
Invalid validation data. The output size (3) of the last layer does not
match the number of classes (4).
jibrahim on 16 Jan 2020
Make sure that the argument to the fullyConnectedLayer that precedes the softmaxLayer is equal to the number of classes you are trying to classify. It seems like you have 4 classes, but you are using fullyConnectedLayer(3). If you indeed have 3 classes, then maybe the categorical validation array you are supplying has an unused category. You can remove it using removecats:
YValidation = removecats(YValidation);



Barb on 22 Jan 2020
OK, the training now works, but I don't know how to fix the errors when I try to run this part of the code:
h = figure('Units','normalized','Position',[0.2 0.1 0.6 0.8]);

filterBank = designAuditoryFilterBank(fs,'FrequencyScale','bark', ...
    'FFTLength',1024, ...
    'NumBands',numBands, ...
    'FrequencyRange',[50,7000]);

while ishandle(h)

    % Extract audio samples from the audio device and add the samples to
    % the buffer.
    x = audioIn();
    waveBuffer(1:end-numel(x)) = waveBuffer(numel(x)+1:end);
    waveBuffer(end-numel(x)+1:end) = x;

    % Compute the spectrogram of the latest audio samples.
    [~,~,~,spec] = spectrogram(waveBuffer,hann(frameLength,'periodic'),frameLength - hopLength,512,'onesided');
    spec = filterBank * spec;
    spec = log10(spec + epsil);

    % Classify the current spectrogram, save the label to the label buffer,
    % and save the predicted probabilities to the probability buffer.
    [YPredicted,probs] = classify(trainedNet,spec,'ExecutionEnvironment','cpu');
    YBuffer(1:end-1) = YBuffer(2:end);
    YBuffer(end) = YPredicted;
    probBuffer(:,1:end-1) = probBuffer(:,2:end);
    probBuffer(:,end) = probs';

    % Plot the current waveform and spectrogram.
    subplot(2,1,1);
    plot(waveBuffer)
    axis tight
    ylim([-0.2,0.2])
    subplot(2,1,2)
    pcolor(spec)
    caxis([specMin+2 specMax])
    shading flat

    [YMode,count] = mode(YBuffer);
    countThreshold = ceil(classificationRate*0.2);
    maxProb = max(probBuffer(labels == YMode,:));
    probThreshold = 0.7;
    subplot(2,1,1);
    if YMode == "background" || count < countThreshold || maxProb < probThreshold
        title(" ")
    else
        title(string(YMode),'FontSize',20)
    end

    drawnow
end
The errors:
Error using DAGNetwork/calculatePredict>predictBatch (line 151)
Incorrect input size. The input images must have a size of [100 198 1].
Error in DAGNetwork/calculatePredict (line 17)
Y = predictBatch( ...
Error in DAGNetwork/classify (line 134)
scores = this.calculatePredict( ...
Error in SeriesNetwork/classify (line 502)
[labels, scores] = this.UnderlyingDAGNetwork.classify(X,varargin{:});
1 Comment
jibrahim on 23 Jan 2020
Make sure the size of the image going into your network matches the image size you used in training:
[YPredicted,probs] = classify(trainedNet,spec,'ExecutionEnvironment','cpu');
It looks like the size of spec is not [100 198 1].
I remember you were generating spectrograms based on 2-second segments. Make sure waveBuffer indeed holds 2 seconds. I think the original demo uses one second, so you might have to slightly change these three lines of code:
x = audioIn();
waveBuffer(1:end-numel(x)) = waveBuffer(numel(x)+1:end);
waveBuffer(end-numel(x)+1:end) = x;
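The shifting-buffer logic above can be sketched like this (NumPy used for illustration; the function name and chunk size are mine, not from the example). The key point is that waveBuffer must be allocated once at segmentDuration*fs samples, so that the spectrogram computed from it has the 198 columns the trained network expects:

```python
import numpy as np

fs = 16000
segmentDuration = 2                          # must match training (2 s)
waveBuffer = np.zeros(segmentDuration * fs)  # 32000 samples

def push_samples(buf, x):
    """Shift the buffer left by len(x) and append the newest chunk at the end."""
    n = len(x)
    buf[:-n] = buf[n:]   # mirrors waveBuffer(1:end-numel(x)) = waveBuffer(numel(x)+1:end)
    buf[-n:] = x         # mirrors waveBuffer(end-numel(x)+1:end) = x
    return buf

# Simulate five audio-device callbacks of 1600 samples (0.1 s) each
for k in range(5):
    chunk = np.full(1600, float(k))
    waveBuffer = push_samples(waveBuffer, chunk)

# The buffer length never changes; the newest chunk sits at the end
print(len(waveBuffer), waveBuffer[-1])  # 32000 4.0
```

The shift itself needs no change when moving from a 1-second to a 2-second buffer; only the buffer's allocated length does.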

