- Speech Emotion Recognition (uses a neural network)
- Train Speech Emotion Recognition System (uses ivectorSystem, an end-to-end machine learning system in Audio Toolbox)
How do I change my audio data to be the same length for an audioDatastore?
Hi, I want to build a basic KNN classifier for the RAVDESS dataset.
I am writing a project that runs KNN on a speech emotion dataset to test accuracy. The problem I'm currently having is that my files are all different lengths, so if I try to concatenate my features to use fitcknn, the dimensions are not consistent.
This is what I have so far:
% audioDatastore to store all 1440 audio clips
ads = audioDatastore(directory, "IncludeSubfolders",true, 'FileExtensions', '.wav');
ads.Labels = {audioData.emotion};
%shuffle audio data and split into training and testing data
shufAds = shuffle(ads);
[trainSet, testSet] = splitEachLabel(shufAds, 0.8);
% Extract audio features using audioFeatureExtractor
aFe = audioFeatureExtractor('SampleRate', 48000 , ...
'spectralRolloffPoint',true, 'spectralSpread', true,pitch=true);
trainFeatures = extract(aFe, trainSet);
trainLabels = trainSet.Labels;
feat1 = zeros(493, 1152);
feat2 = zeros(493, 1152);
feat3 = zeros(493, 1152);
for i = 1:1152
% Extract spectralRolloffPoint feature
feat1(:, i) = trainFeatures{i}(:, 1);
% Extract spectralSpread feature
feat2(:, i) = trainFeatures{i}(:, 2);
% Extract pitch feature
feat3(:, i) = trainFeatures{i}(:, 3);
end
%kMd = fitcknn(trainFeatures, trainLabels, 'NumNeighbors', 3);
I understand that feat1, feat2 and feat3 don't work. The error there is:
Unable to perform assignment because the size of the left side is 493-by-1 and the size of the right side is 327-by-1.
Error in untitled2 (line 61)
feat1(:, i) = trainFeatures{i}(:, 1);
If anyone could help me make my audio all the same length, that would be a lifesaver. I don't care at this point whether I truncate or pad, whichever is easiest.
Obviously, if my logic so far is completely off, any help would be amazing.
Accepted Answer
Brian Hemmat
17 April 2024
Are you sure you need the audio files the same length for your workflow? Take a look at this example for a workflow with fitcknn that does not require the signals to be the same length:
The above example should be generalizable to your dataset and task.
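As a rough illustration of how a fitcknn workflow can avoid equal-length signals, the sketch below treats every analysis frame as one observation, repeats the file label for each frame, and classifies a file by majority vote over its frames. This is an assumption about the general approach, not a reproduction of the linked example. The synthetic features, the two class names, and the variable names are all hypothetical stand-ins for the output of extract(afe,ads).

```matlab
% Synthetic stand-in for per-file feature matrices (frames-by-3).
% Real data would come from extract(afe,ads); lengths differ on purpose.
rng(0)
numFiles = 20;
labels = categorical(repmat(["calm";"angry"],numFiles/2,1));
features = cell(numFiles,1);
for k = 1:numFiles
    numFrames = randi([30 60]);
    offset = 3*double(labels(k) == "angry");   % shift one class so they separate
    features{k} = randn(numFrames,3) + offset;
end

% Treat every frame as one observation and repeat the file label per frame.
framesPerFile = cellfun(@(x)size(x,1),features);
X = cat(1,features{:});
fileIdx = repelem((1:numFiles)',framesPerFile);
Y = labels(fileIdx);

mdl = fitcknn(X,Y,NumNeighbors=5,Standardize=true);

% Classify each file by majority vote over its frame-level predictions.
predicted = labels;                            % preallocate as categorical
for k = 1:numFiles
    predicted(k) = mode(predict(mdl,features{k}));
end
trainAccuracy = mean(predicted == labels);     % training files only, optimistic
```

Because the model is evaluated on the same files it was trained on, trainAccuracy only checks that the pipeline runs; a held-out test set is needed for a meaningful accuracy figure.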
Also, the following might be of interest to you:
To answer your question directly, here are a couple of approaches you could take to make the signals the same length:
% Get the dataset
loc = matlab.internal.examples.downloadSupportFile("audio","FSDD.zip");
unzip(loc,pwd)
ads = audioDatastore(pwd,IncludeSubfolders=true);
[~,adsInfo] = readfile(ads,1);
fs = adsInfo.SampleRate;
%% Set up feature extractor.
afe = audioFeatureExtractor(SampleRate=fs, ...
Window=hamming(round(0.03*fs),"periodic"), ...
OverlapLength=round(0.02*fs), ...
spectralRolloffPoint=true, ...
spectralSpread=true, ...
pitch=true);
%% Option 1: Extract features then truncate
features = extract(afe,ads);
% Choose the min of szin (frames per file) to truncate everything to the
% shortest file, the max to pad everything to the longest, or the mean to
% pad or truncate as appropriate.
szin = cellfun(@(x)size(x,1),features);
szout = round(mean(szin));
features = cellfun(@(x)resize(x,szout),features,UniformOutput=false);
%% Option 2: Make signals same length then extract features
% Get distribution of lengths.
samplesPerFile = cellfun(@(x)audioinfo(x).TotalSamples,ads.Files);
histogram(samplesPerFile) % visualize distribution
xlabel('Num Samples')
ylabel('Num Files')
% You can either choose the min of samplesPerFile to truncate all to the
% minimum, the max to pad all, or the mean to pad or truncate as
% appropriate.
sz = round(mean(samplesPerFile));
adsT = transform(ads,@(x)resize(x,sz));
adsT = transform(adsT,@(x){extract(afe,x)});
features = readall(adsT);
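Once every file has the same number of frames, fitcknn still expects a numeric matrix with one row per file, so each frames-by-3 matrix has to be flattened into a row. The following is a minimal sketch of that last step on synthetic data. The feature matrices, label names, and variable names are hypothetical, and the padding is written out by hand here rather than with resize so the steps are visible.

```matlab
% Synthetic stand-in for per-file feature matrices of varying length.
rng(1)
numFiles = 12;
labels = categorical(repmat(["happy";"sad";"neutral"],numFiles/3,1));
features = arrayfun(@(k)randn(randi([20 40]),3),(1:numFiles)', ...
    UniformOutput=false);

% Pad with zeros or truncate every matrix to the same number of frames.
numFramesOut = round(mean(cellfun(@(x)size(x,1),features)));
X = zeros(numFiles,3*numFramesOut);
for k = 1:numFiles
    f = features{k};
    n = min(size(f,1),numFramesOut);
    fixed = zeros(numFramesOut,3);
    fixed(1:n,:) = f(1:n,:);
    X(k,:) = fixed(:)';        % flatten column by column into one row
end

% One row per file, one label per row.
mdl = fitcknn(X,labels,NumNeighbors=3);
predicted = predict(mdl,X);
```

Flattening keeps the time ordering of the frames, which makes the distance computation sensitive to alignment between files. Summary statistics per feature (for example the mean and standard deviation over frames) are a common alternative that also yields fixed-length rows without padding.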