How do I change my audio data to be the same length for an audioDatastore?

Question

A, 16 April 2024

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/2107526-how-do-i-change-my-audio-data-to-be-the-same-length-for-an-audiodatastore

Commented: A, 17 April 2024

Hi, I want to build a basic kNN classifier for the RAVDESS dataset.

I am writing a project that runs a kNN classifier on a speech emotion dataset to test accuracy. The problem I'm having is that my files are all different lengths, so when I try to concatenate my features for use with fitcknn, the dimensions are not consistent.

This is what I have so far:

% audioDatastore to store all 1440 audio clips
ads = audioDatastore(directory, "IncludeSubfolders",true, 'FileExtensions', '.wav');
ads.Labels = {audioData.emotion};
% Shuffle the audio data and split it into training and testing sets
shufAds = shuffle(ads);
[trainSet, testSet] = splitEachLabel(shufAds, 0.8);
% Extract audio features using audioFeatureExtractor
aFe = audioFeatureExtractor('SampleRate', 48000, ...
    'spectralRolloffPoint', true, 'spectralSpread', true, pitch=true);
trainFeatures = extract(aFe, trainSet);
trainLabels = trainSet.Labels;
% Preallocate one 493-by-1152 matrix per feature (one column per training file)
feat1 = zeros(493, 1152);
feat2 = zeros(493, 1152);
feat3 = zeros(493, 1152);
for i = 1:1152
    % Extract spectralRolloffPoint feature
    feat1(:, i) = trainFeatures{i}(:, 1);
    % Extract spectralSpread feature
    feat2(:, i) = trainFeatures{i}(:, 2);
    % Extract pitch feature
    feat3(:, i) = trainFeatures{i}(:, 3);
end
% kMd = fitcknn(trainFeatures, trainLabels, 'NumNeighbors', 3);

I understand that feat1, feat2 and feat3 don't work. The error there is:

Unable to perform assignment because the size of the left side is 493-by-1 and the size of the right side is 327-by-1.
Error in untitled2 (line 61)
feat1(:, i) = trainFeatures{i}(:, 1);

If anyone could help me make my audio all the same length, that would be a lifesaver. I don't care at this point whether I truncate or pad, whichever is easiest.

Obviously, if my logic so far is completely off, any help would be amazing.


Answer 1

Brian Hemmat, 17 April 2024

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/2107526-how-do-i-change-my-audio-data-to-be-the-same-length-for-an-audiodatastore#answer_1443251


Are you sure you need the audio files to be the same length for your workflow? Take a look at this example for a workflow with fitcknn that does not require the signals to be the same length:

Audio Feature Selection for Machine Learning Tasks

The above example should be generalizable to your dataset and task.

Also, the following might be of interest to you:

Speech Emotion Recognition (uses a neural network)
Train Speech Emotion Recognition System (uses ivectorSystem, an end-to-end machine learning system in Audio Toolbox)

To answer your question directly, here are a couple of approaches you could take to make the signals the same length:

% Get the dataset
loc = matlab.internal.examples.downloadSupportFile("audio","FSDD.zip");
unzip(loc,pwd)
ads = audioDatastore(pwd,IncludeSubfolders=true);
[~,adsInfo] = readfile(ads,1);
fs = adsInfo.SampleRate;
%% Set up feature extractor.
afe = audioFeatureExtractor(SampleRate=fs, ...
    Window=hamming(round(0.03*fs),"periodic"), ...
    OverlapLength=round(0.02*fs), ...
    spectralRolloffPoint=true, ...
    spectralSpread=true, ...
    pitch=true);
%% Option 1: Extract features, then pad or truncate
features = extract(afe,ads);
% szin is the number of feature frames (rows) per file. You can either
% choose the min of szin to truncate all to the minimum, the max to pad
% all, or the mean to pad or truncate as appropriate.
szin = cellfun(@(x)size(x,1),features);
szout = round(mean(szin));
features = cellfun(@(x)resize(x,szout),features,UniformOutput=false);
%% Option 2: Make signals same length then extract features
% Get distribution of lengths.
samplesPerFile = cellfun(@(x)audioinfo(x).TotalSamples,ads.Files);
histogram(samplesPerFile) % visualize distribution
xlabel('Num Samples')
ylabel('Num Files')
% You can either choose the min of samplesPerFile to truncate all to the
% minimum, the max to pad all, or the mean to pad or truncate as
% appropriate.
sz = round(mean(samplesPerFile));
adsT = transform(ads,@(x)resize(x,sz));
adsT = transform(adsT,@(x){extract(afe,x)});
features = readall(adsT);
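
Once every file has the same number of feature frames, the features still need to become a numeric matrix before fitcknn will accept them. The following is a minimal sketch of one way to do that, not part of the answer above: it flattens each file's frames-by-features matrix into a single row, so each file is one observation. The frame counts, labels, and random data are made-up stand-ins for the output of extract(afe,ads), and the names fixed and padded are only illustrative. Padding and truncation are done with plain indexing here so the sketch does not depend on resize.

```matlab
% Synthetic stand-in for extract(afe,ads): one numFrames-by-3 matrix per
% file, with a different number of frames per file (hypothetical values).
rng(0)                                   % reproducible random data
numFrames = [493 327 410 380 450 300];
features  = arrayfun(@(n)rand(n,3),numFrames(:),UniformOutput=false);
labels    = categorical(["happy";"sad";"happy";"sad";"happy";"sad"]);

% Pad (with zeros) or truncate every matrix to a common number of frames.
szout = round(mean(numFrames));
fixed = cell(size(features));
for k = 1:numel(features)
    f = features{k};
    n = min(size(f,1),szout);            % rows that fit in the target size
    padded = zeros(szout,size(f,2));     % zero padding by default
    padded(1:n,:) = f(1:n,:);            % copy the rows that fit
    fixed{k} = padded;
end

% Flatten each szout-by-3 matrix into one row: one observation per file.
X = cell2mat(cellfun(@(f)f(:).',fixed,UniformOutput=false));

% Train the classifier on the file-level predictor matrix.
mdl = fitcknn(X,labels,NumNeighbors=3);
```

With this layout X has one row per file and szout*3 columns, which matches the one-label-per-file shape of ads.Labels. Since the three features live on very different scales, setting Standardize=true in fitcknn is probably worth trying.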


A, 17 April 2024

Hi Brian, thanks for responding. I'm definitely trying to get everything working with different file lengths right now. I'm trying to follow the first link you sent, but I'm still having trouble separating my features out after aFe so they can be fed into fitcknn. I can't figure out the logic at all, but thank you for your solution for the file lengths; that helps a lot.
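
For the variable-length route, one common pattern is to treat every analysis frame as its own observation: stack all frames into one tall matrix, repeat each file's label once per frame, train on that, and then classify a whole file by majority vote over its frames. The sketch below illustrates that idea on made-up data. It is an assumption about how the logic could be organized, not a copy of the linked example, and the frame counts, labels, and variable names are hypothetical.

```matlab
% Synthetic stand-in for trainFeatures = extract(aFe,trainSet): one
% numFrames-by-3 matrix per file, with different numFrames per file.
rng(0)                                   % reproducible random data
numFrames = [493 327 410 380];           % hypothetical frame counts
features  = arrayfun(@(n)rand(n,3),numFrames(:),UniformOutput=false);
labels    = categorical(["happy";"sad";"happy";"sad"]);

% Stack all frames vertically: each row holds the 3 features of one frame.
X = cat(1,features{:});

% Repeat each file's label once per frame so Y has one entry per row of X.
fileIdx = repelem((1:numel(labels)).',numFrames(:));
Y = labels(fileIdx);

% Train on frames; file lengths no longer need to match.
mdl = fitcknn(X,Y,NumNeighbors=3,Standardize=true);

% Classify one file: predict every frame, then take the most common label.
framePred = predict(mdl,features{1});
filePred  = mode(framePred);
```

Here the columns of each trainFeatures{i} stay together as one row per frame, so there is no need to split them into separate feat1, feat2 and feat3 matrices.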
