Load multiple Files from fileDatastore in minibatchqueue

Simon on 12 Mar 2025
Commented: Joss Knight on 14 Mar 2025
When training my neural network, I noticed that approximately 50% of the runtime is spent on loading data. Currently, the data loading process from my minibatchqueue is sequential and involves numerous function calls. Specifically, each minibatch requires 256 individual function calls, with each file loaded one after another, causing significant delays.
My goal is to parallelize or otherwise optimize this data-loading process to significantly reduce the runtime.
I'm looking for recommendations or best practices to address the issue.
Any suggestions or ideas for enhancing data-loading efficiency would be greatly appreciated.
The path is similar to this: "C:/user/me/data/s*/spec.mat"
% Create Datastore
fdsNetInput = fileDatastore(path, ReadFcn=@loadDatastoreData, FileExtensions=".mat");
% Add labels and create the training set (not the core of my question)
fullData = combine(fdsNetInput, fdsLabel);
trainMask = trainMask(randperm(N));
trainData = subset(fullData, trainMask);
My ReadFcn:
function [netInput] = loadDatastoreData(file)
    % Load the spectrum (load returns a struct with a "spectrum" field)
    netInputSpectrum = load(file, "spectrum"); % <-- This line takes a lot of time
    % Scale the spectrum to the right size
    netInputSpectrum = scaleData(netInputSpectrum, "VGGish", false);
    % Output the spectrum with the proper dims
    netInput = dlarray(netInputSpectrum, 'SSC');
end
Then I create a minibatchqueue from the combined datastore:
% create train minibatchqueue
trainMBQ = minibatchqueue(trainData, ...
    MiniBatchFormat=["SSCB", "CB"], ...
    MiniBatchSize=256, ...
    PartialMiniBatch="discard", ...
    OutputEnvironment="gpu");
Finally, I read the data from the minibatchqueue to process it:
% Read mini-batch of data.
[input,target] = next(trainMBQ);

Accepted Answer

Joss Knight on 12 Mar 2025
Does the training option PreprocessingEnvironment set to 'background' not work for you? Or is this a custom training loop?
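For readers using the built-in training loop rather than a custom one, the option mentioned here is set through trainingOptions. The line below is a minimal sketch: the "adam" solver and the mini-batch size of 256 are placeholder choices for illustration, and the PreprocessingEnvironment option assumes a recent release such as the R2024b listed for this question.

```matlab
% Sketch: request background preprocessing for the built-in training loop.
% The solver name and mini-batch size are illustrative placeholders.
options = trainingOptions("adam", ...
    MiniBatchSize=256, ...
    PreprocessingEnvironment="background");
```

The resulting options object would then be passed to the training function as usual. In a custom training loop the equivalent setting goes on the minibatchqueue instead.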
2 Comments
Simon on 13 Mar 2025
Thank you for your fast answer.
It is a custom training loop. I added PreprocessingEnvironment="parallel" to the minibatchqueue, which improved the runtime a lot.
trainMBQ = minibatchqueue(trainData, ...
    MiniBatchFormat=["SSCB", "CB"], ...
    MiniBatchSize=minibatchSize, ...
    PartialMiniBatch="discard", ...
    PreprocessingEnvironment="parallel", ...
    OutputEnvironment="gpu");
Joss Knight on 14 Mar 2025
Great! "background" should hopefully also work and be faster.
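To try the "background" variant, only the PreprocessingEnvironment value needs to change. The sketch below is self-contained so it can be run without the original files: the synthetic arrays, their sizes, and the collateFcn helper are assumptions made for illustration and are not part of the original pipeline. It also assumes that every function in the read path is supported on the thread-based background pool, which should be verified for a custom ReadFcn. OutputEnvironment="auto" is used so the sketch also runs on a machine without a GPU.

```matlab
% Synthetic stand-in for the spectrogram files (sizes are hypothetical)
numObs = 512;
X = rand(96, 64, 1, numObs, "single");   % inputs: 96x64x1 per observation
T = rand(10, numObs, "single");          % targets: 10x1 per observation

% One observation per read, combined into (input, target) pairs
dsX = arrayDatastore(X, IterationDimension=4);
dsT = arrayDatastore(T, IterationDimension=2);
ds = combine(dsX, dsT);

% Each mini-batch variable arrives as a cell array of observations;
% stack inputs along dimension 4 and targets along dimension 2
collateFcn = @(x, t) deal(cat(4, x{:}), cat(2, t{:}));

mbq = minibatchqueue(ds, ...
    MiniBatchSize=256, ...
    MiniBatchFcn=collateFcn, ...
    MiniBatchFormat=["SSCB", "CB"], ...
    PartialMiniBatch="discard", ...
    PreprocessingEnvironment="background", ...
    OutputEnvironment="auto");

% Drain the queue once, as one epoch of a custom training loop would
numBatches = 0;
while hasdata(mbq)
    [input, target] = next(mbq);
    numBatches = numBatches + 1;
end
```

Unlike "parallel", the "background" setting does not start a process-based parallel pool, so it may have less startup and data-transfer overhead. Timing both settings on the real data is the only reliable way to compare them.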

Release

R2024b
