"splitEachLabel" built-in function does not really randomize the picture distribution?

I am using R2017b for deep learning classification. When I split an imageDatastore object into training and test sets with splitEachLabel and pass the optional 'randomized' argument, the images in the training set are not randomly ordered, regardless of whether I specify a number or a proportion of files. Why?
digitDatasetPath = fullfile(matlabroot,'toolbox','nnet','nndemos', ...
    'nndatasets','DigitDataset');
digitData = imageDatastore(digitDatasetPath, ...
    'IncludeSubfolders',true,'LabelSource','foldernames');
trainingNumFiles = 750;
rng(1) % For reproducibility
[trainDigitData,testDigitData] = splitEachLabel(digitData, ...
    trainingNumFiles,'randomized');
When I inspect "trainDigitData.Files" and "trainDigitData.Labels" in the workspace, their order has not been shuffled. Why?

Accepted Answer

Wentao Du on 1 Mar 2018
The order you see does not look completely different because the labels of "digitData" are in order (from 0 to 9). To observe the effect of the "randomize" option, you can run
[trainDigitData,valDigitData] = splitEachLabel(digitData,trainingNumFiles,'randomize');
multiple times (without resetting the random seed in between), and you will find that the set of image files selected keeps changing.
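One way to check this is to run the split twice and compare the results. The following is a minimal sketch, assuming the DigitDataset folder that ships with the toolbox (the same path as in the question); the names splitA, splitB, countsA, and countsB are introduced here for illustration only.

```matlab
% Sketch: compare two randomized splits of the same datastore.
digitDatasetPath = fullfile(matlabroot,'toolbox','nnet','nndemos', ...
    'nndatasets','DigitDataset');
digitData = imageDatastore(digitDatasetPath, ...
    'IncludeSubfolders',true,'LabelSource','foldernames');
trainingNumFiles = 750;

% Two independent randomized splits (the seed is not reset in between).
splitA = splitEachLabel(digitData,trainingNumFiles,'randomized');
splitB = splitEachLabel(digitData,trainingNumFiles,'randomized');

% The selected files should differ between the two splits ...
sameFiles = isequal(splitA.Files,splitB.Files);
disp(sameFiles)

% ... while the number of files per label stays the same.
countsA = countEachLabel(splitA);
countsB = countEachLabel(splitB);
disp(countsA)
disp(isequal(countsA.Count,countsB.Count))
```

If the file lists differ while the per-label counts match, the randomization is affecting which files are picked for each label, not the order in which the labels appear.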

More Answers (1)

cui,xingxing on 1 Mar 2018
Thanks a lot! I found the solution to the problem: if you want to shuffle the order of the labels, you can use the shuffle function. Example:
imds_new = shuffle(imds)
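Applied to the datastore from the question, this could look like the sketch below. It assumes the same DigitDataset path as above; trainShuffled is a name chosen here for illustration.

```matlab
% Sketch: shuffle the training datastore after splitting it.
digitDatasetPath = fullfile(matlabroot,'toolbox','nnet','nndemos', ...
    'nndatasets','DigitDataset');
digitData = imageDatastore(digitDatasetPath, ...
    'IncludeSubfolders',true,'LabelSource','foldernames');
[trainDigitData,testDigitData] = splitEachLabel(digitData,750,'randomized');

% shuffle returns a new datastore with the files (and their labels)
% in random order; the original datastore is left unchanged.
trainShuffled = shuffle(trainDigitData);

% Compare the first few labels before and after shuffling.
disp(trainDigitData.Labels(1:5)')
disp(trainShuffled.Labels(1:5)')
```

The shuffled datastore contains the same files as before, only in a different order.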
2 Comments
debojit sharma on 8 Jul 2023
It may be risky to do a standard random train/test split when there is strong class imbalance. With a very small number of positive cases, we might end up with training and test sets that have very different class distributions, or even with close to zero positive cases in the test set. So, is there any function in MATLAB that does stratified sampling during the train/test split, so that the class balance is preserved in both sets? @cui @Wentao Du. Something like the following Python code:
from sklearn.model_selection import train_test_split
train, test = train_test_split(data, test_size = 0.3, stratify=data.buy)
cui,xingxing on 24 Oct 2023
@debojit sharma, you can design your own sampling distribution with sample weighting in MATLAB, for example with the built-in function randsample. The following example may give you some ideas.
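For image datastores, splitEachLabel already splits each label separately, so a proportion-based split keeps the class proportions. For generic labeled data, a different route from randsample is cvpartition, which stratifies by the grouping variable for a holdout split. The sketch below assumes the Statistics and Machine Learning Toolbox is installed and uses made-up labels (90 negative, 10 positive) purely for illustration.

```matlab
% Sketch: stratified 70/30 holdout split with cvpartition.
rng(1) % For reproducibility

% Hypothetical imbalanced labels: 90 negatives, 10 positives.
labels = categorical([repmat({'neg'},90,1); repmat({'pos'},10,1)]);

% Passing the labels as the grouping variable makes the holdout
% partition stratified by class.
c = cvpartition(labels,'HoldOut',0.3);
trainIdx = training(c); % logical index of training observations
testIdx  = test(c);     % logical index of test observations

% Class counts in each part of the split.
summary(labels(trainIdx))
summary(labels(testIdx))
```

The logical vectors trainIdx and testIdx can then be used to index the rows of a table or feature matrix that is aligned with labels.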
Reference:
