With cvpartition, how to stratify the partitions with respect to more than one variable (with respect to class label and some other label)

I am trying to use the convenient cvpartition object so that fitclinear performs cross-validation internally (more precisely, for hyperparameter optimization). My data are grouped, with an equal number of samples from each of the two classes in every group. I need the K-fold partitioning of these data to be stratified with respect to both the class labels and the group labels, such that in each fold: 1) the class labels are balanced, and 2) no group has samples in both the training and the test subsample.
More visually, below is an example of one possible partition, with samples as rows in cLabel (the class labels), gLabel (the group labels) and kLabel (the index of the fold in which the sample is assigned to the test subsample):
>> cLabel = [1 1 1 1 1 1 2 2 2 2 2 2];
gLabel = [1 1 2 2 3 3 1 1 2 2 3 3 ];
kLabel = [3 3 2 2 1 1 2 2 3 3 1 1];
[cLabel' gLabel' kLabel']
ans =
1 1 3
1 1 3
1 2 2
1 2 2
1 3 1
1 3 1
2 1 2
2 1 2
2 2 3
2 2 3
2 3 1
2 3 1
I would be happy to manually specify values in a cvpartition object and then pass it to fitclinear. I tried a hack suggested in another post (https://www.mathworks.com/matlabcentral/answers/203155-how-to-manually-construct-or-modify-a-cross-validation-object-in-matlab), but I still was not able to change the cvpartition object manually. :-(
Any ideas, please?

Answers (1)

Cris LaPierre on 2 Nov 2021
Edited: Cris LaPierre on 2 Nov 2021
The documentation seems to indicate that stratification can only be done on a single grouping variable. The workaround, then, might be to use findgroups to create a new grouping variable based on the values in several variables.
cLabel = [1 1 1 1 1 1 2 2 2 2 2 2];
gLabel = [1 1 2 2 3 3 1 1 2 2 3 3 ];
% Group by cLabel and gLabel
G = findgroups(cLabel, gLabel)
G = 1×12
1 1 2 2 3 3 4 4 5 5 6 6
% Create partition based on grouping variable G
c = cvpartition(G,'Kfold',2,'stratify',true)
c =
K-fold cross validation partition
   NumObservations: 12
       NumTestSets: 2
         TrainSize: 6  6
          TestSize: 6  6
% Inspect assignment for first fold
training(c,1)
ans = 12×1 logical array
1 0 0 1 1 0 0 1 1 0
test(c,1)
ans = 12×1 logical array
0 1 1 0 0 1 1 0 0 1
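For the second requirement in the question (no group on both sides of a split), a different route is to assign whole groups to folds by hand. The following is only a minimal sketch under stated assumptions: K = 3 folds, every group contains the same number of samples from each class (as described in the question), and the names gIdx, foldOfGroup and groupOrder are hypothetical helpers, not part of any toolbox API. It builds a fold-index vector like kLabel and does not create a cvpartition object.

```matlab
% Sketch: deal whole groups into folds so that no group is split
% between the training and test subsamples.
cLabel = [1 1 1 1 1 1 2 2 2 2 2 2];
gLabel = [1 1 2 2 3 3 1 1 2 2 3 3];
K = 3;                                   % number of folds (assumption)

[gIdx, gNames] = findgroups(gLabel);     % gIdx(i) = index of the group of sample i
numGroups = numel(gNames);

rng(0);                                  % fixed seed, only for reproducibility
groupOrder = randperm(numGroups);        % shuffle the groups
foldOfGroup = zeros(1, numGroups);
foldOfGroup(groupOrder) = mod(0:numGroups-1, K) + 1;  % round-robin: one fold per group

kLabel = foldOfGroup(gIdx);              % test fold of every sample

% Check both requirements fold by fold
for k = 1:K
    isTest = (kLabel == k);
    % Requirement 2: no group appears in both training and test
    assert(isempty(intersect(gLabel(isTest), gLabel(~isTest))));
    % Requirement 1: report the class counts in the test subsample
    nTest = [sum(cLabel(isTest) == 1), sum(cLabel(isTest) == 2)];
    fprintf('fold %d: test groups = %s, class counts = %s\n', ...
        k, mat2str(unique(gLabel(isTest))), mat2str(nTest));
end
```

Because each group is assumed to be class-balanced, keeping groups intact also keeps the classes balanced in every fold. How such a fold-index vector can be handed to fitclinear depends on the release, so the cvpartition documentation is the place to check whether a custom-partition option is available.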
2 Comments
Alessandro La Chioma on 5 Nov 2023
Has anyone found a good solution to the original question?
Thank you!
