Is there a way to create a custom cvpartition?

Christian on 14 Jun 2021
Commented: Drew on 12 Sep 2024
Hi,
I'd like to use cross-validation in a model with a custom set of training/test sample vectors. To use some of the built-in functionality of MATLAB, I'd like to pass the input via the CVPartition name-value input parameter. Is there a way to define a custom set of train/test indices using cvpartition?
Thanks,
Christian

Answers (3)

Shraddha Jain on 25 Jun 2021
Hi Christian,
As of now, the functionality to use custom indices in a cvpartition object is not available in the Statistics and Machine Learning Toolbox. It might be considered in a future release.
3 comments
Giovanni Attolico on 13 Oct 2021
It seems to me that the indices are stored inside the cvpartition variable, but they are read-only via the functions "training" and "test". While waiting for the "future release", would it not be possible to define two functions "settraining" and "settest" that write indices into the variable? That would be really useful in many situations, allowing the resulting variable to be used everywhere the current random partition can be used...
Thanks
Christian on 13 Oct 2021
I had the same thoughts on this. However, I stopped trying to alter the code in cvpartition after a few attempts and ended up writing a completely custom approach for cross-validation procedures. In addition to custom indices, you also gain flexibility in the metrics used, and so on...
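For illustration, a minimal sketch of such a custom cross-validation loop might look like the following (X, y, and the choice of fitctree are placeholders for your own data and classifier):

% Manual K-fold cross-validation with user-defined fold indices.
% Assumes predictors X (n-by-p) and labels y (n-by-1) already exist.
k = 5;
n = numel(y);
foldIdx = mod(randperm(n)', k) + 1;  % custom fold assignment, values in 1..k
foldLoss = zeros(k, 1);
for f = 1:k
    testMask = (foldIdx == f);
    trainMask = ~testMask;
    mdl = fitctree(X(trainMask,:), y(trainMask));  % any classifier works here
    foldLoss(f) = loss(mdl, X(testMask,:), y(testMask));
end
cvLoss = mean(foldLoss)  % average validation loss across the k folds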



Giovanni on 9 Jan 2024
A custom partition option was recently introduced in MATLAB R2023b, but I suspect it does not work as expected: cross-validation results from two neural networks differ even though they are trained and validated with the same folds (a cvpartition created with the custom partition option, using indices generated by crossvalind) and the networks are trained with the same parameters.
clear variables
close all
clc
tab = readtable("three_selected_feature.xlsx"); % features dataset
featuresTab = tab(:,2:end-1);
feature = table2array(tab(:,2:end-1));
y = table2array(tab(:,end));
for i = 1:6 % repeat for different training/test splits
    cTest = cvpartition(y,'HoldOut',0.2,'Stratify',true);
    train = training(cTest);
    y_train = y(train);
    testing = test(cTest);
    cvIndices = crossvalind('Kfold',size(y_train,1),5);
    cv = cvpartition("CustomPartition",cvIndices);
    % cv = cvpartition(y(train),"Resubstitution");
    features_scaled = normalize(feature,'zscore');
    modelNET1 = fitcnet(features_scaled(train,:),y_train,'PredictorNames',featuresTab.Properties.VariableNames,'CVPartition',cv);
    modelNET2 = fitcnet(features_scaled(train,:),y_train,'PredictorNames',featuresTab.Properties.VariableNames,'CVPartition',cv);
    % Models trained with the same parameters and features, cross-validated
    % using custom partitions. cv is based on cvIndices generated by the
    % crossvalind function. For the same "i" (for loop), the results of the
    % two models should be the same, but they differ.
    % cvmodelLDA1 = crossval(modelLDA1,'CVPartition',cv);
    lossNET1(i) = kfoldLoss(modelNET1);
    accNET1(i) = 1 - lossNET1(i);
    % cvmodelLDA2 = crossval(modelLDA2,'CVPartition',cv);
    lossNET2(i) = kfoldLoss(modelNET2);
    accNET2(i) = 1 - lossNET2(i);
end
1 comment
Drew on 12 Sep 2024
The differences observed above have nothing to do with using a custom partition, but are rather due to differences in initialization of layer weights. When building and validating models using any cvpartition (custom or not), remember that some model training algorithms have randomness in the model training process. For example, fitcnet includes random initialization of weights, as seen in the parameter LayerWeightsInitializer. Therefore, to get identical validation accuracy, it is necessary to set both the cvpartition and the random seed before training. For an example of getting matching predictions (and loss), see the answer that I posted to this question.



Drew on 12 Sep 2024
Custom partitions were introduced in R2023b. See https://www.mathworks.com/help/stats/cvpartition.html, and in particular the new cvpartition creation syntax c = cvpartition("CustomPartition",testSets). With this functionality, users can create cvpartition objects that meet their partition requirements, such as keeping certain observations grouped together because they come from the same underlying physical sample.
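As an illustration of that grouping use case, here is a sketch of one way to build such a partition: assign every observation in a group to the same test fold. The group labels g below are hypothetical, and findgroups is used to convert them to consecutive integer IDs.

% Keep observations from the same physical sample in the same fold.
% g holds a hypothetical sample ID for each observation.
g = [1 1 2 2 3 3 4 4 5 5]';
k = 5;
gid = findgroups(g);                         % group IDs 1..numGroups
groupFold = mod(randperm(max(gid))', k) + 1; % assign one fold per group
testSets = groupFold(gid);                   % fold index for each observation
c = cvpartition("CustomPartition", testSets) % requires R2023b or later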
When building and validating models using any cvpartition (custom or not), remember that some model training algorithms have randomness in the model training process. For example, fitcnet includes random initialization of weights, as controlled by the parameter LayerWeightsInitializer. Therefore, to get identical validation accuracy, it is necessary to set both the cvpartition and the random seed before training. This is illustrated in the code below: when the random seed is set before training, the predictions are identical.
If this answer helps you, please remember to accept the answer.
% Load the data
t = readtable("fisheriris.csv");
% Two trials:
% In the first, do not reset the random seed.
% In the second, reset the random seed before each model training.
for j = 1:2
    % Create a 5-fold cross-validation partition
    cv1 = cvpartition(t.Species, 'KFold', 5);
    % Take the test sets from the above and use them to make a custom partition
    testsets = test(cv1, 'all');
    cv = cvpartition('CustomPartition', testsets);
    % Set the random number generator seed on trial 2 before model training
    if j == 2, rng(0); end
    % Train the first model with cross-validation. This builds a set of k
    % cross-validation models.
    net1 = fitcnet(t, 'Species', 'CVPartition', cv);
    % Calculate the validation error rate for the first model
    errorRate1 = kfoldLoss(net1);
    % Get predictions for the first model
    predictions1 = kfoldPredict(net1);
    % Set the random number generator seed on trial 2 before model training
    if j == 2, rng(0); end
    % Train the second model with cross-validation. This builds a set of k
    % cross-validation models.
    net2 = fitcnet(t, 'Species', 'CVPartition', cv);
    % Calculate the validation error rate for the second model
    errorRate2 = kfoldLoss(net2);
    % Get predictions for the second model
    predictions2 = kfoldPredict(net2);
    % Compare predictions
    samePredictions = strcmp(predictions1, predictions2);
    differentIdx = find(~samePredictions);
    % Report results
    if j == 1
        fprintf('\nWithout resetting random seed: trial %d \n', j);
    else
        fprintf('\nWith resetting random seed: trial %d\n', j);
    end
    fprintf('Validation error rate for Model 1: %.2f%%\n', errorRate1 * 100);
    fprintf('Validation error rate for Model 2: %.2f%%\n', errorRate2 * 100);
    fprintf('Same predictions: %d\n', sum(samePredictions));
    fprintf('Different predictions: %d\n', length(differentIdx));
    if ~isempty(differentIdx)
        fprintf('Differences at indices:\n');
        for i = 1:length(differentIdx)
            fprintf(' Index %d: Model1 = %s, Model2 = %s\n', ...
                differentIdx(i), predictions1{differentIdx(i)}, predictions2{differentIdx(i)});
        end
    end
end
Without resetting random seed: trial 1
Validation error rate for Model 1: 4.67%
Validation error rate for Model 2: 4.00%
Same predictions: 145
Different predictions: 5
Differences at indices:
Index 73: Model1 = versicolor, Model2 = virginica
Index 78: Model1 = virginica, Model2 = versicolor
Index 107: Model1 = versicolor, Model2 = virginica
Index 127: Model1 = virginica, Model2 = versicolor
Index 135: Model1 = versicolor, Model2 = virginica
With resetting random seed: trial 2
Validation error rate for Model 1: 16.00%
Validation error rate for Model 2: 16.00%
Same predictions: 150
Different predictions: 0
