
How can I use repeated, k-fold cross-validation results with rocmetrics?

Thomas Kirsh on 6 Oct 2023
Commented: the cyclist on 11 Oct 2023
I have scores and labels from 10-repeat, 5-fold cross-validation of a model, and I'm trying to plot ROC curves for them efficiently using rocmetrics. When I run the line
robj = rocmetrics(target, prediction, 1);
I get the error
Error using rocmetrics>validateScoresLabelsAndWeights
The cell array of cross-validated scores must be a vector.
Each cell in target and prediction is a double array of size 54x1 or 55x1, and the sizes match cell for cell between the two. I'm confused by this error because the contents of my score (prediction) cells are clearly vectors. I think the issue is the repeated cross-validation. How can I format target and prediction so that I can use rocmetrics with my results?

Answers (1)

the cyclist on 6 Oct 2023
Note the following line from the rocmetrics documentation:
"For cross-validated data, you must specify Labels, Scores, and Weights as cell arrays with the same number of elements. rocmetrics treats an element in the cell arrays as data from one cross-validation fold and computes pointwise confidence intervals for the performance metrics. The length of Labels{i} and the number of rows in Scores{i} must be equal."
You need to supply the fold weights.
Alternatively, you could loop over the folds to see ROC metrics on each fold, or decide prior to calling rocmetrics how you want to combine the folds into a single prediction.
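The error message says the cell array itself must be a vector, so one possible cause is that target and prediction are stored as 10-by-5 cell arrays (repeats by folds). That layout is an assumption, not something stated in the question, and the data below is synthetic placeholder data. Under that assumption, a minimal sketch of flattening the arrays so that every fold of every repeat becomes one element:

```matlab
% Assumption: results are stored as 10-by-5 cell arrays (repeats x folds).
% The labels and scores below are random placeholders, not real results.
rng(0);
nRepeats = 10;
nFolds = 5;
target = cell(nRepeats, nFolds);
prediction = cell(nRepeats, nFolds);
for r = 1:nRepeats
    for f = 1:nFolds
        n = 54 + (f == 1);                      % folds of 54 or 55 rows
        target{r,f} = double(rand(n,1) > 0.5);  % placeholder 0/1 labels
        prediction{r,f} = rand(n,1);            % placeholder scores
    end
end

% Flatten the 10x5 cell arrays into 50x1 cell vectors. Linear indexing
% keeps target and prediction aligned cell for cell.
targetVec = target(:);
predictionVec = prediction(:);

% Each of the 50 cells is now treated as one cross-validation fold.
robj = rocmetrics(targetVec, predictionVec, 1);
plot(robj)
```

Because both arrays are flattened the same way, `targetVec{i}` and `predictionVec{i}` still refer to the same fold of the same repeat.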
  7 Comments
Thomas Kirsh on 11 Oct 2023
Thank you, that's a good workaround! My only concern is whether this gives an accurate mean ROC for my experiment. Wouldn't it make more sense to concatenate the folds first and then reshape?
the cyclist on 11 Oct 2023
I have to admit that I don't really have experience specifically with repeated k-fold cross-validation, so I don't know what is conventional in terms of combining information from repeats and folds. My impression is that one treats it as M*k results, which is what my code is doing. I don't think concatenating folds is typically done, because that would look like you had a dataset that was k times larger.
Also ... I hope the data you posted isn't your real data. The model performance is no better than random.
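The two groupings discussed in these comments can be sketched side by side. This is only an illustration under assumptions: the 10-by-5 cell layout (repeats by folds) is a guess, the data is random placeholder data, and the names `targetA`, `targetB`, and so on are hypothetical. It does not settle which grouping is the conventional one.

```matlab
% Assumption: 10-by-5 cell arrays (repeats x folds) of placeholder data.
rng(0);
nRepeats = 10;
nFolds = 5;
target = cell(nRepeats, nFolds);
prediction = cell(nRepeats, nFolds);
for r = 1:nRepeats
    for f = 1:nFolds
        n = 54 + (f == 1);                      % folds of 54 or 55 rows
        target{r,f} = double(rand(n,1) > 0.5);  % placeholder 0/1 labels
        prediction{r,f} = rand(n,1);            % placeholder scores
    end
end

% Option A: treat every fold of every repeat as its own element,
% giving M*k = 50 cells.
targetA = target(:);
predictionA = prediction(:);

% Option B: concatenate the k folds within each repeat, giving M = 10
% cells, each holding all out-of-fold predictions for that repeat.
targetB = cell(nRepeats, 1);
predictionB = cell(nRepeats, 1);
for r = 1:nRepeats
    targetB{r} = vertcat(target{r,:});
    predictionB{r} = vertcat(prediction{r,:});
end

robjA = rocmetrics(targetA, predictionA, 1);
robjB = rocmetrics(targetB, predictionB, 1);
```

In option A the confidence intervals are computed across 50 small folds, while in option B they are computed across 10 repeats of the full set of out-of-fold predictions.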
