What type of cross validation to use if my data has 5 scans per sample to avoid having same sample in train and test set

Question

NCA, 1 April 2022


Answered: Drew, 3 January 2023

My data (150 samples) has 5 NIR scans per sample. I cannot simply average the 5 scans because some of them were removed as invalid. I am using Support Vector Machines from the Classification Learner app (Machine Learning and Deep Learning) in MATLAB R2020b.

I used tgspcread to read my NIR files into MATLAB, normalised by standard deviation, and used only the valid scans for my classification. Am I right to say that the scans are independent of each other, or will the remaining scans of a sample (out of the 5) still count as the same sample even after normalisation?

My second question is: what type of cross-validation would be ideal to avoid having the same sample in both the training set and the test set, given that the Classification Learner app offers only k-fold and holdout validation?

Thanks


Answer 1

Drew, 3 January 2023


In near-infrared spectroscopy, it is common to average the spectra obtained from repeated scans of the same sample. Even if some scans were "taken out as they were not valid", the remaining scans can still be averaged. If the invalid scans are stored as NaN, pass the "omitnan" flag to the mean function so that they are ignored: https://www.mathworks.com/help/matlab/ref/mean.html#bt5b82t-1-nanflag
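As a rough illustration of the averaging idea, here is a minimal sketch. The data are synthetic, and the variable names (sampleID, spectra, meanSpectra) are made up for this example rather than taken from the question. It assumes each row is one scan and that rejected scans have been replaced by rows of NaN.

```matlab
% Hypothetical data: 3 samples, 5 scans each, 4 wavelengths.
% Each row is one scan; rejected scans are marked as rows of NaN.
rng(0);                               % reproducible synthetic spectra
sampleID = repelem((1:3)', 5);        % sample label for each scan (15x1)
spectra  = rand(15, 4);               % placeholder absorbance values
spectra([2 8 9], :) = NaN;            % pretend these scans were rejected

% Average the scans of each sample, ignoring NaN entries.
ids = unique(sampleID);
meanSpectra = zeros(numel(ids), size(spectra, 2));
for k = 1:numel(ids)
    rows = (sampleID == ids(k));      % all scans belonging to sample k
    meanSpectra(k, :) = mean(spectra(rows, :), 1, 'omitnan');
end

% meanSpectra now has one row per sample and one column per wavelength.
disp(size(meanSpectra))
```

With one averaged spectrum per sample, each row is a distinct sample, so ordinary k-fold or holdout validation no longer risks placing scans of the same sample on both sides of a split.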

If you decide to keep multiple scans per sample and want all scans from a given sample to land together in train, validation, or test when using Classification Learner, one strategy is to use a more recent release. Starting in R2021a, Classification Learner can load a separate test set, which makes it possible to do part of the data partitioning outside the app. For example, you could create separate training and test sets outside the app, splitting by sample rather than by scan, and then import both into the Classification Learner app.
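One possible way to build such a split outside the app is to partition the unique sample IDs instead of the individual scans. The sketch below uses synthetic data, and every variable name (sampleID, spectra, labels, trainTbl, testTbl) as well as the 20% holdout fraction is an assumption for illustration only. It relies on cvpartition from Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical data: 150 samples, 5 scans each, 10 wavelengths, 2 classes.
rng(1);                                          % reproducible example
numSamples = 150;
sampleID = repelem((1:numSamples)', 5);          % sample label per scan
spectra  = rand(numel(sampleID), 10);            % placeholder spectra
labels   = repelem(randi(2, numSamples, 1), 5);  % one class per sample

% Drop a few scans to mimic invalid measurements (indices are arbitrary).
valid = true(numel(sampleID), 1);
valid([3 17 200]) = false;
sampleID = sampleID(valid);
spectra  = spectra(valid, :);
labels   = labels(valid);

% Partition the SAMPLES (not the scans) into train and test.
ids = unique(sampleID);
c = cvpartition(numel(ids), 'HoldOut', 0.2);     % 20% of samples held out
testIDs = ids(test(c));

% Every scan follows its sample into the same partition.
isTest = ismember(sampleID, testIDs);

trainTbl = array2table(spectra(~isTest, :));
trainTbl.Class = labels(~isTest);
testTbl = array2table(spectra(isTest, :));
testTbl.Class = labels(isTest);

% Sanity check: no sample contributes scans to both sets.
assert(isempty(intersect(sampleID(isTest), sampleID(~isTest))));
```

The two tables could then be imported as the training data and the separate test set. This protects only the test set: validation done inside the app on trainTbl would, as far as I know, still partition by row and so may place scans of one sample in different folds.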
