How to implement kNN imputation on test set without data leakage?
I am using knnimpute to handle missing data for machine learning. My data is split into a training set and a test set (mTrain and mTest). Applying knnimpute to the training set is straightforward. For the test set, however, I need missing values to be imputed from the nearest neighbors in the training set, to prevent data leakage. How can I apply knnimpute to the test set in this way? Does anybody have an idea how to code that?
1 Comment
Zexi Yang
17 Aug 2022
Why do you have to impute the test set using nearest neighbors from the training set? You can just use nearest neighbors from the test set without any data leakage. Data leakage is when you impute the training set using data from the test set.
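For the approach described in the question, where each test row borrows values only from training rows, a minimal sketch might look like the following. It is a hand-written loop in base MATLAB rather than a call to knnimpute, and it rests on several assumptions that are not stated in the thread: rows are observations and columns are features, mTrain has already been imputed (no NaNs left), distances are Euclidean over the features observed in the test row, and gaps are filled with the mean of the k nearest training rows. The toy data and the variable names k and mTestImputed are hypothetical. The subtraction uses implicit expansion, so it needs R2016b or later.

```matlab
% Hypothetical toy data: rows are observations, columns are features.
% Assumption: mTrain has already been imputed and contains no NaNs.
mTrain = [ 1  2  3;
           2  3  4;
          10 11 12;
          11 12 13];
mTest  = [1.5  NaN  3.5;
          NaN 11.5 12.5];

k = 2;                      % number of training neighbors to average (assumed)
mTestImputed = mTest;

for i = 1:size(mTest, 1)
    missing = isnan(mTest(i, :));
    if ~any(missing)
        continue            % nothing to impute in this row
    end
    % Euclidean distance to every training row, using only the
    % features that are observed in this test row.
    diffs = mTrain(:, ~missing) - mTest(i, ~missing);
    d = sqrt(sum(diffs.^2, 2));
    % Indices of the k closest training rows.
    [~, order] = sort(d);
    nearest = order(1:min(k, numel(order)));
    % Fill each gap with the mean of the neighbors' values.
    mTestImputed(i, missing) = mean(mTrain(nearest, missing), 1);
end

disp(mTestImputed)
```

With this toy data, the first test row is closest to the first two training rows and the second test row to the last two, so the gaps are filled with 2.5 and 10.5 respectively. No test row is ever used as a neighbor, which is the property the question asks for. If the features are on different scales, standardizing both sets with the training set's mean and standard deviation before computing distances keeps the procedure free of test-set information.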
Answers (0)