How to implement kNN imputation on test set without data leakage?

Sebastian Weber on 23 Jul 2022
Commented: Zexi Yang on 17 Aug 2022
I am using knnimpute to handle missing data for machine learning. My data is split into a training set and a test set (mTrain and mTest). Applying knnimpute to the training set is straightforward. For the test set, however, I need missing values to be imputed from the nearest neighbors in the training set, to prevent data leakage. How can I apply knnimpute to the test set in this way? Does anybody have an idea how to code that?
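A minimal sketch of the idea described above, not a definitive implementation: each test row with missing values is compared only against rows of mTrain, and each missing entry is filled with the mean of that feature over the k closest training rows. The function name imputeTestFromTrain and the parameter k are hypothetical and introduced only for illustration. The sketch assumes rows are observations and columns are features, and it uses base MATLAB only (R2016b or later, for implicit expansion and local functions in scripts) rather than calling knnimpute itself.

```matlab
% Usage example: one test row with a missing second feature.
mTrain = [1 2; 2 4; 10 20];
mTest  = [1.1 NaN];
mTestImp = imputeTestFromTrain(mTrain, mTest, 2);
% The two nearest training rows are rows 1 and 2, so the missing
% entry becomes mean([2 4]) = 3.

function mTestImp = imputeTestFromTrain(mTrain, mTest, k)
% Hypothetical helper: impute NaNs in mTest using only rows of mTrain
% as neighbor candidates. Assumes rows = observations, columns = features.
mTestImp = mTest;
for i = 1:size(mTest, 1)
    missing = isnan(mTest(i, :));
    if ~any(missing)
        continue                      % nothing to impute in this row
    end
    obs = ~missing;
    % Squared differences on the features observed in this test row.
    % NaNs in the training rows are skipped when computing the distance.
    diffSq  = (mTrain(:, obs) - mTest(i, obs)).^2;
    nShared = sum(~isnan(diffSq), 2);
    dist    = sqrt(sum(diffSq, 2, 'omitnan') ./ nShared);
    dist(nShared == 0) = Inf;         % no shared features: not comparable
    [distSorted, order] = sort(dist);
    order = order(isfinite(distSorted));
    for j = find(missing)
        % Keep only training rows that actually observe feature j.
        candidates = order(~isnan(mTrain(order, j)));
        if isempty(candidates)
            continue                  % no usable neighbor: leave NaN
        end
        nUse = min(k, numel(candidates));
        mTestImp(i, j) = mean(mTrain(candidates(1:nUse), j));
    end
end
end
```

Because the distance here is an unscaled Euclidean distance, features on large scales dominate the neighbor search. If the features are standardized first, the scaling parameters would also need to be estimated from mTrain only, for the same leakage reason.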
1 Comment
Zexi Yang on 17 Aug 2022
Why do you have to impute the test set using nearest neighbors from the training set? You can just use nearest neighbors from the test set without any data leakage. Data leakage is when you impute the training set using data from the test set.


Answers (0)
