knnimpute in training/testing sets

Salim Al-Wasity on 16 Dec 2020
Answered: Aditya Patil on 24 Dec 2020
Dear support,
I am planning to convert my machine learning code from R to MATLAB. The code imputes missing values using KNN. In the R code, I impute the missing data after splitting it into training and testing sets, to prevent double dipping. The R code is simply as follows:
  • Impute missing values in the training dataset (mltrain) only:
  • mltrain2 <- DMwR::knnImputation(mltrain)
  • Impute missing values in the testing dataset (mltest), passing the training dataset as the data frame in which to search for neighbours:
  • mltest <- DMwR::knnImputation(mltest,distData = mltrain)
In MATLAB, I tried to use knnimpute on the training and testing datasets separately, in the same way as the R code above. However, there is no option to pass the training data when imputing the missing values of the testing dataset.
Any suggestions on how to solve this issue?
Sincerely
Salim AL-Wasity

Answers (1)

Aditya Patil on 24 Dec 2020
This functionality is currently not available in knnimpute. I have forwarded this request to the relevant developers, and it might be considered for a future release.
As a workaround, you can train regression models on the training data and use them to predict the missing values in the test dataset. Multiple models might be required if data is missing in multiple columns.
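A minimal sketch of this workaround follows, assuming numeric data and the Statistics and Machine Learning Toolbox (for fitlm and predict). The toy matrices Xtrain and Xtest, the result XtestImputed, and the choice of a plain linear model are illustrative assumptions, not part of the original question. One model is fitted per test column that contains missing values, using only the training data, so no information from the test set leaks into the fit.

```matlab
% Hypothetical toy data: rows are observations, columns are variables.
rng(0);                                   % reproducible example
Xtrain = randn(100, 3);
Xtrain(:, 3) = 2*Xtrain(:, 1) - Xtrain(:, 2) + 0.1*randn(100, 1);
Xtest = randn(20, 3);
Xtest(:, 3) = 2*Xtest(:, 1) - Xtest(:, 2) + 0.1*randn(20, 1);
Xtest([2 7], 3) = NaN;                    % missing values in the test set
Xtest(5, 1) = NaN;

XtestImputed = Xtest;
for j = 1:size(Xtest, 2)
    missingRows = isnan(Xtest(:, j));
    if ~any(missingRows)
        continue                          % nothing to impute in this column
    end
    others = setdiff(1:size(Xtest, 2), j);
    % Fit on the training data only; fitlm ignores rows that contain NaN.
    mdl = fitlm(Xtrain(:, others), Xtrain(:, j));
    % Predict the missing test entries from the other test columns.
    % Rows whose predictors are also missing stay NaN.
    XtestImputed(missingRows, j) = predict(mdl, Xtest(missingRows, others));
end
```

Because the predictors are always taken from the original Xtest, a test row that is missing several columns at once would remain NaN in this sketch and would need separate handling, for example a model fitted on a reduced set of predictors.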

Release

R2019a
