I am not sure how to implement the following requirement. When I use undersampling for my supervised machine learning model, how can I ensure that the k-fold cross-validation still reflects the class distribution of the original dataset? The performance metric (e.g. PR AUC) should refer to the original distribution, not to the distribution of the undersampled set.
It does not seem right to simply perform k-fold cross-validation on the entire undersampled dataset, since every fold would then share the artificial class balance.
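To make my intent concrete, here is a rough sketch of what I have in mind (assuming scikit-learn; the manual undersampling step and variable names are just for illustration): split first with `StratifiedKFold`, undersample the majority class only within each training fold, and compute PR AUC (average precision) on the untouched test fold, which keeps the original distribution.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import StratifiedKFold

# toy imbalanced dataset (~5% positives) standing in for my real data
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

rng = np.random.RandomState(0)
scores = []
for train_idx, test_idx in StratifiedKFold(
        n_splits=5, shuffle=True, random_state=0).split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]

    # undersample the majority class in the TRAINING fold only
    pos = np.where(y_tr == 1)[0]
    neg = np.where(y_tr == 0)[0]
    keep = np.concatenate([pos, rng.choice(neg, size=len(pos), replace=False)])

    clf = LogisticRegression(max_iter=1000).fit(X_tr[keep], y_tr[keep])

    # evaluate on the untouched test fold -> original class distribution
    proba = clf.predict_proba(X[test_idx])[:, 1]
    scores.append(average_precision_score(y[test_idx], proba))

print(np.mean(scores))
```

Is this per-fold approach the correct way to do it, or is there a standard utility for this?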
Your help is highly appreciated!