Selection of Neural Network Training Data

Question

Kamuran Turksoy, May 4, 2017

Direct link to this question:

https://jp.mathworks.com/matlabcentral/answers/338795-selection-of-neural-network-training-data

Answered: Greg Heath, May 5, 2017

Data can be divided into training, validation, and test sets and then used to train a neural network model (regression, in my case). My question is: what if some data points in the training set impair the model's performance? Are there good ways to find such points and remove them from the training set?

I was thinking of using something similar to leave-one-out cross-validation:

1. Leave one data point out of the training set.

2. Train the model on the rest of the training set.

3. If the error on the validation (or test) set improves, discard the point.

4. Repeat this for all data points until no further improvement is observed.

There are two problems with this method:

1. It will take a long time for large data sets.

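2. Random initial weights complicate the decision to discard a point, because a change in error may come from the initialization rather than from the removed point. Fixing the initial weights with a seed avoids this, but that fixed set of weights may not be an optimal starting point.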
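
The procedure and its two problems can be sketched in a few lines. This is a minimal Python sketch, not MATLAB and not a neural network. It assumes a deterministic least-squares line fit as a stand-in for the model, which sidesteps the random-initialization issue. It also uses a greedy variant of the steps above: on each pass it drops only the single point whose removal helps most, rather than every point that helps. The names fit_line, mse, prune_training_set and the min_gain threshold are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit y = a*x + b (deterministic stand-in for a network)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def mse(model, xs, ys):
    """Mean squared error of the fitted line on the given points."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def prune_training_set(x_tr, y_tr, x_val, y_val, min_gain=1e-3):
    """Greedy leave-one-out pruning against a validation set.

    On each pass, refit once per remaining training point with that point
    left out, and drop the point whose removal lowers validation MSE the
    most. Stop when the best gain falls below min_gain (assumed threshold).
    """
    x_tr, y_tr = list(x_tr), list(y_tr)
    removed = []
    best = mse(fit_line(x_tr, y_tr), x_val, y_val)
    while len(x_tr) > 2:
        candidates = []
        for i in range(len(x_tr)):
            xs = x_tr[:i] + x_tr[i + 1:]
            ys = y_tr[:i] + y_tr[i + 1:]
            candidates.append((mse(fit_line(xs, ys), x_val, y_val), i))
        err, i = min(candidates)
        if best - err < min_gain:
            break  # no removal helps enough; stop pruning
        removed.append((x_tr.pop(i), y_tr.pop(i)))
        best = err
    return x_tr, y_tr, removed
```

Each pass refits the model once per remaining training point, so the number of fits grows roughly quadratically with the size of the training set, which is the first problem noted above. Because points are pruned against the validation set, the validation error afterwards is optimistically biased, so a separate test set is still needed for an honest estimate.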

Answer 1

Greg Heath, May 5, 2017

Direct link to this answer:

https://jp.mathworks.com/matlabcentral/answers/338795-selection-of-neural-network-training-data#answer_265868

Before training, compute the mean and standard deviation of each input and target variable. Plot each variable and overlay lines at mean +/- m*std for m = 1:4.

Remove or modify outliers.
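
As a rough illustration of this screening step, here is a minimal Python sketch (standard library only, no plotting) that flags the points falling outside mean +/- m*std for one variable. The function names flag_outliers and band_report are hypothetical. The sample standard deviation (n - 1 denominator) is assumed, which is also what MATLAB's std computes by default.

```python
from statistics import mean, stdev


def flag_outliers(values, m=3.0):
    """Return indices of values lying outside mean +/- m * std."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [i for i, v in enumerate(values) if abs(v - mu) > m * sigma]


def band_report(values, max_m=4):
    """Count the points outside each band mean +/- m*std for m = 1..max_m."""
    return {m: len(flag_outliers(values, m)) for m in range(1, max_m + 1)}
```

Checking several values of m is useful because an extreme point inflates the standard deviation itself, so it can sit inside a wide band while standing out clearly in a narrower one. Whether a flagged point is then removed or modified remains a judgment call for each variable.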

Hope this helps

Thank you for formally accepting my answer

Greg.