How do I resolve widely different validation and test accuracy?

Sahil Bajaj, 4 July 2021
Edited: Prince Kumar, 19 November 2021
Dear experts,
I wrote a script in MATLAB to run my machine learning analysis (a classification problem). I see a consistent but puzzling issue in my results: I always get high, reproducible training/validation accuracy, but my test accuracy is always much lower. I checked all five tips mentioned here: https://stackoverflow.com/questions/48718663/validation-and-testing-accuracy-widely-different, but I am still unable to resolve the problem.
I would really appreciate it if someone could help me figure out the solution.
Thanks,
Sahil

Answers (1)

Prince Kumar, 19 November 2021
Edited: Prince Kumar, 19 November 2021
Hi Sahil Bajaj,
This generally happens when your model memorizes the training data instead of learning the underlying pattern. This is called overfitting.
You can try the following:
  • Use a regularization technique.
  • Make sure each set (training, validation and test) has enough samples, for example a 60%/20%/20% or 70%/15%/15% split for the training, validation and test sets respectively.
  • Perform k-fold cross-validation.
  • Randomly shuffle the data before splitting it, so that the data distribution is nearly the same in every set. If your data is in a datastore, you can use the 'shuffle' function; otherwise, you can use the 'randperm' function.
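A minimal sketch of the shuffle-then-split advice, written in Python only so that it is self-contained; in MATLAB the same idea is to index your data with randperm(n) (or call shuffle on a datastore) before cutting it into sets. The function name shuffle_split, the 70/15/15 default and the fixed seed are illustrative assumptions, not part of the original answer. The toy labels are deliberately sorted by class, which is exactly the case where splitting without shuffling would put only one class into the test set.

```python
import random


def shuffle_split(samples, labels, fractions=(0.70, 0.15, 0.15), seed=0):
    """Shuffle paired samples/labels once, then cut them into train/val/test.

    Plays the role of indexing with randperm(n) in MATLAB. The name, the
    default 70/15/15 fractions and the fixed seed are illustrative choices.
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    n = len(samples)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # random permutation of 0..n-1
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    index_sets = {
        "train": order[:n_train],
        "val": order[n_train:n_train + n_val],
        "test": order[n_train + n_val:],  # whatever remains goes to test
    }
    # Apply the same permutation to samples and labels so pairs stay aligned.
    return {name: ([samples[i] for i in idx], [labels[i] for i in idx])
            for name, idx in index_sets.items()}


if __name__ == "__main__":
    # Hypothetical toy data sorted by class: the first 50 labels are 0, the rest 1.
    # Splitting this without shuffling would leave only class 1 in the test set.
    X = list(range(100))
    y = [0] * 50 + [1] * 50
    for name, (xs, ys) in shuffle_split(X, y).items():
        print(f"{name}: {len(xs)} samples, class-1 fraction {sum(ys) / len(ys):.2f}")
```

Shuffling a list of indices, rather than the data itself, keeps each sample paired with its label, which is the same reason randperm is typically used to index both the predictor matrix and the response vector in MATLAB.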
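A hedged sketch of k-fold cross-validation, again in self-contained Python: the indices are shuffled once, cut into k nearly equal folds, and each fold is held out exactly once while the model is trained on the remaining folds. kfold_indices, cross_validate and the train_and_score callback are hypothetical names for illustration; in MATLAB, cvpartition(n, 'KFold', k) from the Statistics and Machine Learning Toolbox produces such partitions for you.

```python
import random


def kfold_indices(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    The indices 0..n-1 are shuffled once and cut into k nearly equal folds;
    each fold is held out exactly once while the other k-1 folds train.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    order = list(range(n))
    random.Random(seed).shuffle(order)
    # The first n % k folds get one extra sample so every index is used.
    fold_sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = order[start:start + size]
        train_idx = order[:start] + order[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(train_and_score, n, k=5, seed=0):
    """Return (mean score, per-fold scores).

    train_and_score(train_idx, test_idx) is a hypothetical callback: fit your
    model on train_idx only and return its accuracy on test_idx.
    """
    scores = [train_and_score(tr, te) for tr, te in kfold_indices(n, k, seed)]
    return sum(scores) / len(scores), scores


if __name__ == "__main__":
    # Hypothetical labels and a trivial "model" that predicts the majority
    # class of its training fold; substitute your own classifier here.
    y = [0] * 70 + [1] * 30

    def majority_baseline(train_idx, test_idx):
        train_labels = [y[i] for i in train_idx]
        majority = max(set(train_labels), key=train_labels.count)
        return sum(y[i] == majority for i in test_idx) / len(test_idx)

    mean_acc, per_fold = cross_validate(majority_baseline, len(y), k=5)
    print("per-fold accuracy:", [round(s, 2) for s in per_fold])
    print("mean accuracy:", round(mean_acc, 2))
```

Because every sample is held out exactly once, the spread of the per-fold scores gives a rough idea of how much a single test accuracy can vary, which helps distinguish a model that is genuinely overfitting from one that was evaluated on an unlucky split.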

Category: Statistics and Machine Learning Toolbox
Release: R2021a
