
How to use testing data to validate kmeans?

Mnr on 22 Mar 2014
Commented: Mnr on 23 Mar 2014
Hello there,
I have some data in 8 text files. I would like to group the similar ones into the same classes, and I am using k-means for now. I would like to use 5 of the files for training and 3 of them for testing. I have used the kmeans command to get k classes; however, I do not know how to validate my results. In other words, I do not know how to use my testing data to calculate the error. I would appreciate it if somebody could help me. Thanks in advance.

Accepted Answer

Image Analyst on 23 Mar 2014
If you do not know the "ground truth" for your data, then there's no way to tell whether a classification is "wrong". The only thing you can do (I think) is classify your "unknown" data and measure how far each point is from the means of the classes. For example, let's say you had one cluster of data, "class#1", around 30 +/- 5, and a second cluster, "class#2", at 100 +/- 20. You run kmeans with 2 classes and it finds those two classes, with means at 30 and 100. Now you take a data point from the "non-training" set, and it has a value of 70. You can say that the 70 belongs to class#2, since it's 40 from class#1 and 30 from class#2. You can do the same for all the other data in your test sets.
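The arithmetic in that example can be written out as a small sketch. It is in plain Python rather than MATLAB, and it is only an illustration under stated assumptions: the centroids 30 and 100 and the test value 70 come from the example above, the other two test values are made up, and `nearest_centroid` is a hypothetical helper name, not a function from any toolbox.

```python
import math

def nearest_centroid(point, centroids):
    """Return (index of the closest centroid, distances to every centroid)."""
    distances = [math.dist(point, c) for c in centroids]
    index = min(range(len(centroids)), key=lambda i: distances[i])
    return index, distances

# Assumed centroids: the class means from the example (as if k-means on the
# training data had returned them).
centroids = [(30.0,), (100.0,)]

# 70 is the held-out value from the example; 28 and 110 are hypothetical.
test_points = [(70.0,), (28.0,), (110.0,)]

results = [nearest_centroid(p, centroids) for p in test_points]
for p, (idx, dists) in zip(test_points, results):
    print(f"value {p[0]:g}: class#{idx + 1}, distances to means = {dists}")

# Average distance from each test point to the mean of its assigned class.
# Without ground truth this is only a measure of fit, not an error rate.
mean_distance = sum(dists[idx] for idx, dists in results) / len(results)
print(f"mean distance to assigned class mean: {mean_distance:g}")
```

Points are stored as tuples so the same helper would work unchanged if each file were described by several features instead of a single value.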
3 Comments
Image Analyst on 23 Mar 2014
To get the error accurately you have to know the true values, don't you? And you don't know those. So all you have is a guess.
Mnr on 23 Mar 2014
Thanks!
