What exactly does the sumd output of the kmeans function calculate?
I am doing basic experiments with the kmeans function. As a really simple example, say that I have a data set of 4 items with 1 attribute, and this attribute is their value:
Data=[1;2;3;4];
If I want to split this data set into 2 clusters, I should get one centroid at 1.5 and another at 3.5:
[idx,C,sumd]=kmeans(Data,2)
C =
1.5000
3.5000
and I do. However, to my understanding, sumd in this case should be:
abs(1-1.5)+abs(2-1.5) or abs(3-3.5)+abs(4-3.5)
ans =
1
but I am getting sumd as:
sumd =
0.5000
0.5000
for both clusters, instead of 1 for each.
My question is what exactly does sumd calculate?
Accepted Answer
Ameer Hamza
8 May 2018
Edited: Ameer Hamza
8 May 2018
According to the documentation of kmeans(), the default distance metric is the squared Euclidean distance, so sumd is the sum of squared point-to-centroid distances within each cluster. You should therefore calculate it like this:
abs(1-1.5).^2+abs(2-1.5).^2 or abs(3-3.5).^2+abs(4-3.5).^2
ans =
0.5 (both cases)
More Answers (1)
the cyclist
8 May 2018
It's because the default distance metric used is the squared Euclidean distance (for minimization, and reporting). See the Distance input parameter.
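To check this on the data from the question, here is a minimal sketch that recomputes the per-cluster sums by hand from the idx and C outputs. The variable names k, members, manualSq and manualAbs are only illustrative, and the subtraction assumes R2016b or later for implicit expansion when the data has more than one column.

```matlab
Data = [1; 2; 3; 4];
k = 2;

% Default call: 'Distance' defaults to 'sqeuclidean'
[idx, C, sumd] = kmeans(Data, k);

manualSq  = zeros(k, 1);   % sums of squared distances to the centroid
manualAbs = zeros(k, 1);   % sums of absolute differences to the centroid
for j = 1:k
    members = Data(idx == j, :);                            % points assigned to cluster j
    manualSq(j)  = sum(sum((members - C(j, :)).^2, 2));     % what sumd reports by default
    manualAbs(j) = sum(sum(abs(members - C(j, :)), 2));     % what the question expected
end

disp([sumd, manualSq, manualAbs])   % first two columns should agree
```

If absolute differences are what you want kmeans itself to minimize and report, the Distance parameter also accepts 'cityblock', for example kmeans(Data, k, 'Distance', 'cityblock'). In that case the centroids are component-wise medians, and the partition it settles on may differ from the default one.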