Question about kmeans centroid

Hi, I have a quick question about kmeans.
I randomly generated 1,000 numbers in the range (0,1) and clustered them into 20 clusters.
However, I found that the mean of each cluster is slightly different from its centroid. Why? By definition, they should be the same, right?
Thanks.

Walter Roberson on 20 Jul 2012
Mean versus median?

Accepted Answer

Star Strider on 20 Jul 2012
Edited: Star Strider on 20 Jul 2012

I wouldn't expect them to be the same. The mean is a probability measure (the ‘expected value’ of the set) and is a linear function of the individual probabilities of the members of the set. The centroid minimizes the Euclidean (or other metric) distance between itself and the members of the set, and is not specifically a probability measure.
The ‘cityblock’ metric might approximate the mean, but there is no reason to expect any metric based on a quadratic or other nonlinear function to do so.
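For reference, the link between the distance criterion and the centroid (Walter's "mean versus median?" hint) can be checked numerically. The Python sketch below is illustrative only; the helpers sum_sq and sum_abs and the five-point data set are hypothetical choices made for the example. It scans candidate centers on [0, 1] and keeps the one that minimizes each criterion. The sum of squared distances is minimized at the mean, and the sum of absolute (cityblock) distances is minimized at the median. The MATLAB kmeans documentation describes the same correspondence: centroids are cluster means for the default 'sqeuclidean' distance and component-wise medians for 'cityblock'.

```python
import statistics

def sum_sq(c, xs):
    """Sum of squared distances from candidate center c to the points xs."""
    return sum((x - c) ** 2 for x in xs)

def sum_abs(c, xs):
    """Sum of absolute (cityblock) distances from candidate center c to xs."""
    return sum(abs(x - c) for x in xs)

# Small skewed data set, chosen so that the mean and the median differ.
xs = [0.0, 0.1, 0.2, 0.3, 1.0]
mean = statistics.fmean(xs)
median = statistics.median(xs)

# Brute-force scan of candidate centers on [0, 1] in steps of 0.001.
grid = [i / 1000 for i in range(1001)]
best_sq = min(grid, key=lambda c: sum_sq(c, xs))
best_abs = min(grid, key=lambda c: sum_abs(c, xs))

print(best_sq, mean)     # 0.32 0.32
print(best_abs, median)  # 0.2 0.2
```

A grid scan is used instead of calculus so that each criterion is minimized directly, with no assumption about which statistic should win.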

Star Strider on 20 Jul 2012
Thank you for accepting my answer!

More Answers (2)

Peter Perkins on 20 Jul 2012

Rebecca, are you seeing something like this?
>> x = rand(1000,1);
>> [idx,c] = kmeans(x,20);
>> c2 = grpstats(x,idx,@mean);
>> c - c2
ans =
0
0
-1.38777878078145e-17
0
0
0
0
0
0
-1.38777878078145e-17
0
0
0
-2.77555756156289e-17
0
0
-5.55111512312578e-17
0
0
0
That is to be expected; the differences come from different rounding errors. Consider this:
>> x = rand(1000,1);
>> ( sum(x) - sum(x(randperm(length(x)))) ) / sum(x)
ans =
-7.87959181618481e-16
The difference arises because the two sums add the same values in a different order. Same idea.
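The same effect can be reproduced outside MATLAB. The Python sketch below is an illustration, not part of the original answer; the helpers naive_sum and relative_order_error are hypothetical. It first shows that floating-point addition is not associative, then repeats the shuffled-sum test on 1,000 random values. It accumulates with a plain loop because the built-in sum() in Python 3.12 and later uses compensated summation, which would hide most of the order dependence.

```python
import random

def naive_sum(values):
    """Left-to-right accumulation with one rounding per addition."""
    total = 0.0
    for v in values:
        total += v
    return total

def relative_order_error(values, rng):
    """Relative difference between summing in the given order and in a shuffled order."""
    shuffled = list(values)
    rng.shuffle(shuffled)
    total = naive_sum(values)
    return (total - naive_sum(shuffled)) / total

# Floating-point addition is not associative: grouping changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right)  # 0.6000000000000001 0.6

rng = random.Random(0)
x = [rng.random() for _ in range(1000)]
err = relative_order_error(x, rng)
print(err)  # rounding-level (around 1e-16) or exactly 0, depending on the shuffle
```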
If you're seeing something else, you'll have to provide more info. Hope this helps.
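To connect this to the original question, here is a Python sketch of plain Lloyd's k-means in 1-D with squared Euclidean distance, run on 1,000 random values with 20 clusters. It illustrates the algorithm only and is not MATLAB's kmeans implementation; kmeans_1d and mean_of are hypothetical helpers. The update step moves each center to the mean of its members. When the means are recomputed with the members summed in a shuffled order, they can differ from the returned centers only by rounding error.

```python
import random

def mean_of(values):
    """Arithmetic mean via a plain left-to-right loop."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)

def kmeans_1d(xs, k, iters=50, seed=0):
    """Plain Lloyd's k-means in 1-D with squared Euclidean distance (illustrative)."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)  # initial centers drawn from the data
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: (x - centers[j]) ** 2) for x in xs]
        # Update step: each center moves to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            new_centers.append(mean_of(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

rng = random.Random(1)
x = [rng.random() for _ in range(1000)]  # 1,000 values in [0, 1)
idx, c = kmeans_1d(x, 20)

# Recompute each cluster mean with its members summed in a shuffled order.
diffs = []
for j in range(20):
    members = [v for v, lab in zip(x, idx) if lab == j]
    if members:
        rng.shuffle(members)
        diffs.append(c[j] - mean_of(members))
print(max(abs(d) for d in diffs))  # zero or rounding-level
```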
rebecca on 20 Jul 2012

Thank you both.
