Why does the kmeans function give different answers?

I have noticed that when I run kmeans once for a single value of k, it gives different cluster indices than when I call it inside a loop that varies k, say from 2 to N. I do not understand why, and I would appreciate an explanation.

Accepted Answer

José-Luis, 22 Sep 2014

1 vote

Because, with the default settings, kmeans() chooses its initial cluster centroids at random. The algorithm is therefore not deterministic, and the result can depend on those starting positions.
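A minimal sketch of this behaviour, assuming the Statistics and Machine Learning Toolbox and some made-up 2-D data (`X` is hypothetical, not the poster's data). Two back-to-back calls start from different random centroids. Because cluster numbers are arbitrary, the sketch also checks whether the two runs found the same grouping under different labels, which is one common reason the indices look "different".

```matlab
% Made-up data: three Gaussian blobs in 2-D (hypothetical, for illustration only)
rng(0);
X = [randn(50,2); randn(50,2) + 3; randn(50,2) - 3];

% Two calls with default settings; the global random stream advances between
% them, so each call starts from different randomly chosen centroids.
idxA = kmeans(X, 3);
idxB = kmeans(X, 3);

% Cluster numbers are arbitrary: the same grouping can come back labelled
% 1/2/3 in one run and 3/1/2 in another. Two labelings describe the same
% grouping when each distinct label in one pairs with exactly one label in the other.
pairs        = unique([idxA idxB], 'rows');
sameLabels   = isequal(idxA, idxB);
sameGrouping = size(pairs, 1) == numel(unique(idxA)) && ...
               size(pairs, 1) == numel(unique(idxB));
fprintf('identical labels: %d, identical grouping: %d\n', sameLabels, sameGrouping);
```

Which of the two flags comes out true depends on the random starts, so no particular output is implied here.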

2 Comments

Mahesh, 22 Sep 2014
So what is the default setting, then? I have used:
rng('default');
Is that right?
Adam Filion, 22 Sep 2014
Try using the 'replicates' option for kmeans, which runs the algorithm several times automatically and returns the solution with the lowest total sum of point-to-centroid distances:
>> doc kmeans
You can control the sequence of random numbers MATLAB generates with the rng command:
>> doc rng
Calling something like rng(3) immediately before kmeans makes the results repeatable, even though the algorithm uses random starting points.
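A sketch of both suggestions, again with hypothetical data `X` and assuming the Statistics and Machine Learning Toolbox: seeding the generator with `rng(3)` right before each call makes two runs identical, and `'Replicates',5` runs five starts internally and keeps the one with the smallest `sum(sumd)`.

```matlab
% Hypothetical data: three Gaussian blobs in 2-D
rng(0);
X = [randn(50,2); randn(50,2) + 3; randn(50,2) - 3];

% Same seed immediately before each call -> same starting centroids -> same result
rng(3);
[idx1, C1, sumd1] = kmeans(X, 3, 'Replicates', 5);
rng(3);
[idx2, C2, sumd2] = kmeans(X, 3, 'Replicates', 5);

% Each call ran five starts and kept the one with the smallest total sum(sumd)
fprintf('repeatable: %d, total within-cluster distance: %.4f\n', ...
        isequal(idx1, idx2), sum(sumd1));
```

The seed value itself is arbitrary; what matters is that the generator is in the same state before each call being compared.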


More Answers (1)

Image Analyst, 22 Sep 2014

0 votes

Like many other numerical minimization methods, kmeans often reaches a solution that depends on the starting points. It can converge to a local minimum, where reassigning any single point to a different cluster would increase the total sum of point-to-centroid distances, even though a better solution exists. You can use the optional 'replicates' parameter to reduce that risk: kmeans then repeats the clustering from several starting points and returns the solution with the lowest total sum of distances.
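A small deterministic illustration of a local minimum, using made-up 1-D data and hand-picked starting centroids passed through the `'Start'` option (both are assumptions chosen to force the situation, not anything from the question). The first start gets stuck, the second finds the three obvious pairs, and a 3-D `'Start'` array with `'Replicates'` keeps the better of the two.

```matlab
% Made-up 1-D data: three obvious pairs of points
X = [0; 1; 10; 11; 20; 21];

% Poor start: two centroids on the first pair, one between the other two pairs
badStart  = [0; 1; 15.5];
goodStart = [0; 10; 20];

[~, ~, sumdBad]  = kmeans(X, 3, 'Start', badStart);
[~, ~, sumdGood] = kmeans(X, 3, 'Start', goodStart);
fprintf('bad start:  total = %g\n', sum(sumdBad));    % stuck: {0}, {1}, {10,11,20,21}
fprintf('good start: total = %g\n', sum(sumdGood));   % {0,1}, {10,11}, {20,21}

% One starting matrix per replicate (k-by-p-by-r); kmeans keeps the lowest total
starts = cat(3, badStart, goodStart);
[idx, C, sumdBest] = kmeans(X, 3, 'Start', starts, 'Replicates', 2);
fprintf('best of 2 replicates: total = %g\n', sum(sumdBest));
```

With these inputs the stuck solution has a total of 101, while the good start and the best replicate both reach 1.5. Moving any single point out of the stuck configuration would not lower the total, which is exactly the local-minimum situation described above.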

1 Comment

Mahesh, 22 Sep 2014
Yes, I understand. However, I get a different answer when I run it once for a single number of clusters, like this:
[idx,cent,sumdist] = kmeans(param_sac,nkmeans,'dist',dist_alg,...
    'replicates',8, 'display','iter');   % single run: k = nkmeans, 8 replicates
than when I run it inside a loop, like this:
rng('default');          % For reproducibility (reset once, before the loop)
param_sac = load('param2W_sac.cld');
size(param_sac);         % result is suppressed, so this line has no effect
dist_alg = 'sqEuclidean';
iditer = [];             % cluster indices, one column per k
sumdistitr = [];
meansil = [];            % one row per k: [k, mean silhouette value]
silhitr = [];            % silhouette values, one column per k
for nkmeans = 1:10
    % the number of replicates here grows with k (1, 2, ..., 10)
    [idx,cent,sumdist] = kmeans(param_sac,nkmeans,'dist',dist_alg,...
        'replicates',nkmeans, 'display','iter');
    [silh,h] = silhouette(param_sac,idx);
    xlabel('Silhouette Value')
    ylabel('Cluster');
    meanh = mean(silh);
    iditer = [iditer idx];
    % cen = [cen cent];
    % sumdistitr = [sumdistitr sumdist];
    meansil = [meansil; nkmeans meanh];
    silhitr = [silhitr silh];
end
The resulting classifications are totally different.
Thanks to everyone for the responses.
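A sketch of one way to make the loop comparable with the single call. In the posted loop, rng('default') runs only once, so by the time a given k is reached the random stream has already been consumed by the earlier iterations, and the loop uses 'replicates',nkmeans rather than the fixed 8 of the single call. The sketch resets the generator before every call and uses the same number of replicates throughout. `param_sac` here is made-up stand-in data (the real file is not available), `nrep` and `kvals` are hypothetical names, and the silhouette bookkeeping is left out for brevity.

```matlab
% Hypothetical stand-in for param_sac: two made-up groups in 3-D
rng(0);
param_sac = [randn(40,3); randn(40,3) + 4];
dist_alg  = 'sqeuclidean';
nrep      = 8;          % same number of replicates for every k (hypothetical name)
kvals     = 2:6;

iditer = zeros(size(param_sac, 1), numel(kvals));   % one column of labels per k
for j = 1:numel(kvals)
    rng('default');     % identical generator state before every kmeans call
    iditer(:, j) = kmeans(param_sac, kvals(j), 'Distance', dist_alg, ...
                          'Replicates', nrep);
end

% A stand-in "single run" for k = 3, started from the same generator state
rng('default');
idxSingle = kmeans(param_sac, 3, 'Distance', dist_alg, 'Replicates', nrep);
fprintf('loop column for k = 3 matches single run: %d\n', ...
        isequal(idxSingle, iditer(:, kvals == 3)));
```

Resetting the seed inside the loop trades independence between the k values for reproducibility; the single-run and in-loop results only agree if both start from the same generator state with the same options.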


Asked: 22 Sep 2014

Commented: 22 Sep 2014
