Why does the kmeans function give different answers?

Question

0 votes

I have noticed that the kmeans function, run once for a single value of k, gives different cluster indices than when it is called inside a loop over varying k, say 2:N. I do not understand why. I would be grateful for an explanation.


Answer 1

José-Luis on 22 Sep 2014

1 vote

Because, with the default settings, kmeans() selects its starting centroids at random. The algorithm is therefore not deterministic, and the result can depend on that starting position.

2 comments

Mahesh on 22 Sep 2014

So which default setting do you mean? I have set:

rng('default');

Is that right?

Adam Filion on 22 Sep 2014

Try using the 'replicates' option for kmeans to automatically run the algorithm multiple times and return the best answer:

>> doc kmeans

You can control the sequence of random numbers with the rng command:

>> doc rng

Calling something like rng(3) immediately before kmeans makes the results repeatable, even though the algorithm still uses random starting points.
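A minimal sketch of that suggestion, assuming the Statistics and Machine Learning Toolbox is installed. The data matrix X is synthetic and purely illustrative (it is not the poster's data), and the values of k and the replicate count are arbitrary choices:

```matlab
% Synthetic two-blob data (illustrative assumption, not the poster's data).
rng(1);
X = [randn(50,2); randn(50,2) + 4];

rng(3);                                  % fix the seed right before the call
idx1 = kmeans(X, 3, 'Replicates', 5);

rng(3);                                  % reset to the same seed
idx2 = kmeans(X, 3, 'Replicates', 5);

% Same seed and same arguments should reproduce the same cluster indices.
same = isequal(idx1, idx2);
disp(same)
```

The seed has to be set immediately before each kmeans call. Any random numbers drawn in between, including those drawn by an earlier kmeans call, move the generator to a different state.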

Answer 2

Image Analyst on 22 Sep 2014

0 votes

http://www.mathworks.com/help/stats/k-means-clustering.html

Like many other types of numerical minimizations, the solution that kmeans reaches often depends on the starting points. It is possible for kmeans to reach a local minimum, where reassigning any one point to a new cluster would increase the total sum of point-to-centroid distances, but where a better solution does exist. However, you can use the optional 'replicates' parameter to overcome that problem.
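To make the idea behind 'replicates' concrete, here is a rough sketch that does the same thing by hand: run kmeans several times from different random starts and keep the run with the smallest total within-cluster distance. This is an illustration of the concept, not a description of how kmeans implements the option internally. The data X, the number of runs nRuns, and the variable names are assumptions made for the example:

```matlab
% Synthetic three-blob data (illustrative assumption).
rng(0);
X = [randn(60,2); randn(60,2) + 3; randn(60,2) - 3];
k = 3;
nRuns = 8;                      % plays the role of the 'replicates' value

totals = zeros(nRuns, 1);       % total within-cluster distance of each run
bestTotal = Inf;
bestIdx = [];
for r = 1:nRuns
    [idx, ~, sumd] = kmeans(X, k);   % one random start per call
    totals(r) = sum(sumd);
    if totals(r) < bestTotal         % keep the lowest-cost solution so far
        bestTotal = totals(r);
        bestIdx = idx;
    end
end

disp(totals')                   % runs that ended in different local minima show different totals
```

If the values in totals differ between runs, those runs converged to different local minima, which is the situation the quoted documentation describes.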

1 comment

Mahesh on 22 Sep 2014

Yes, I understand that. However, I get one answer when I run it for a single number of clusters, like

      [idx,cent,sumdist] = kmeans(param_sac,nkmeans,'dist',dist_alg,...
          'replicates',8, 'display','iter');

and a different one when I run it inside a loop, like

rng('default');  % For reproducibility
param_sac = load('param2W_sac.cld');
size(param_sac);
dist_alg = 'sqEuclidean';
iditer = [];
sumdistitr = [];
meansil = [];
silhitr = [];
for nkmeans = 1:10;
    [idx,cent,sumdist] = kmeans(param_sac,nkmeans,'dist',dist_alg,...
        'replicates',nkmeans, 'display','iter');
    [silh,h] = silhouette(param_sac,idx);
    xlabel('Silhouette Value')
    ylabel('Cluster');
    meanh = mean(silh);
    iditer = [iditer idx];
%     cen = [cen cent];
%     sumdistitr = [sumdistitr sumdist];
    meansil = [meansil; nkmeans meanh];
    silhitr = [silhitr silh];    
end

The classifications I get are totally different.

Thanks to all for the responses.
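One possible explanation, which the thread does not confirm: the single call uses 8 replicates while the loop uses nkmeans replicates, and inside the loop the generator is seeded only once, so each iteration starts from whatever state the previous kmeans calls left behind. The sketch below reseeds before every call and keeps the replicate count fixed, so the loop iteration for a given k matches the standalone call. X is a synthetic stand-in for param_sac, and nRep and the loop range are arbitrary assumptions:

```matlab
% Synthetic stand-in for param_sac (assumption: the real file is not available here).
rng(1);
X = [randn(40,2); randn(40,2) + 5; randn(40,2) - 5];
k = 3;
nRep = 8;                       % same replicate count in both places

% Standalone call, seeded immediately beforehand.
rng('default');
idxSingle = kmeans(X, k, 'Distance', 'sqeuclidean', 'Replicates', nRep);

% Loop over several k values, reseeding before every call.
kValues = 2:4;
idxLoop = zeros(size(X,1), numel(kValues));
for j = 1:numel(kValues)
    rng('default');
    idxLoop(:, j) = kmeans(X, kValues(j), 'Distance', 'sqeuclidean', ...
        'Replicates', nRep);
end

% Compare the k = 3 column from the loop with the standalone result.
match = isequal(idxSingle, idxLoop(:, kValues == k));
disp(match)
```

Even when two runs find the same partition, the cluster numbers themselves are arbitrary labels, so index vectors can differ by a relabeling. Comparing the total of sumd, or the cluster memberships, is more informative than comparing raw indices.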
