How to avoid uncertainty in the processing results of the MATLAB Statistics Toolbox

jean young, 24 Feb 2011
I’m annoyed by the uncertainty in the results of my MATLAB program. My code is as follows.
%-----------------------------
clear all; close all;
% kmeans treats each row as one observation, so the data must be a column vector
a = [0.3948 0.4644 0.4412 0.6270 0.6270 0.1626]';
[idx, c] = kmeans(a,2)
rate = c(1)/c(2)
%-----------------------------
I ran this program several times and found the results quite interesting. Although the input data set is fixed, the results can differ from run to run. I found at least four distinct sets of results.
%-----------------------------
idx = 1 1 1 2 2 1 c = 0.3658 0.6270 rate = 0.5833
idx = 1 1 1 1 1 2 c = 0.5109 0.1626 rate = 3.1419
idx = 2 2 2 1 1 2 c = 0.6270 0.3658 rate = 1.7143
idx = 2 2 2 2 2 1 c = 0.1626 0.5109 rate = 0.3183
%-----------------------------
Can anybody help me on how to avoid this uncertainty? BTW, my MATLAB version is R2008a.
Thank you in advance for any response.
Best regards,
Jean

Accepted Answer

Mahmoud Hammoud, 24 Feb 2011
This is expected behavior, because by default KMEANS selects the initial cluster centroid positions at random (albeit from the observations). That is, the 'start' parameter is set to 'sample', as described in the documentation. If you run your code several times, you will also see another outcome: KMEANS errors out because an empty cluster is created at the first iteration (i.e., idx is all 1's or all 2's). You can always pass a matrix of initial positions as the value of the 'start' parameter, for example:
[idx c] = kmeans(a,2,'start',[0 0.5]')
This yields the same result every time, but because the partition returned by KMEANS depends heavily on the initial centroid positions, you will probably get a sub-optimal partition (unless you provide a "lucky" vector for the 'start' parameter). The typical use of KMEANS is to set the 'Replicates' parameter to an integer n, the number of times to repeat the clustering. KMEANS then returns the partition with the lowest sum, over all clusters, of the within-cluster sums of point-to-cluster-centroid distances.
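As a rough illustration of the 'Replicates' approach, here is a minimal sketch. It makes several assumptions that are not part of the answer above: it seeds the generator with rng, which does not exist in R2008a (an older release would need that release's own seeding call), it passes 'EmptyAction','singleton' so that a replicate producing an empty cluster does not raise an error, and it sorts the centroids afterwards, because cluster numbering is arbitrary and c(1)/c(2) would otherwise flip between a value and its reciprocal. The names sumd, order, and relabel are chosen here for illustration only.

```matlab
% Observations are rows, so the six values form a 6-by-1 column vector.
a = [0.3948 0.4644 0.4412 0.6270 0.6270 0.1626]';

% Assumption: fix the seed so the random starts are repeatable.
% rng is not available in R2008a; use that release's seeding call instead.
rng(1);

% Repeat the clustering 10 times from random starts and keep the run with
% the lowest total within-cluster sum of distances.
% 'EmptyAction','singleton' avoids an error if a cluster becomes empty.
[idx, c, sumd] = kmeans(a, 2, 'Replicates', 10, 'EmptyAction', 'singleton');

% Cluster numbers are arbitrary, so sort the centroids in ascending order
% and relabel idx so that cluster 1 always has the smaller centroid.
[c, order] = sort(c);
relabel = zeros(numel(order), 1);
relabel(order) = 1:numel(order);
idx = relabel(idx);

% With sorted centroids the ratio is always smaller centroid / larger one.
rate = c(1) / c(2)
```

Sorting the centroids does not change the partition itself; it only makes the labels, and therefore the ratio, consistent from run to run.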
  1 Comment
jean young, 25 Feb 2011
Thank you very much! I have modified my program and the problem is solved.
