How to avoid uncertainty in processing result of MATLAB Statistics Toolbox

Question

jean young, 24 February 2011

https://jp.mathworks.com/matlabcentral/answers/1921-how-to-avoid-uncertainty-in-processing-result-of-matlab-statistics-toolbox

Accepted answer: Mahmoud Hammoud

I'm frustrated that my MATLAB program gives different results from run to run. My code is as follows.

%-----------------------------
clear all; close all;
% Column vector: kmeans treats each row as one observation
a = [0.3948 0.4644 0.4412 0.6270 0.6270 0.1626]';
[idx, c] = kmeans(a,2)
rate = c(1)/c(2)
%-----------------------------

I ran this program several times and found that, although the input data are fixed, the results can differ from run to run. I have seen at least four distinct outcomes.

%-----------------------------
idx            c                 rate
1 1 1 2 2 1    0.3658  0.6270    0.5833
1 1 1 1 1 2    0.5109  0.1626    3.1419
2 2 2 1 1 2    0.6270  0.3658    1.7143
2 2 2 2 2 1    0.1626  0.5109    0.3183
%-----------------------------

Can anybody tell me how to avoid this run-to-run variation? By the way, my MATLAB version is R2008a.

Thank you in advance for any response.

Best regards,

Jean

Answer 1

Mahmoud Hammoud, 24 February 2011

https://jp.mathworks.com/matlabcentral/answers/1921-how-to-avoid-uncertainty-in-processing-result-of-matlab-statistics-toolbox#answer_2873

This is expected behavior: by default, KMEANS chooses the initial cluster centroid positions at random from the observations. That is, the 'start' parameter defaults to 'sample', as described in the documentation.

If you run your code enough times, you will also see another outcome: KMEANS errors out because an empty cluster is created at the first iteration (i.e., idx would be all 1's or all 2's). With your data this can happen, for instance, when both copies of 0.6270 are drawn as the initial centroids, so that every point is equally close to both and all points land in the same cluster.

To get the same result on every run, you can pass a matrix of initial positions as the value of the 'start' parameter, for example:

[idx, c] = kmeans(a,2,'start',[0 0.5]')

This yields the same result every time, but because the partition returned by KMEANS depends strongly on the initial centroid positions, you would probably get a sub-optimal partition (unless you provide a "lucky" vector for the 'start' parameter). The typical use of KMEANS is to set the 'Replicates' parameter to an integer n, the number of times to repeat the clustering with new initial centroids. KMEANS then returns the partition with the lowest total, over all clusters, of the within-cluster sums of point-to-centroid distances.
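The 'Replicates' approach can be sketched as follows. This is only an illustration, not part of the original answer: it assumes a release that provides rng (R2011a or later, so not R2008a, where the random stream has to be seeded by other means), and the seed, the replicate count of 10, and the name cSorted are arbitrary choices. Sorting the centroids is an extra step added here because cluster labels are arbitrary, which is why the question shows both a ratio and its reciprocal for the same partition.

```matlab
% Sketch only: assumes rng is available (R2011a or later).
% Column vector: kmeans treats each row as one observation.
a = [0.3948 0.4644 0.4412 0.6270 0.6270 0.1626]';

rng(0);                                     % fix the random stream so repeated runs match
[idx, c] = kmeans(a, 2, 'Replicates', 10);  % keep the best of 10 random starts

% Cluster numbering is arbitrary, so order the centroids before dividing;
% otherwise the ratio may come out as its reciprocal.
cSorted = sort(c);                          % ascending: smaller centroid first
rate = cSorted(1) / cSorted(2);

fprintf('rate = %.4f\n', rate);
```

More replicates make it more likely, though not certain, that the lowest-distance partition is found; the fixed seed is what makes repeated runs of the script agree with each other.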

jean young, 25 February 2011

Thank you very much! I have modified my program and the problem is solved.
