Trouble with k means

Question

Matt, 12 September 2011

https://jp.mathworks.com/matlabcentral/answers/15685-trouble-with-k-means

Closed: MATLAB Answer Bot, 20 August 2021

I'm having trouble tweaking the kmeans options described here: http://www.mathworks.co.uk/help/toolbox/stats/kmeans.html

I'm very new to cluster analysis so I'm not sure if I'm even using the correct technique.

The data I have is below, along with the output I am getting and my desired output. I've tried all sorts of things, but rather than getting my desired output I get many errors:

Data: A(:,1) = [39728 39757 39771 39799 39841 39855 39897 39919 39946 39973 40008 40037 40064 40079 40128 40142 40155 40205 40233 40261 40281 40310 40352 40372 40401 40428 40463 40519 40534 ]

A(:,2) = [2.2 2.2 2 2 2 1.3 1.3 1.3 1.3 1.3 1.4 1.4 1.5 1.4 1.4 1.4 1.5 1.5 1.9 2.1 1.8 2 2.1 2.1 2.1 2.1 2.1 2 2.1]

[idx ctrs]=kmeans(A,3)

Actual idx = [2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 ]

Desired idx = [2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 1 1 1 1 1 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 ]

I use the following code to visualise the above outputs:

gscatter (TEMPDataset(:,1),TEMPDataset(:,2),idx)

Answer 1

Peter Perkins, 12 September 2011

https://jp.mathworks.com/matlabcentral/answers/15685-trouble-with-k-means#answer_21306

Your desired result can't possibly be what you really want; it's not even the right length (A has only 29 rows, so idx should have 29 elements).

I'm guessing you made the plot using gscatter and wondered why the "obvious" clusters in your data aren't what kmeans returned. But look at the scaling of your two data columns: the range of the first one (806) is roughly three orders of magnitude larger than that of the second (0.9). With such poorly scaled variables you are, in effect, ignoring the second column completely, and you can see that the clusters kmeans finds are in fact the same as if you had used only the first column of A.
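To put numbers on the scaling argument, here is a small sketch using the data from the question. It compares the spread (max minus min) of each column with the smallest step between consecutive values of column 1. Because kmeans uses squared Euclidean distance by default, each column contributes its squared differences, so even the smallest column-1 step (13, squared 169) dwarfs the largest possible column-2 difference (0.9, squared 0.81). The variable names are illustrative only.

```matlab
% Data from the question, built explicitly as a 29-by-2 matrix
x = [39728 39757 39771 39799 39841 39855 39897 39919 39946 39973 ...
     40008 40037 40064 40079 40128 40142 40155 40205 40233 40261 ...
     40281 40310 40352 40372 40401 40428 40463 40519 40534];
y = [2.2 2.2 2 2 2 1.3 1.3 1.3 1.3 1.3 1.4 1.4 1.5 1.4 1.4 ...
     1.4 1.5 1.5 1.9 2.1 1.8 2 2.1 2.1 2.1 2.1 2.1 2 2.1];
A = [x(:), y(:)];

% Spread of each column (max minus min), no toolbox functions needed
spread = max(A) - min(A);

% Smallest step between consecutive points in column 1
minStep = min(diff(A(:,1)));

% Compare squared contributions to the (squared Euclidean) distance
fprintf('Column spreads: %g and %g\n', spread(1), spread(2));
fprintf('Smallest column-1 step, squared: %g\n', minStep^2);
fprintf('Largest column-2 difference, squared: %g\n', spread(2)^2);
```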

Run A through something like zscore to standardize each column first, and you'll get what I suspect you're looking for.
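For readers without the Statistics Toolbox, the standardization step can be sketched by hand. With its default settings, zscore subtracts each column's mean and divides by its standard deviation (normalized by N-1), which is what the code below does; bsxfun is used so it also runs on MATLAB releases without implicit expansion. The kmeans call is left as a comment because it needs the toolbox.

```matlab
% Data from the question, built explicitly as a 29-by-2 matrix
x = [39728 39757 39771 39799 39841 39855 39897 39919 39946 39973 ...
     40008 40037 40064 40079 40128 40142 40155 40205 40233 40261 ...
     40281 40310 40352 40372 40401 40428 40463 40519 40534];
y = [2.2 2.2 2 2 2 1.3 1.3 1.3 1.3 1.3 1.4 1.4 1.5 1.4 1.4 ...
     1.4 1.5 1.5 1.9 2.1 1.8 2 2.1 2.1 2.1 2.1 2.1 2 2.1];
A = [x(:), y(:)];

% Standardize each column: subtract its mean, divide by its std
mu = mean(A);
sigma = std(A);
Z = bsxfun(@rdivide, bsxfun(@minus, A, mu), sigma);

% Both columns now have mean 0 and standard deviation 1, so each one
% contributes on a comparable scale to the distances kmeans uses.
disp(mean(Z))
disp(std(Z))

% With the Statistics Toolbox installed, cluster the standardized data:
% [idx, ctrs] = kmeans(Z, 3);
```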

Oleg Komarov, 12 September 2011

Always learning from Peter! +1

Answer 2

Oleg Komarov, 12 September 2011

https://jp.mathworks.com/matlabcentral/answers/15685-trouble-with-k-means#answer_21280

kmeans identifies clusters by minimizing the sum of distances from each point to the centroid of its assigned cluster (by default, squared Euclidean distances).
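Written out for the default squared Euclidean distance, the quantity kmeans tries to minimize is shown below, where $\mathbf{a}_i$ is row $i$ of A, $\mathbf{c}_k$ is the centroid of cluster $k$, and $K = 3$ in this thread. Expanding the norm makes Peter's point explicit: each column of A adds its own squared differences, so a column with a much larger spread dominates the sum.

```latex
J = \sum_{k=1}^{K} \sum_{i:\,\mathrm{idx}(i) = k} \lVert \mathbf{a}_i - \mathbf{c}_k \rVert^2
  = \sum_{k=1}^{K} \sum_{i:\,\mathrm{idx}(i) = k} \Big[ (a_{i1} - c_{k1})^2 + (a_{i2} - c_{k2})^2 \Big]
```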

What you see as a cluster may not correspond to that minimizing criterion. If you want to group points that are close to one another and start a new cluster whenever a break occurs, then kmeans may not be suitable.
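As a rough illustration of that "start a new cluster at a break" idea, here is a sketch that does not use kmeans at all. It orders the points by column 1 and starts a new group whenever column 2 jumps by more than a threshold. The threshold of 0.35 and the choice to split on column 2 are assumptions made for this example, not something proposed in the thread; with the question's data it produces three groups of 5, 13 and 11 points.

```matlab
% Data from the question, built explicitly as a 29-by-2 matrix
x = [39728 39757 39771 39799 39841 39855 39897 39919 39946 39973 ...
     40008 40037 40064 40079 40128 40142 40155 40205 40233 40261 ...
     40281 40310 40352 40372 40401 40428 40463 40519 40534];
y = [2.2 2.2 2 2 2 1.3 1.3 1.3 1.3 1.3 1.4 1.4 1.5 1.4 1.4 ...
     1.4 1.5 1.5 1.9 2.1 1.8 2 2.1 2.1 2.1 2.1 2.1 2 2.1];
A = [x(:), y(:)];

threshold = 0.35;                   % hypothetical break size in column 2

[~, order] = sort(A(:,1));          % process points in column-1 order
jumps = abs(diff(A(order,2)));      % change between consecutive values
isBreak = [false; jumps > threshold];

% Each break increments the group label; map labels back to original rows
idx = zeros(size(A,1), 1);
idx(order) = cumsum(isBreak) + 1;

disp(accumarray(idx, 1)')           % number of points in each group
```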

EDIT

Incorporating Peter's suggestion:

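% Cluster the raw data (A as defined in the question)
[idx, ctrs] = kmeans(A, 3);
figure('Position', [100, 100, 800, 600])
subplot(2, 1, 1)
hold on
plot(A(idx == 1,1), A(idx == 1,2), 'r.', 'MarkerSize', 12)
plot(A(idx == 2,1), A(idx == 2,2), 'b.', 'MarkerSize', 12)
plot(A(idx == 3,1), A(idx == 3,2), 'g.', 'MarkerSize', 12)
plot(ctrs(:,1), ctrs(:,2), 'kx', 'MarkerSize', 12, 'LineWidth', 2) % centroids
title('Raw A')

% Normalized A: zscore gives each column zero mean and unit standard deviation
% (note that this overwrites A with its standardized version)
subplot(2, 1, 2)
A = zscore(A);
[idx, ctrs] = kmeans(A, 3);
hold on
plot(A(idx == 1,1), A(idx == 1,2), 'r.', 'MarkerSize', 12)
plot(A(idx == 2,1), A(idx == 2,2), 'b.', 'MarkerSize', 12)
plot(A(idx == 3,1), A(idx == 3,2), 'g.', 'MarkerSize', 12)
plot(ctrs(:,1), ctrs(:,2), 'kx', 'MarkerSize', 12, 'LineWidth', 2) % centroids
title('Standardized A (zscore)')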
