Initial centroids for K-means clustering

Salad Box on 16 Sep 2019
Commented: Adam on 17 Sep 2019
If I have an array (e.g., a 5-by-3 matrix) that can serve as the initial centroids for k-means clustering, how do I properly initialize the kmeans algorithm with it?
(Matlab's kmeans function has more than 600 lines of code and I have no idea how to modify it...)
The purpose of supplying my own initial centroids, rather than having them randomly generated inside the kmeans function, is to remove the randomness in the outputs.
P.S. Python has an answer to this, but I don't know Python.
  1 Comment
Adam on 17 Sep 2019
Edited: Adam on 17 Sep 2019
You should always read the documentation before the code. The 'Start' option lets you supply your own initial cluster centres.
I always suggest using your embedded help, though, via
doc kmeans
and clicking on the 'Name','Value' hyperlink in the 2nd function signature, which takes you to the list of supported (Name,Value) pairs. If you always use the latest version of Matlab, the online help is fine too.
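
As a minimal sketch of the 'Start' option mentioned above: passing a k-by-p numeric matrix as 'Start' makes kmeans begin from those rows instead of a random initialization. This assumes the Statistics and Machine Learning Toolbox is installed; the data X and the centroid matrix C0 are made up for illustration and stand in for your own.

```matlab
% Synthetic data, for illustration only: 5 loose groups in 3-D.
rng(1);                                    % only fixes the synthetic data
groupCentres = [0 0 0; 5 5 5; 0 5 0; 5 0 5; 0 0 5];    % hypothetical
X = repelem(groupCentres, 40, 1) + 0.5*randn(200, 3);  % 200-by-3 data

% Your own 5-by-3 matrix of initial centroids (one row per cluster).
C0 = groupCentres + 0.3;                   % hypothetical starting guess

% A numeric 'Start' matrix replaces the random initialization.
[idx1, C1] = kmeans(X, 5, 'Start', C0);
[idx2, C2] = kmeans(X, 5, 'Start', C0);

% Both runs start from the same C0, so they should agree.
sameResult = isequal(idx1, idx2) && isequal(C1, C2);
disp(sameResult)
```

The number of rows in C0 has to match the requested number of clusters, and the number of columns has to match the number of columns in X.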

Answers (1)

KALYAN ACHARJYA on 17 Sep 2019
Edited: KALYAN ACHARJYA on 17 Sep 2019
Before I share the helpful link, I would ask you to watch Andrew Ng's lecture on random initialization of k-means (Machine Learning).
To keep k-means from getting stuck in local minima and to get a better-optimized clustering, he suggests trying multiple random initializations.
Manual Initialization
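
If several random initializations are wanted, one way to keep the result repeatable is to seed the random number generator before each call. The sketch below assumes the Statistics and Machine Learning Toolbox; the seed value 42 and the data are arbitrary choices for illustration. It uses the 'Replicates' option, which reruns the clustering from new initial centroids and returns the solution with the lowest total within-cluster distance.

```matlab
% Synthetic two-group data, for illustration only.
rng(0);
X = [randn(100, 2); randn(100, 2) + 4];

% First run: seed the generator, then try 10 random initializations.
rng(42);                                   % arbitrary seed
[idxA, CA, sumdA] = kmeans(X, 2, 'Replicates', 10);

% Second run: same seed, so the same 10 initializations are drawn.
rng(42);
[idxB, CB, sumdB] = kmeans(X, 2, 'Replicates', 10);

% Total within-cluster sum of distances of the best replicate.
totalCost = sum(sumdA);
disp(isequal(idxA, idxB) && isequal(CA, CB))
```

Without the rng call, each run draws different initial centroids, which is where run-to-run differences come from.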
  2 Comments
Salad Box on 17 Sep 2019
Edited: Salad Box on 17 Sep 2019
Thanks for your answers Kalyan. I do appreciate that.
However,
Andrew Ng's video only helps with the case where k-means gets stuck in a local optimum. His suggestion is to use 'multiple iterations': run k-means several times, compute the cost function for each run, and keep the centroids with the lowest cost, which makes it more likely to find the global optimum rather than a local one. That still leaves my problem unsolved. If I run k-means again with 100 new iterations, the output will in most cases be slightly different from the first run with the initial 100 iterations.
I need to fix this: every time I run k-means, the output must be the same. That is why I prepared my own initial centroids; starting from them and moving the centroids at each step, k-means should in theory give the same output at the end. I have other variables/parameters to look at in my research, so I can't let randomness in the k-means output be one of my variables. I need to remove this randomness. I hope that is understandable.
The second link in your answer is about 'how to set initial centroids for k means'. However, I have already done that in my own way, so it is irrelevant to my question.
My question is:
Once I have an array as my initial centroids, how do I embed them into Matlab's own k-means function?
Hope my question is clear.
Can anyone help directly with this question, please?
Adam on 17 Sep 2019
As I added in a comment above, the Matlab help is always the first place to go. This shows how you can do this.
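
For completeness, the kmeans documentation also describes a 3-D form of 'Start': a k-by-p-by-r array, where each page is one set of initial centroids and the number of pages sets the number of replicates. The sketch below assumes the Statistics and Machine Learning Toolbox; the data and both candidate centroid sets are hypothetical. It gives the "try several initializations, keep the best" behaviour without any random initialization.

```matlab
% Synthetic three-group data in 3-D, for illustration only.
rng(7);
X = [randn(60, 3); randn(60, 3) + 4; randn(60, 3) - 4];

% Two hand-picked sets of initial centroids, one per page (3-by-3-by-2).
S = zeros(3, 3, 2);
S(:, :, 1) = [0 0 0; 4 4 4; -4 -4 -4];     % hypothetical candidate 1
S(:, :, 2) = [1 1 1; 3 3 3; -3 -3 -3];     % hypothetical candidate 2

% kmeans runs once per page and returns the replicate with the
% lowest total within-cluster sum of distances.
[idx, C, sumd] = kmeans(X, 3, 'Start', S);
disp(sum(sumd))
```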
