
K-means Clustering

Avishek Dutta on 7 Jun 2012
Hi, I am trying to do an unsupervised classification on IMU data.
I have a raw data file of variables like acceleration (x,y,z), gyro (x,y,z), etc. (15 columns in total).
When I apply
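opts = statset('Display','final');   % print the final result of each replicate
idx = kmeans(DataUnsup, NoOfClassesUnsup, 'Distance','correlation', ...
    'Replicates',NoOfIterations, 'Options',opts);
sizeFullData = size(DataUnsup, 1);   % number of rows (samples)
T = 1:sizeFullData;
plot(T', idx, 'g+');                 % cluster index of each row vs. row number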
I am NOT getting a good clustering. This is raw data: a single file containing all movements (walk, run, sit, stand, etc.) recorded together in one go. There is no information about the grouping or order of the data. DataUnsup usually consists of a mixture of the 15 variables, if not all of them.
Can someone please guide me?
Avishek
P.S. sqeuclidean, cosine etc are also not working.

Answers (2)

Walter Roberson on 7 Jun 2012
What you write suggests that kmeans is not a good classifier for you to use.
  2 Comments
Avishek Dutta on 7 Jun 2012
Agreed. After a week of manipulating the data, re-recording, etc., I have reached the same conclusion.
Does anyone have similar experiences?
Any suggestions as to which mechanism to choose?
Thanks & Regards
Walter Roberson on 7 Jun 2012
Did PCA find anything interesting?
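For reference, a minimal sketch of the kind of PCA check being asked about. It assumes the Statistics Toolbox; X is a synthetic placeholder for DataUnsup, and standardising the columns with zscore first is an assumption, not something stated in the thread. pca requires R2012b or later; on older releases princomp returns the same first three outputs.

```matlab
% Synthetic placeholder for DataUnsup: 3 groups, 15 columns (not real IMU data).
rng(1);                                          % reproducible random numbers
X = [randn(100,15); randn(100,15) + 5; randn(100,15) - 5];

Z = zscore(X);                                   % assumed step: comparable column scales
[coeff, score, latent] = pca(Z);                 % princomp(Z) on releases before R2012b
explained = 100 * latent / sum(latent);          % percent of variance per component

% If the first two components show separate blobs, there is structure
% that a clustering method has a chance of recovering.
figure;
plot(score(:,1), score(:,2), '.');
xlabel('PC 1'); ylabel('PC 2');
title('Rows projected onto the first two principal components');
```

If the projected points form one undifferentiated cloud, that is a hint (not proof) that the raw columns alone do not separate the activities.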



Peter Perkins on 7 Jun 2012
Avishek, it's not clear what you mean by "NOT getting a good clustering". If I understand your code correctly, you are plotting the cluster number of each row in the data vs. its row number. Unless the data are already in a special order, there's no reason why you would expect to see anything other than a big jumble of points along discrete horizontal lines. Perhaps you are seeing one big jumble and a bunch of (near) singletons and observing that you have no useful clusters. Perhaps your description was intended to mean that the data are in some special order, and you are just testing whether or not kmeans can recreate it. I can't tell.
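A small sketch of the point about row order, using synthetic data rather than anything from the question. The group labels, sizes, and offsets are made up for illustration. The same rows are clustered twice, once grouped by their (hypothetical) true activity and once shuffled, and the cluster index is plotted against the row number in both cases.

```matlab
rng(1);                                              % reproducible random numbers
trueGroup = [ones(100,1); 2*ones(100,1); 3*ones(100,1)];   % hypothetical labels
X = randn(300,15) + 5 * repmat(trueGroup - 2, 1, 15);      % groups centred at -5, 0, +5

idxOrdered = kmeans(X, 3, 'Replicates', 5);          % rows still grouped by activity

perm = randperm(300);                                % destroy the row order
idxShuffled = kmeans(X(perm,:), 3, 'Replicates', 5);

figure;
subplot(2,1,1); plot(1:300, idxOrdered, 'g+');
title('Rows grouped by activity'); ylabel('cluster');
subplot(2,1,2); plot(1:300, idxShuffled, 'g+');
title('Same rows, shuffled'); xlabel('row number'); ylabel('cluster');
```

Both runs cluster the same points, but only the first plot shows clean horizontal segments. A jumbled plot of index against row number therefore says something about the row order, not necessarily about the quality of the clusters.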
Two things:
  • The silhouette function may prove useful to visualize whether or not kmeans found "good" clusters.
  • You're using correlation distance which will only be useful for a very particular kind of data. I don't know anything about your data, so you may have a good reason for using correlation. You do say you tried squared euclidean and cosine distance, so perhaps correlation distance was just your last try at getting something to work.
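Both points can be tried together in a short sketch. It assumes the Statistics Toolbox and uses synthetic data in place of DataUnsup; the cluster count k and the choice of which two distances to compare are assumptions. The synthetic groups differ only by a constant offset across all columns, which correlation distance ignores because it centres each row first.

```matlab
% Synthetic stand-in for DataUnsup: three groups offset by -5, 0, +5 in every column.
rng(1);                                   % reproducible random numbers
X = [randn(100,15); randn(100,15) + 5; randn(100,15) - 5];
k = 3;                                    % assumed number of clusters

distances = {'sqeuclidean', 'correlation'};
meanSil = zeros(1, numel(distances));
for i = 1:numel(distances)
    idx = kmeans(X, k, 'Distance', distances{i}, 'Replicates', 5);
    figure;
    % Requesting two outputs makes silhouette draw the plot as well.
    [s, h] = silhouette(X, idx, distances{i});
    title(['Silhouette, distance = ' distances{i}]);
    meanSil(i) = mean(s);                 % values near 1 indicate compact clusters
end
disp(meanSil);
```

On data like this one would expect the squared Euclidean run to give silhouette values close to 1 and the correlation run to give values near 0, since the only structure is an offset that correlation distance removes. Real IMU data may behave differently.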
Hope this helps.
  2 Comments
Avishek Dutta on 8 Jun 2012
You are correct in all your assumptions. As I said, this data is special: two IMU sensors (arm and leg) recording movement in 7 different scenarios (WALK, RUN, SIT and so on) together in ONE GO.
In fact I have 4 such data files: Arm_120Hz, Arm_10Hz, Leg_120Hz and Leg_10Hz.
KMEANS is not able to separate the scenario clusters in any of them. What I see is always something like this:
7 +++++++++++ +++ +++ +++++
6 ++++++++ ++ ++++
5 ++++++
4 +++ +
3
2 +++++++
1 ++++++
"+" are the data points; 7 is the number of classes.
I never get a compact cluster. Does this explain the problem a bit more clearly?
Will try the silhouette function.
Thanks
Avishek Dutta on 8 Jun 2012
Sorry, the editor removed the spaces I put in to show the unclustered points. Please ignore the sketch.

