K-means clustering - results and plotting a continuous curve

Question

Rayne on 22 Sep 2015
Commented: Rayne on 25 Sep 2015

Direct link to this question: https://jp.mathworks.com/matlabcentral/answers/244511-k-means-clustering-results-and-plotting-a-continuous-curve

I am very new to MATLAB, and I'm trying to classify some data using K-means. This is what I have:

numClusters = 4;
idx_1 = kmeans([X_1 smoothY_1],numClusters,'Replicates', 5);
[numDataPoints,numDimensions] = size(smoothY_1);
Colors = hsv(numClusters);
for i = 1 : numDataPoints
    plot(X_1(i),smoothY_1(i),'.','Color',Colors(idx_1(i),:))
    hold on
end

The output I got was: [plot not shown]

I realized that it seems as if what the K-means clustering did was simply divide the graph into numClusters segments and that's it. I've tried with different values of numClusters and each gave me equally divided segments. Surely this can't be right?

Another question I have is about plotting the results. Both X_1 and smoothY_1 are "1825x1 double" arrays. I'm trying to plot a continuous curve, but I only have output if I use '.' in the LineSpec. Using '-' will not give me any output. How do I plot a continuous curve?

Thank you.

ETA: I have plotted the graph in line mode thanks to @Hamoon.

There are actually 3 data sets that I'm trying to cluster using K-means.

They were all generated from the same system and consist of 4 distinct operational states. It doesn't seem right to me that the 4 states are all equally divided segments. I think it is more likely that the long segment after the biggest spike belongs to 1 cluster, rather than 3 different clusters.

Is there any clustering algorithm I should use? Or do I need to do some pre-processing before I use K-means, like perform K-means based on the difference between adjacent points, rather than on the X,Y points themselves?

Thank you.

Answer 1

Hamoon on 22 Sep 2015

Direct link to this answer: https://jp.mathworks.com/matlabcentral/answers/244511-k-means-clustering-results-and-plotting-a-continuous-curve#answer_193353

1. K-means is a clustering method, NOT a classification algorithm, although you can then use its output for association (assigning points to the groups it found). What kind of output do you expect? If you are not happy with this output, you probably don't want a clustering method.

2. You are plotting the points one by one, so '-' has nothing to connect and doesn't give you what you want. You can use this, which plots each cluster in a single call:

 for i = 1 : numClusters
     idxThis = idx_1==i;
     plot(X_1(idxThis),smoothY_1(idxThis),'-','Color',Colors(i,:))  % It also works without '-'   
     hold on                                                         
 end                                                                
 axis([0 1800 0 15])

8 comments (6 older comments not shown)

Rayne on 23 Sep 2015

Thanks, the normalization makes sense. I've tried that, and it does take me a step closer to what I want, but not yet.

So after normalization, I plotted the original X_1 and Y_1 values (because I wanted to maintain the original curve), and got the top graph below. Checking my idxThis array, it seems that it clustered 2 disjoint sets of points as one group and was trying to plot them as a continuous curve, hence the weird graph. Changing the LineSpec from '-' to '.' showed the results more clearly.

Ok, so what I think my clustering should result in, is the following.

The third graph in my original post (after the ETA) should be used as the training set as it is the "cleanest". I think I should have the following groupings:

Cluster 1: X in [0,100]
Cluster 2: X in [100,500]
Cluster 3: X in [500,600]
Cluster 4: X in [600,end]

As you can see, I think the groupings should be continuous and not disjoint like what I'm getting now, because the system goes from one state to another sequentially. And the clustering is based on the change in shape - once there is a "harsh" transition, the system enters a new state. How can I use K-means to get this kind of clustering?

Another question: right now, I'm simply running K-means on each of the 3 data sets I have independently. How can I use the third set as the training set and the other 2 sets as the testing set?
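One possible way to approach the training/testing question, sketched under assumptions: fit the centroids on the training set, then assign each point of the other sets to its nearest centroid. The data and the names Xtrain, Ytrain, Xtest and Ytest below are hypothetical stand-ins, not the data from this thread. kmeans, zscore and pdist2 come from the Statistics and Machine Learning Toolbox. Note that this does not by itself force the groups to be contiguous in time.

```matlab
rng(0);                               % fixed seed so the run is repeatable
% Hypothetical stand-ins for the real data: one training set, one test set.
Xtrain = (1:1000)';  Ytrain = cumsum(randn(1000,1));
Xtest  = (1:1200)';  Ytest  = cumsum(randn(1200,1));
numClusters = 4;

% Standardize with the training statistics only, then reuse them on the test set.
[Ztrain,mu,sigma] = zscore([Xtrain Ytrain]);
nTest = numel(Xtest);
Ztest = ([Xtest Ytest] - repmat(mu,nTest,1)) ./ repmat(sigma,nTest,1);

% Fit the centroids on the training set (C is numClusters-by-2).
[idxTrain,C] = kmeans(Ztrain,numClusters,'Replicates',5);

% Assign each test point to its nearest training centroid.
D = pdist2(Ztest,C);                  % nTest-by-numClusters Euclidean distances
[~,idxTest] = min(D,[],2);
```

idxTest can then be plotted per cluster in the same way as idx_1 above.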

Kirby Fears on 24 Sep 2015

Don't know enough about those algorithms to help. You'd probably need the toolbox.

I tested k-means using moments of the distribution to try identifying different modes. Pasting it here in case it works for you or helps at all.

%% Set up data with a different distribution in each quarter
seg{1} = 2*randn(1,400) - 0.5;
seg{2} = 4*randn(1,400);
seg{3} = 2*randn(1,400) + 0.5;
seg{4} = 0.5*randn(1,400) + 1;
Y1 = [seg{1} seg{2} seg{3} seg{4}];
X1 = 1:numel(Y1);
%% Clustering on trailing-window moments
numClusters = 4;
windowsize = 16;
[mu, sig] = deal(NaN(1,numel(Y1)));
for iter = windowsize+1:numel(Y1)
    % mean and standard deviation over the last windowsize+1 samples
    mu(iter) = mean(Y1(iter-windowsize:iter));
    sig(iter) = std(Y1(iter-windowsize:iter));
end
% pad the first windowsize entries with the first computed value
mu(1:windowsize) = mu(windowsize+1);
sig(1:windowsize) = sig(windowsize+1);
% Try combinations of X1, sig, and mu for clustering
idx1 = kmeans(zscore([X1' sig']),numClusters,'Replicates',5);
% logical matrix: column j marks the points assigned to cluster j
pointclust = repmat(idx1,1,numClusters)==repmat(1:numClusters,numel(idx1),1);
colors = hsv(numClusters);
% plot index to see cluster assignments
figure(2);
plot(idx1);
% plot colored clusters
figure(1);
for j = 1:numClusters
    plot(X1(pointclust(:,j)),Y1(pointclust(:,j)),'.','Color',colors(j,:));
    hold on;
end
hold off;

Rayne on 25 Sep 2015

Thank you very much for your help. I eventually gave SVM a try and it seems to give me results that I'm rather satisfied with.

Answer 2

Kirby Fears on 22 Sep 2015
Edited: Kirby Fears on 22 Sep 2015

Direct link to this answer: https://jp.mathworks.com/matlabcentral/answers/244511-k-means-clustering-results-and-plotting-a-continuous-curve#answer_193356

kmeans is working exactly as expected for the input you're providing. The best 4 centroids are along your line. Perhaps you can review the wiki page to see why.
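A minimal sketch of why this happens, using made-up data rather than the asker's: the axis limits used in this thread are roughly 0 to 1800 in X and 0 to 15 in Y, so Euclidean distance on [X Y] is dominated by X. The Y1 values and the variable names idxRaw, idxStd, changesRaw and changesStd below are assumptions for illustration. With the raw columns the label should change only about numClusters-1 times along the time axis (consecutive segments); after z-scoring, Y1 contributes as much as X1, so for this noisy synthetic signal the labels are expected to change far more often.

```matlab
rng(0);                               % fixed seed so the run is repeatable
X1 = (1:1825)';                       % time axis, same length as in the question
Y1 = 15*rand(1825,1);                 % hypothetical signal spanning about 0..15
numClusters = 4;

% Raw columns: X1 has a far larger spread than Y1, so distances are
% almost entirely differences in X1.
idxRaw = kmeans([X1 Y1],numClusters,'Replicates',5);

% z-scored columns: each column has zero mean and unit standard deviation,
% so both contribute comparably to the distance.
idxStd = kmeans(zscore([X1 Y1]),numClusters,'Replicates',5);

% Count how often the cluster label changes along the time axis.
% A count close to numClusters-1 means the clusters are consecutive segments.
changesRaw = nnz(diff(idxRaw) ~= 0);
changesStd = nnz(diff(idxStd) ~= 0);
fprintf('label changes: raw = %d, z-scored = %d\n',changesRaw,changesStd);
```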

Your code calls the plot() function for each point separately. I made a few changes so you can call the plot function only once per cluster, and it plots in line mode as requested:

X1 = (1:1825)';
Y1 = randn(1825,1);
numClusters = 4;
idx1 = kmeans([X1 Y1],numClusters,'Replicates',5);
% logical matrix: column j marks the points assigned to cluster j
pointclust = repmat(idx1,1,numClusters)==repmat(1:numClusters,numel(idx1),1);
colors = hsv(numClusters);
for j = 1:numClusters
    % one plot call per cluster; the default LineSpec draws a connected line
    plot(X1(pointclust(:,j)),Y1(pointclust(:,j)),'Color',colors(j,:));
    hold on;
end
hold off;

3 comments (1 older comment not shown)

Kirby Fears on 22 Sep 2015

Rayne,

Is X1 a time variable, and are you trying to cluster with respect to time as well?

If you're exclusively trying to cluster the "Y1" variable, you could try using kmeans with Y1 as input instead of [X1 Y1].

Since this is a one-dimensional clustering, it will simply group the Y1 values into 4 ranges: high, med-high, med-low, low.
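As a rough illustration of that point, the sketch below clusters a stand-in Y1 (random data, as in the answer above, not the asker's signal) on its own and prints the range of values each cluster covers. The names idxY, C, order, lo and hi are introduced here only for the example.

```matlab
rng(0);                               % fixed seed so the run is repeatable
Y1 = randn(1825,1);                   % stand-in data, as in the answer above
numClusters = 4;

% Cluster on Y1 alone; C holds one centroid per cluster (numClusters-by-1).
[idxY,C] = kmeans(Y1,numClusters,'Replicates',5);

% Report the range of values covered by each cluster, lowest centroid first.
[~,order] = sort(C);
lo = zeros(numClusters,1);
hi = zeros(numClusters,1);
for k = 1:numClusters
    members = Y1(idxY == order(k));
    lo(k) = min(members);
    hi(k) = max(members);
    fprintf('cluster %d: %.2f to %.2f (%d points)\n', ...
        order(k),lo(k),hi(k),numel(members));
end
```

The printed ranges should not overlap, which is the "high, med-high, med-low, low" banding described above; the time order of the points plays no role.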

To determine what method you'd like for categorization or clustering, you need to first be very precise about what values you want to categorize or cluster.

Rayne on 23 Sep 2015

Yes, X_1 is a time variable, and I'm trying to cluster the (X,Y) points. In fact, I had tried just using K-means on the Y points, and saw the clustering like you said, which isn't what I wanted.

I have replied to @Hamoon on what I think the clustering results should be.
