K-means clustering - results and plotting a continuous curve
I am very new to MATLAB, and I'm trying to classify some data using K-means. This is what I have:
numClusters = 4;
idx_1 = kmeans([X_1 smoothY_1],numClusters,'Replicates', 5);
[numDataPoints,numDimensions] = size(smoothY_1);
Colors = hsv(numClusters);
for i = 1 : numDataPoints
plot(X_1(i),smoothY_1(i),'.','Color',Colors(idx_1(i),:))
hold on
end
The output I got was

It looks as if K-means simply divided the graph into numClusters segments of equal width along the x-axis, and nothing more. I've tried different values of numClusters, and each gave me equally divided segments. Surely this can't be right?
Another question I have is about plotting the results. Both X_1 and smoothY_1 are "1825x1 double" arrays. I'm trying to plot a continuous curve, but I only have output if I use '.' in the LineSpec. Using '-' will not give me any output. How do I plot a continuous curve?
Thank you.
ETA: I have plotted the graph in line mode thanks to @Hamoon.
There are actually 3 data sets that I'm trying to cluster using K-means.

They were all generated from the same system and consist of 4 distinct operational states. It doesn't seem right to me that the 4 states are all equally divided segments. I think it is more likely that the long segment after the biggest spike belongs to 1 cluster, rather than 3 different clusters.
Is there any clustering algorithm I should use? Or do I need to do some pre-processing before I use K-means, like perform K-means based on the difference between adjacent points, rather than on the X,Y points themselves?
Thank you.
Answers (2)
Hamoon
22 Sep 2015
1. K-means is a clustering method, NOT a classification algorithm, although you can afterwards use its output for association (assigning points to the clusters it found). What kind of output do you expect? If you are not happy with this output, you probably don't want a clustering method.
2. You are plotting the points one by one, so '-' has nothing to connect and doesn't give you what you want. You can use this instead:
for i = 1 : numClusters
idxThis = idx_1==i;
plot(X_1(idxThis),smoothY_1(idxThis),'-','Color',Colors(i,:)) % It also works without '-'
hold on
end
axis([0 1800 0 15])
8 Comments
Rayne
22 Sep 2015
Hamoon
22 Sep 2015
That's because your x ranges from 0 to 1800, while y only ranges from 0 to 15. K-means works on distances (here you are using squared Euclidean distance), so x has a much larger effect than y. For example, point (600,5) is closer to (601,15) than to (700,5), so (600,5) and (601,15) will be considered to be in the same cluster. If you want to decrease this effect, you need to normalize your data:
xNew = (x - mean(x))./std(x);
yNew = (y - mean(y))./std(y);
Then changes in both x and y affect the output on a comparable scale.
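To make the scaling effect concrete, here is a minimal sketch. The data in `x` and `y` are synthetic stand-ins made up for illustration (not the data from the question), and `kmeans` needs the Statistics and Machine Learning Toolbox. It checks the distance example above and compares the spread of each column before and after normalization.

```matlab
% Worked example from above: in raw units the x difference dominates
p     = [600 5];
dNear = norm(p - [601 15]);   % sqrt(1^2 + 10^2), roughly 10.05
dFar  = norm(p - [700 5]);    % 100

% Synthetic stand-in data (an assumption, not the data from the question)
rng(0);                        % make the example reproducible
x = (1:1800)';
y = 15*rand(1800,1);

spreadRaw = std([x y]);        % x spread is far larger than y spread
xNew = (x - mean(x))./std(x);
yNew = (y - mean(y))./std(y);
spreadNew = std([xNew yNew]);  % both columns now have unit spread

% Cluster the normalized data (requires Statistics and Machine Learning Toolbox)
idxNew = kmeans([xNew yNew], 4, 'Replicates', 5);
```

Normalizing does not guarantee the clusters you hope for; it only stops one variable from dominating the distance purely because of its units.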
Hamoon
22 Sep 2015
Also, as Kirby Fears said, if you just want to cluster based on the Y data, pass only the Y data to the K-means function. You will then get these four colors, and they will change vertically.
Rayne
23 Sep 2015
Kirby Fears
23 Sep 2015
Edited: Kirby Fears
23 Sep 2015
It sounds like you're trying to segment your time series according to changes in the distribution of the series (e.g. mean and standard deviation categories). You could use trailing mean and std measurements then try to categorize those (mean, std) pairs using k-means.
However, the only "training" you can do with k-means is to use the previously established centroids as initial values for your new dataset. This would only serve to produce faster convergence or help the k-means algorithm converge to the "right" answer on datasets that have multiple convergence sets (based on starting values).
I don't think your problem is well-suited to k-means. If I think of a better method, I'll post it later.
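As a rough sketch of the "previous centroids as initial values" idea: `kmeans` accepts a numeric k-by-p matrix for its 'Start' option. The datasets `oldData` and `newData` below are synthetic and purely illustrative, and the Statistics and Machine Learning Toolbox is assumed.

```matlab
% Synthetic two-dimensional data (assumption): an "old" and a "new" dataset
rng(1);                                   % make the example reproducible
centers = [0 0; 5 5; 0 5; 5 0];
oldData = repelem(centers, 100, 1) + 0.5*randn(400,2);
newData = repelem(centers, 100, 1) + 0.5*randn(400,2);

k = 4;
% Fit on the old data and keep the centroids
[~, oldCentroids] = kmeans(oldData, k, 'Replicates', 5);

% Seed the new fit with the old centroids via the 'Start' option
[idxNew, newCentroids] = kmeans(newData, k, 'Start', oldCentroids);
```

Because the new fit starts from the old centroids, cluster j in the new result corresponds to cluster j in the old one, which keeps labels comparable across datasets.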
Rayne
24 Sep 2015
Kirby Fears
24 Sep 2015
Don't know enough about those algorithms to help. You'd probably need the toolbox.
I tested k-means using moments of the distribution to try identifying different modes. Pasting it here in case it works for you or helps at all.
%% Set up data with four different distributions
% (kmeans and zscore require the Statistics and Machine Learning Toolbox)
mode{1}=2*randn(1,400)-0.5;
mode{2}=4*randn(1,400);
mode{3}=2*randn(1,400)+0.5;
mode{4}=0.5*randn(1,400)+1;
Y1=[mode{1} mode{2} mode{3} mode{4}];
clear mode;
X1=1:numel(Y1);
%% Clustering
numClusters=4;
windowsize=16;
[mu, sig]=deal(NaN(1,numel(Y1)));
% trailing mean and std over the current point and the previous windowsize points
for iter=windowsize+1:numel(Y1)
    mu(iter) = mean(Y1(iter-windowsize:iter));
    sig(iter) = std(Y1(iter-windowsize:iter));
end
% back-fill the first windowsize entries with the first computed value
mu(1:windowsize)=mu(windowsize+1);
sig(1:windowsize)=sig(windowsize+1);
% Try combinations of X1, sig, and mu for clustering
idx1=kmeans(zscore([X1' sig']),numClusters,'Replicates',5);
% logical matrix: column j marks the points assigned to cluster j
pointclust=repmat(idx1,1,numClusters)==repmat(1:numClusters,numel(idx1),1);
colors=hsv(numClusters);
% plot index to see cluster assignments
figure(2);
plot(idx1);
% plot colored clusters
figure(1);
for j=1:numClusters
    plot(X1(pointclust(:,j)),Y1(pointclust(:,j)),'.','Color',colors(j,:));
    if j==1
        hold on;
    end
end
hold off;
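For what it's worth, the trailing-window loop can probably be replaced by `movmean` and `movstd`, assuming R2016a or later. This is only a sketch on a synthetic series; note that near the start these functions shrink the window instead of back-filling, so the first `windowsize` values differ from the loop version.

```matlab
% Synthetic series (assumption), in the same spirit as the test data above
rng(2);                                   % make the example reproducible
Y1 = [2*randn(1,400)-0.5, 4*randn(1,400)];
windowsize = 16;

% Trailing window: the current sample plus the previous windowsize samples
muMov  = movmean(Y1, [windowsize 0]);
sigMov = movstd(Y1,  [windowsize 0]);
```

For any index greater than `windowsize`, `muMov(i)` equals `mean(Y1(i-windowsize:i))` and `sigMov(i)` equals `std(Y1(i-windowsize:i))`.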
Rayne
25 Sep 2015
Kirby Fears
22 Sep 2015
Edited: Kirby Fears
22 Sep 2015
kmeans is working exactly as expected for the input you're providing. The best 4 centroids are along your line. Perhaps you can review the wiki page to see why.
Your code calls the plot() function for each point separately. I made a few changes so you can call the plot function only once per cluster, and it plots in line mode as requested:
X1=(1:1825)';
Y1=randn(1825,1);
numClusters=4;
idx1=kmeans([X1 Y1],numClusters,'Replicates',5);
% logical matrix: column j marks the points assigned to cluster j
pointclust=repmat(idx1,1,numClusters)==repmat(1:numClusters,numel(idx1),1);
colors=hsv(numClusters);
for j=1:numClusters
    plot(X1(pointclust(:,j)),Y1(pointclust(:,j)),'Color',colors(j,:));
    if j==1
        hold on;
    end
end
hold off;
3 Comments
Rayne
22 Sep 2015
Kirby Fears
22 Sep 2015
Rayne,
Is X1 a time variable, and are you trying to cluster with respect to time as well?
If you're exclusively trying to cluster the "Y1" variable, you could try using kmeans with Y1 as input instead of [X1 Y1].
Since this is a one-dimensional clustering, it will simply group the Y1 values into 4 ranges: high, med-high, med-low, low.
To determine what method you'd like for categorization or clustering, you need to first be very precise about what values you want to categorize or cluster.
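A minimal sketch of the one-dimensional case, assuming a synthetic `Y1` with four distinct levels (made up for illustration, not the data from the question) and the Statistics and Machine Learning Toolbox:

```matlab
% Synthetic stand-in for Y1 (assumption): four distinct levels plus noise
rng(3);                                   % make the example reproducible
Y1 = [1*ones(200,1); 8*ones(200,1); 4*ones(200,1); 12*ones(200,1)] + 0.2*randn(800,1);

numClusters = 4;
[idxY, centroidsY] = kmeans(Y1, numClusters, 'Replicates', 5);

% Cluster labels are arbitrary, so sort the centroids to rank the levels low to high
[sortedCentroids, order] = sort(centroidsY);
```

Here `order(1)` is the label of the lowest range and `order(end)` the label of the highest. Time plays no role in this clustering, so two separate stretches of the series at the same level end up in the same cluster.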
Rayne
23 Sep 2015
