# kmeans clustering of matrices

25 views (last 30 days)
Susan on 4 Jun 2021
Commented: Susan on 7 Jun 2021
Hi All,
I have a 12×190 cell array. Each cell contains a complex matrix of size n×550 (each row is an observation on 550 variables; the number of observations n varies from cell to cell, but the variables are the same for every matrix). I need to classify these matrices using kmeans, and I am trying to cluster one large pooled matrix (i.e., all 12×190 matrices stacked together, rather than working with each matrix separately).
Any idea how I can do that? Is there a method better than kmeans for clustering these data? Any input would be appreciated.
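One way to set this up (a sketch, assuming the data lives in a 12×190 cell array `C` as described, with each cell an n×550 complex matrix) is to stack every matrix's rows into a single observations-by-550 matrix and cluster that:

```matlab
% Sketch: pool every matrix's rows into one tall matrix, then cluster.
% C is assumed to be the 12x190 cell array described above.
allRows = vertcat(C{:});              % stack all n-by-550 matrices row-wise
X = real(allRows);                    % kmeans needs real data; keep the real part
k = 19;                               % example cluster count
idx = kmeans(X, k, 'Replicates', 5);  % idx assigns each pooled row to a cluster
```

Each row of the pooled matrix gets its own cluster label; to label a whole matrix you would still need to aggregate its rows' labels somehow (e.g., a majority vote), which is part of why a row-wise kmeans may not match the goal of classifying matrices.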
##### 11 comments
Image Analyst on 5 Jun 2021
OK, so you're just going to consider the real part of the complex numbers. So, how many clusters do you believe there to be? What did you put in for k (if you put in anything)? Do you think there are 3 clusters? 6? 100? Or no idea?
Susan on 5 Jun 2021
@Image Analyst There would be 19 clusters.
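If the cluster count were ever in doubt, one hedged way to sanity-check it in MATLAB (assuming a pooled real matrix `X` as above) is `evalclusters`, which scores a range of candidate k values:

```matlab
% Sketch: let a silhouette criterion rank candidate cluster counts.
% X is the pooled real-valued observations-by-550 matrix.
eva = evalclusters(X, 'kmeans', 'silhouette', 'KList', 2:25);
bestK = eva.OptimalK;   % the k with the best silhouette score in the range
```

This is only a diagnostic; when the number of classes is known to be 19 from the problem itself, that domain knowledge should win.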


### Accepted Answer

Walter Roberson on 5 Jun 2021
k-means is not the right technology for situations in which you have labels, except when the labels have numeric values that can be made commensurate with the numeric coordinates. For example, if you can say that having a label differ by 1 is 10.28 times as important as having column 3 differ by 1, then you might be able to use k-means by adding the numeric value of the label as an additional coordinate. But this is not the usual case.
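The idea above can be sketched in a few lines (assuming an n×550 real matrix `X`, a numeric label vector `y`, and the example weight 10.28 from the text):

```matlab
% Sketch: fold a numeric label into the feature space as one extra,
% weighted coordinate, so kmeans "sees" label disagreement as distance.
w = 10.28;                 % example: a unit label difference counts this much
Xaug = [X, w * y(:)];      % append the scaled label as column 551
idx = kmeans(Xaug, 19);    % cluster in the augmented space
```

The weight w is the whole trick: too small and the label is ignored, too large and kmeans simply groups by label, which is why this only works when the label truly is commensurate with the other coordinates.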
When you have matrices of numbers and a label associated with each matrix, Deep Learning or (shallow) neural network techniques are more appropriate. Consider that if you have a matrix of data and a label, and the matrices are all the same size, then that situation can be treated the same way as if the matrix of data were an "image".
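As a rough illustration of the matrix-as-image idea (a sketch only, assuming Deep Learning Toolbox, matrices padded or cropped to a common height `m`, training data `XTrain` arranged m×550×1×N with categorical labels `YTrain`):

```matlab
% Sketch: treat each fixed-size m-by-550 matrix as a one-channel "image"
% and train a small classifier over the 19 classes.
layers = [
    imageInputLayer([m 550 1])
    convolution2dLayer(3, 16, 'Padding', 'same')
    reluLayer
    fullyConnectedLayer(19)        % one output per class
    softmaxLayer
    classificationLayer];
opts = trainingOptions('adam', 'MaxEpochs', 20);
net = trainNetwork(XTrain, YTrain, layers, opts);
```

The layer choices here are placeholders; the point is only that a fixed-size numeric matrix plugs into the image-classification machinery unchanged.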
##### 5 comments
Walter Roberson on 7 Jun 2021
Yes! This is expected, and is a fundamental challenge of this kind of learning: to determine the best subset of data to train on for the highest accuracy and lowest over-training.
k-fold cross validation is indeed one of the techniques that is used. It will reduce the variation you see, but do expect that there will still be some variation depending on the random choice.
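A minimal k-fold sketch in MATLAB (assuming Statistics and Machine Learning Toolbox, features `X` and labels `y` as above; the k-NN model is just a placeholder classifier):

```matlab
% Sketch: 5-fold cross-validation with cvpartition, averaging accuracy
% over folds to damp the dependence on any one random split.
cvp = cvpartition(y, 'KFold', 5);
acc = zeros(cvp.NumTestSets, 1);
for i = 1:cvp.NumTestSets
    trIdx = training(cvp, i);
    teIdx = test(cvp, i);
    mdl = fitcknn(X(trIdx, :), y(trIdx));       % placeholder classifier
    acc(i) = mean(predict(mdl, X(teIdx, :)) == y(teIdx));
end
meanAccuracy = mean(acc);   % fold-averaged accuracy; still varies run to run
```

As the answer notes, the fold assignment is itself random, so some run-to-run variation remains even with cross-validation.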
Susan on 7 Jun 2021
Thanks!

