How to cluster discrete data
Hi!
I have a database containing discrete features, for example the number of hairpin loops, the number of elements, the length of a sequence, and the percentage of A nucleotides. I would now like to apply some clustering algorithms. Does anyone know which algorithms in MATLAB are suited to discrete data?
Thanks a lot, Iene
Answers (1)
Purvaja
5 February 2025
There are various ways to obtain clusters. You can refer to the following methods:
- K-means clustering: The function "kmeans" partitions data into k mutually exclusive clusters and returns the index of the cluster to which it assigns each observation. It requires the number of clusters in advance. (https://www.mathworks.com/help/stats/k-means-clustering.html)
[idx, C] = kmeans(data, k); % k is the number of clusters
- K-medoids clustering: "kmedoids" is similar to "kmeans", but each cluster center is an actual data point (a medoid), which makes it more robust to noise and outliers. It also requires the number of clusters in advance. (https://www.mathworks.com/help/stats/kmedoids.html)
[idx, C] = kmedoids(data, k); % k is the number of clusters
- DBSCAN (Density-Based Spatial Clustering of Applications with Noise): Unlike k-means clustering, the "dbscan" algorithm does not require prior knowledge of the number of clusters. It works with distance metrics and can be applied to discrete data. (https://www.mathworks.com/help/stats/dbscan-clustering.html)
epsilon = 0.5; % Neighborhood search radius
minPts = 5; % Minimum number of neighbors required for a core point
idx = dbscan(data, epsilon, minPts); % idx is -1 for points labeled as noise
- Gaussian Mixture Models (GMM): GMM clustering can accommodate clusters that have different sizes and correlation structures within them. (https://www.mathworks.com/help/stats/clustering-using-gaussian-mixture-models.html)
gm = fitgmdist(data, k); % k is the number of clusters
idx = cluster(gm, data);
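The features in the question (counts, sequence lengths, percentages) sit on very different scales, and every method above relies on distances, so the largest-valued feature can dominate the result. Below is a minimal sketch of one way to handle this, assuming it suits your data: standardize the columns with zscore, let evalclusters suggest k using the silhouette criterion, then cluster. The example data, the candidate range 2:6, and the choice of city-block distance for kmedoids are illustrative assumptions, not part of the original answer.

```matlab
% Hypothetical example data: rows are sequences, columns are
% [hairpin loops, elements, sequence length, percent A].
rng(1); % fix the random seed so the sketch is reproducible
groupA = [randi([1 3], 50, 1), randi([5 10], 50, 1), ...
          randi([60 90], 50, 1), 20 + 5*rand(50, 1)];
groupB = [randi([4 8], 50, 1), randi([15 25], 50, 1), ...
          randi([150 200], 50, 1), 35 + 5*rand(50, 1)];
data = [groupA; groupB];

% Put every feature on a comparable scale (zero mean, unit variance).
dataStd = zscore(data);

% Compare candidate cluster counts with the silhouette criterion.
eva = evalclusters(dataStd, 'kmeans', 'silhouette', 'KList', 2:6);
k = eva.OptimalK;

% Cluster the standardized data; replicates reduce sensitivity to
% the random initialization.
idxKmeans = kmeans(dataStd, k, 'Replicates', 10);

% City-block distance is one possible choice for count-like features.
idxKmedoids = kmedoids(dataStd, k, 'Distance', 'cityblock');
```

Whether standardization is appropriate depends on whether all features should carry equal weight; if some matter more than others, a different scaling may be a better fit.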
To check out more methods, you can refer to the following resource:
You can also access release-specific documentation using these commands in your MATLAB command window:
web(fullfile(docroot, 'stats/k-means-clustering.html'))
web(fullfile(docroot, 'stats/kmedoids.html'))
web(fullfile(docroot, 'stats/dbscan-clustering.html'))
web(fullfile(docroot, 'stats/clustering-using-gaussian-mixture-models.html'))
Hope this helps you!