clusterdata
Construct agglomerative clusters from data
Syntax
Description
T = clusterdata(X,Cutoff=cutoff) returns cluster indices for each observation (row) of an input data matrix X, given a threshold cutoff for cutting an agglomerative hierarchical tree generated by the linkage function from X.
clusterdata supports agglomerative clustering and incorporates the pdist, linkage, and cluster functions, which you can use separately for more detailed analysis. See Algorithm Description for more details.
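As a minimal sketch of this syntax, the following call clusters a small hand-made data set. The data values and the cutoff of 1.0 are illustrative assumptions and do not come from this page. The Name=Value form assumes a MATLAB release that supports it (R2021a or later).

```matlab
% Illustrative data: six 2-D observations, one per row (values are made up).
X = [1.0 1.0; 1.1 0.9; 0.9 1.2; 5.0 5.0; 5.2 4.9; 4.8 5.1];

% Assumed threshold for cutting the default cluster tree.
cutoff = 1.0;
T = clusterdata(X,Cutoff=cutoff);

% T holds one cluster index per row of X; show data and labels side by side.
disp([X T])
```

The number of clusters returned depends on the data and the threshold, so treat the chosen cutoff as a starting point to tune.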
T = clusterdata(___,Name=Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in the previous syntaxes. For example, specify clusterdata(X,MaxClust=5,Depth=3) to find a maximum of five clusters by evaluating distance values up to a depth of three below each node.
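The name-value call quoted above can be tried on synthetic data. This is only a sketch: the random data, the seed, and the per-cluster count are assumptions added for illustration.

```matlab
rng(0)               % fix the seed so the run is repeatable
X = rand(30,2);      % illustrative data: 30 observations, 2 variables

% Request at most five clusters, with a depth of three.
T = clusterdata(X,MaxClust=5,Depth=3);

% Count how many observations fall into each cluster index.
counts = accumarray(T,1);
disp(counts')
```

Because MaxClust is an upper bound, the result can contain fewer than five clusters.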
Examples
Input Arguments
Name-Value Arguments
Output Arguments
Tips
If Linkage is "centroid" or "median", then linkage can produce a cluster tree that is not monotonic. This result occurs when the distance from the union of two clusters, r and s, to a third cluster is less than the distance between r and s. In this case, in a dendrogram drawn with the default orientation, the path from a leaf to the root node takes some downward steps. To avoid this result, specify another value for Linkage.
The following image shows a nonmonotonic cluster tree. In this case, cluster 1 and cluster 3 are joined into a new cluster, while the distance between this new cluster and cluster 2 is less than the distance between cluster 1 and cluster 3.
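One way to check for this condition is to inspect the merge heights in the third column of the linkage output. The sketch below uses three made-up points arranged so that the centroid of the first merged pair lies closer to the third point than the pair's own separation. The variable names are hypothetical.

```matlab
% Three made-up points forming a nearly equilateral triangle.
X = [0 0; 1 0; 0.5 0.9];

% Build the tree with centroid linkage; column 3 of Z holds merge heights.
Z = linkage(X,"centroid");
heights = Z(:,3);

% A decrease between consecutive merge heights indicates an inversion.
isNonmonotonic = any(diff(heights) < 0)

% For comparison, single linkage on the same data keeps heights nondecreasing.
Zs = linkage(X,"single");
isMonotonicSingle = all(diff(Zs(:,3)) >= 0)
```

The same check on the heights can be applied to any tree before cutting it, for example to decide whether to switch to a different Linkage value.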
Algorithms
When you do not specify any optional name-value arguments, the clusterdata function performs the following steps:
1. Create a vector of the Euclidean distance between pairs of observations in X by using pdist.
   Y = pdist(X,"euclidean")
2. Create an agglomerative hierarchical cluster tree from Y by using linkage with the "single" method for computing the shortest distance between clusters.
   Z = linkage(Y,"single")
3. When you specify cutoff, the clusterdata function uses cluster to define clusters from Z when inconsistent values are less than cutoff.
   T = cluster(Z,Cutoff=cutoff)
   When you specify maxclust, the clusterdata function uses cluster to find a maximum of maxclust clusters from Z, using "distance" as the criterion for defining clusters.
   T = cluster(Z,MaxClust=maxclust)
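The steps above can be sketched end to end and compared with a direct clusterdata call. The data and the cutoff value are assumptions chosen for illustration. The comparison relies on the description above being a complete account of the default behavior.

```matlab
% Illustrative data and threshold (both made up for this sketch).
X = [1.0 1.0; 1.1 0.9; 0.9 1.2; 5.0 5.0; 5.2 4.9; 4.8 5.1];
cutoff = 1.0;

% Step 1: pairwise Euclidean distances as a row vector.
Y = pdist(X,"euclidean");

% Step 2: single-linkage hierarchical cluster tree.
Z = linkage(Y,"single");

% Step 3: cut the tree at the threshold.
Tsteps = cluster(Z,Cutoff=cutoff);

% Same clustering in a single call.
Tdirect = clusterdata(X,Cutoff=cutoff);

% Check whether the two routes agree on every observation.
sameAssignment = isequal(Tsteps,Tdirect)
```

Running the steps separately keeps Y and Z available for inspection, for example to plot a dendrogram from Z.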
Alternative Functionality
If you have a hierarchical cluster tree Z (the output of the linkage function for the input data matrix X), you can use cluster to perform agglomerative clustering on Z and return the cluster assignment for each observation (row) in X.
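A practical reason to work from Z directly is that one tree can be cut several times without recomputing distances. The sketch below assumes three well-separated groups of made-up points. The names T2 and T3 are hypothetical.

```matlab
% Illustrative data: three well-separated groups (values are made up).
X = [1.0 1.0; 1.1 0.9; 0.9 1.2; ...
     5.0 5.0; 5.2 4.9; 4.8 5.1; ...
     9.0 1.0; 9.1 1.2];

% Build the tree once; linkage uses Euclidean distance on raw data by default.
Z = linkage(X,"single");

% Cut the same tree at two different cluster counts.
T2 = cluster(Z,MaxClust=2);
T3 = cluster(Z,MaxClust=3);

% Compare the two assignments row by row.
disp([T2 T3])
```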