Description Usage Arguments Details Value Author(s) Examples
Bottom-up clustering in which each cluster is represented by the mean vector for observations in the cluster.
eisenCluster(x, method, compatible = TRUE, verbose = FALSE)
x | Data matrix whose rows are to be clustered. |
method | How the distance between points (and centres) should be calculated. Choices include “euclidean”, “squared.euclidean”, “correlation” and “uncentered.correlation”. For “euclidean” and “squared.euclidean”, unexpected behaviour can result: because data points are replaced by their cluster centres, the overall variance in the data will decrease. |
compatible | Flag for whether cluster merging should be done as in Eisen's cluster algorithm. If |
verbose | If TRUE, prints the iteration number. |
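A minimal calling sketch for the arguments above. It assumes the function is provided by the hybridHclust package (the package name is an assumption, not stated on this page), so the call is guarded and is skipped if that package is not installed. The simulated matrix is illustrative only.

```r
# Small simulated data matrix: 8 rows (to be clustered) by 5 columns.
set.seed(1)
x <- matrix(rnorm(40), nrow = 8)

# Introduce two missing values, at x[2, 1] and x[5, 3].
x[cbind(c(2, 5), c(1, 3))] <- NA

# Assumption: eisenCluster() lives in the hybridHclust package.
if (requireNamespace("hybridHclust", quietly = TRUE)) {
  fit <- hybridHclust::eisenCluster(x, method = "correlation",
                                    compatible = TRUE, verbose = FALSE)
  print(fit$height)      # heights after being forced monotone
  print(fit$trueheight)  # heights before the monotone adjustment
}
```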
The main difference between this algorithm and hclust(...,method='centroid')
is the manner in which missing
values are handled. Here, original rows are merged at each
step, taking means after omitting missing observations.
Missing values are permitted, and can be handled in the same manner as in
Eisen's package. This is perhaps the main reason the current
implementation might be used: to reproduce the clusterings found from
Eisen's code when there are missing values. When two clusters are
merged, missing values can be handled in two ways (controlled by the
compatible flag): (1) new cluster
centres can be calculated using means of all original observations in the
clusters, or (2) new cluster centres can be calculated using a weighted
average of the means of the two clusters being joined. Although Eisen's
cluster software uses (2), it seems less desirable
in situations where observations are missing in some
dimensions only, since the presence of missing values will cause the wrong
weights to be used when updating centres.
Subsequent averaging of cluster centres will ignore the missingness
patterns in the cluster means. Option (2) is included to enable clusters
identical to Eisen's to be produced.
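The difference between the two options can be seen on a tiny example. This is a sketch in base R, not the package's internal code, and it assumes that the weights in option (2) are the numbers of rows in each cluster. The clusters P and Q are made-up data.

```r
# Cluster P holds two observations, one with a missing second coordinate.
P <- rbind(c(1, NA),
           c(3, 4))
# Cluster Q holds a single complete observation.
Q <- rbind(c(5, 8))

# Option (1): mean of all original observations, omitting missing values.
centre_all <- colMeans(rbind(P, Q), na.rm = TRUE)

# Option (2): weighted average of the two cluster means.
# Assumption: weights are the cluster sizes (row counts).
mean_P <- colMeans(P, na.rm = TRUE)
mean_Q <- colMeans(Q, na.rm = TRUE)
n_P <- nrow(P)
n_Q <- nrow(Q)
centre_weighted <- (n_P * mean_P + n_Q * mean_Q) / (n_P + n_Q)

print(centre_all)       # 3 and 6
print(centre_weighted)  # 3 and 16/3 (about 5.33)
```

The two results agree in the first coordinate, where nothing is missing. In the second coordinate, option (2) gives P a weight of 2 even though only one of its observations was present, which pulls the centre towards P's mean.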
An hclust object. The definition of distance between 2 clusters as
the distance between their means can result in a non-monotone dendrogram
(e.g., if A, B, C are vertices of an equilateral triangle with side lengths
1, A joins B at distance 1, then C joins AB at distance 0.866).
Non-monotone distances are coerced to be monotone before the object is
returned. This can yield dendrograms which seem to join more than 2 points
at one height.
The “trueheight” component contains actual heights before they were forced to be monotone.
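The equilateral-triangle case can be checked directly in base R. The sketch below computes the two merge heights by hand and then applies a running maximum as one simple way to force monotonicity. Whether eisenCluster uses exactly this adjustment is an assumption; the variable names are illustrative.

```r
# Vertices of an equilateral triangle with side length 1.
pts <- rbind(A = c(0, 0),
             B = c(1, 0),
             C = c(0.5, sqrt(3) / 2))

# Helper: Euclidean distance between two vectors.
euclid <- function(u, v) sqrt(sum((u - v)^2))

# First merge: A joins B at their mutual distance.
h1 <- euclid(pts["A", ], pts["B", ])

# Second merge: C joins the centre (mean vector) of cluster AB.
centre_ab <- colMeans(pts[c("A", "B"), ])
h2 <- euclid(pts["C", ], centre_ab)

trueheight <- c(h1, h2)       # 1, then sqrt(3)/2 (about 0.866)
height <- cummax(trueheight)  # assumed adjustment: running maximum

print(trueheight)
print(height)
```

With a running maximum both merges end up at height 1, which is how a dendrogram can appear to join three points at a single height.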
Hugh Chipman