justClusters | R Documentation |
Unsupervised clustering algorithms, such as partitioning around medoids
(pam), K-means (kmeans), or hierarchical clustering (hclust) after
cutting the tree, produce a list of class assignments along with other
structure. To simplify the interface for BootstrapClusterTest and
PerturbationClusterTest, we have written these routines, which simply
extract the cluster assignments.
cutHclust(data, k, method = "average", metric = "pearson")
cutPam(data, k)
cutKmeans(data, k)
cutRepeatedKmeans(data, k, nTimes)
repeatedKmeans(data, k, nTimes)
data | A numerical data matrix |
k | The number of classes desired from the algorithm |
method | Any valid linkage method that can be passed to the hclust function (default "average") |
metric | Any valid distance metric (default "pearson") |
nTimes | An integer; the number of times to repeat the K-means algorithm, each time with a different random starting point |
Each of the clustering routines used here has a different
structure for storing cluster assignments. The kmeans function
returns a list that stores the assignments in a ‘cluster’ component.
The pam function uses a ‘clustering’ component. For hclust, the
assignments are produced by a call to the cutree function.
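The differences described above can be seen by calling the underlying routines directly. The following is a minimal sketch, not the package's own code: it assumes a small simulated matrix with objects in rows, uses Euclidean distance via dist rather than the "pearson" default of cutHclust, and calls pam only if the cluster package is installed.

```r
# Sketch only: shows where each base routine keeps its cluster labels.
set.seed(1)
# 20 objects (rows) with 5 features each, drawn from two shifted groups
x <- rbind(matrix(rnorm(50, mean = 0), nrow = 10),
           matrix(rnorm(50, mean = 5), nrow = 10))

# kmeans: the labels are in the 'cluster' component of the result
km <- kmeans(x, centers = 2)
kmLabels <- km$cluster

# hclust: build the tree, then cut it into k groups with cutree
hc <- hclust(dist(x), method = "average")
hcLabels <- cutree(hc, k = 2)

# pam: the labels are in the 'clustering' component of the result
if (requireNamespace("cluster", quietly = TRUE)) {
  pamLabels <- cluster::pam(x, k = 2)$clustering
}
```

In each case the extracted object is an integer vector with one label per clustered object, which is the common form the cut... functions are described as returning.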
It has been observed that the K-means algorithm can converge to
different solutions depending on the starting values of the group
centers. We therefore also include a routine (repeatedKmeans) that runs
the K-means algorithm repeatedly, using different randomly generated
starting points each time, and saves the best result.
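The restart strategy can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the implementation of repeatedKmeans: the helper name repeatKmeansSketch is hypothetical, objects are assumed to be in rows, starting centers are drawn by sampling k rows, and "best" is taken to mean the smallest total within-group sum of squares.

```r
# Hypothetical helper; NOT the package's repeatedKmeans.
repeatKmeansSketch <- function(x, k, nTimes) {
  stopifnot(nTimes >= 1)
  best <- NULL
  bestStart <- NULL
  for (i in seq_len(nTimes)) {
    # draw k distinct rows as the random starting centers for this run
    start <- x[sample(nrow(x), k), , drop = FALSE]
    fit <- kmeans(x, centers = start)
    # keep the run with the smallest total within-group sum of squares
    if (is.null(best) || fit$tot.withinss < best$tot.withinss) {
      best <- fit
      bestStart <- start
    }
  }
  # assumption: 'centers' holds the starting centers of the best run
  list(kmeans = best, centers = bestStart, withinss = best$tot.withinss)
}

# usage on simulated data: 20 objects (rows), 5 features, two groups
set.seed(2)
x <- rbind(matrix(rnorm(50, mean = 0), nrow = 10),
           matrix(rnorm(50, mean = 5), nrow = 10))
res <- repeatKmeansSketch(x, k = 2, nTimes = 5)
labels <- res$kmeans$cluster
```

Keeping the whole kmeans fit, rather than only the labels, lets the caller inspect the final centers and per-cluster sums of squares of the winning run.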
Each of the cut... functions returns a vector of integer values
representing the cluster assignments found by the algorithm.
The repeatedKmeans function returns a list x with three
components. The component x$kmeans is the result of the call
to the kmeans function that produced the best fit to the
data. The component x$centers is a matrix containing the
group centers that were used in the best call to kmeans.
The component x$withinss contains the sum of the within-group
sums of squares, which is used as the measure of fitness.
Kevin R. Coombes krc@silicovore.com
cutree, hclust, kmeans, pam
# simulate data from three different groups
d1 <- matrix(rnorm(100*10, rnorm(100, 0.5)), nrow=100, ncol=10, byrow=FALSE)
d2 <- matrix(rnorm(100*10, rnorm(100, 0.5)), nrow=100, ncol=10, byrow=FALSE)
d3 <- matrix(rnorm(100*10, rnorm(100, 0.5)), nrow=100, ncol=10, byrow=FALSE)
dd <- cbind(d1, d2, d3)
cutKmeans(dd, k=3)
cutKmeans(dd, k=4)
cutHclust(dd, k=3)
cutHclust(dd, k=4)
cutPam(dd, k=3)
cutPam(dd, k=4)
cutRepeatedKmeans(dd, k=3, nTimes=10)
cutRepeatedKmeans(dd, k=4, nTimes=10)
# cleanup
rm(d1, d2, d3, dd)