Unsupervised clustering algorithms, such as partitioning around medoids
(pam), K-means (kmeans), and hierarchical clustering (hclust) after
cutting the tree, produce a list of class assignments along with other
structure. To simplify the interface for the PerturbationClusterTest,
we have written these routines that simply extract these cluster
assignments.
A numerical data matrix
The number of classes desired from the algorithm
Any valid linkage method that can be passed to the
Any valid distance metric that can be passed to the
An integer; the number of times to repeat the K-means algorithm with a different random starting point
Each of the clustering routines used here has a different
structure for storing cluster assignments. The kmeans
function stores the assignments in a ‘cluster’ attribute. The
pam function uses a ‘clustering’ attribute. For
hclust, the assignments are produced by a separate call that cuts the tree.
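As a rough illustration of the extraction step described above, the following sketch pulls the assignment vector out of each kind of result. The helper names (sketchCutKmeans, sketchCutPam, sketchCutHclust) are hypothetical and are not the package's own routines. The sketch assumes that the items to be clustered are the rows of the input, that Euclidean distance from dist is acceptable, and that cutree from the stats package is used to cut the tree. The package routines may follow different conventions.

```r
# Hypothetical helpers, not the package's implementation.
# Assumption: each helper clusters the ROWS of `data`.

sketchCutKmeans <- function(data, k) {
  # kmeans() stores the assignments in the 'cluster' component
  kmeans(data, centers = k)$cluster
}

sketchCutPam <- function(data, k) {
  # pam() from the 'cluster' package stores them in 'clustering'
  cluster::pam(data, k)$clustering
}

sketchCutHclust <- function(data, k, method = "average") {
  # hclust() returns a tree; cutree() converts it into k group labels
  tree <- hclust(dist(data), method = method)
  cutree(tree, k = k)
}

# Two well-separated groups of 10 items each, 5 measurements per item
set.seed(42)
items <- rbind(matrix(rnorm(50, mean = 0), ncol = 5),
               matrix(rnorm(50, mean = 4), ncol = 5))

km <- sketchCutKmeans(items, k = 2)
hc <- sketchCutHclust(items, k = 2)

# pam() needs the 'cluster' package, so only call it when available
if (requireNamespace("cluster", quietly = TRUE)) {
  pm <- sketchCutPam(items, k = 2)
}
```

Each helper returns a plain integer vector with one entry per item, so downstream code does not need to know which algorithm produced it.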
It has been observed that the K-means algorithm can converge to
different solutions depending on the starting values of the group
centers. We also include a routine (
repeatedKmeans) that runs
the K-means algorithm repeatedly, using different randomly generated
starting points each time, saving the best results.
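The repeated-restart idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's repeatedKmeans code: the function name repeatedKmeansSketch is hypothetical, starting centers are drawn by sampling k rows of the data, "best" is taken to mean the smallest total within-group sum of squares, and the layout of the returned list is chosen for illustration only.

```r
# Hypothetical sketch, not the package's repeatedKmeans implementation.
# Assumption: items are the ROWS of `data`.
repeatedKmeansSketch <- function(data, k, nTimes) {
  best <- NULL
  for (i in seq_len(nTimes)) {
    # draw k distinct rows as the random starting centers for this run
    start <- data[sample(nrow(data), k), , drop = FALSE]
    fit <- kmeans(data, centers = start)
    # fitness: total of the within-group sums of squares (smaller is better)
    score <- sum(fit$withinss)
    if (is.null(best) || score < best$withinss) {
      best <- list(kmeans = fit, centers = start, withinss = score)
    }
  }
  best
}

# Three groups of 10 items each, 5 measurements per item
set.seed(7)
pts <- rbind(matrix(rnorm(50, mean = 0), ncol = 5),
             matrix(rnorm(50, mean = 4), ncol = 5),
             matrix(rnorm(50, mean = 8), ncol = 5))

res <- repeatedKmeansSketch(pts, k = 3, nTimes = 5)
assignments <- res$kmeans$cluster
```

The kmeans function in the stats package also accepts an nstart argument that tries several random starts when centers is given as a number; the explicit loop here only makes the bookkeeping visible.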
Each of the cut... functions returns a vector of integer values
representing the cluster assignments found by the algorithm.
The repeatedKmeans function returns a list x with three
components. The component x$kmeans is the result of the call
to the kmeans function that produced the best fit to the
data. The component x$centers is a matrix containing the list
of group centers that were used in the best call to kmeans.
The component x$withinss contains the sum of the within-group
sums of squares, which is used as the measure of fitness.
Kevin R. Coombes
# simulate data from three different groups
d1 <- matrix(rnorm(100*10, rnorm(100, 0.5)), nrow=100, ncol=10, byrow=FALSE)
d2 <- matrix(rnorm(100*10, rnorm(100, 0.5)), nrow=100, ncol=10, byrow=FALSE)
d3 <- matrix(rnorm(100*10, rnorm(100, 0.5)), nrow=100, ncol=10, byrow=FALSE)
dd <- cbind(d1, d2, d3)

cutKmeans(dd, k=3)
cutKmeans(dd, k=4)

cutHclust(dd, k=3)
cutHclust(dd, k=4)

cutPam(dd, k=3)
cutPam(dd, k=4)

cutRepeatedKmeans(dd, k=3, nTimes=10)
cutRepeatedKmeans(dd, k=4, nTimes=10)

# cleanup
rm(d1, d2, d3, dd)