BootstrapClusterTest | R Documentation |
Performs a nonparametric bootstrap (sampling with replacement) test to determine whether the clusters found by an unsupervised method appear to be robust in a given data set.
BootstrapClusterTest(data, FUN, subsetSize, nTimes=100, verbose=TRUE, ...)
data |
A data matrix, numerical data frame, or
|
FUN |
A |
... |
Additional arguments passed to the classifying function, |
subsetSize |
An optional integer argument. If present,
each iteration of the bootstrap selects |
nTimes |
The number of bootstrap samples to collect. |
verbose |
A logical flag |
Objects should be created using the BootstrapClusterTest
function, which performs the requested bootstrap on the
clusters. Following the standard R paradigm, the resulting object can be
summarized and plotted to determine the results of the test.
f
:A function
that, given a data matrix,
returns a vector of cluster assignments. Examples of functions
with this behavior are cutHclust
,
cutKmeans
, cutPam
, and
cutRepeatedKmeans
.
subsetSize
:The number of rows to be included in each bootstrap sample.
nTimes
:An integer, the number of bootstrap samples that were collected.
call
:An object of class call
, which records
how the object was produced.
result
:Object of class matrix
containing, for
each pair of columns in the original data, the number of times
they belonged to the same cluster of a bootstrap sample.
Class ClusterTest
, directly. See that class for
descriptions of the inherited methods image
and hist
.
signature(object = BootstrapClusterTest)
:
Write out a summary of the object.
Kevin R. Coombes krc@silicovore.com
Kerr MK, Churchill GJ.
Bootstrapping cluster analysis: Assessing the reliability of
conclusions from microarray experiments.
PNAS 2001; 98:8961-8965.
ClusterTest
,
PerturbationClusterTest
showClass("BootstrapClusterTest")
## simulate data from two different groups
d1 <- matrix(rnorm(100*30, rnorm(100, 0.5)), nrow=100, ncol=30, byrow=FALSE)
d2 <- matrix(rnorm(100*20, rnorm(100, 0.5)), nrow=100, ncol=20, byrow=FALSE)
dd <- cbind(d1, d2)
cols <- rep(c('red', 'green'), times=c(30,20))
colnames(dd) <- paste(cols, c(1:30, 1:20), sep='')
## peform your basic hierarchical clustering...
hc <- hclust(distanceMatrix(dd, 'pearson'), method='complete')
## bootstrap the clusters arising from hclust
bc <- BootstrapClusterTest(dd, cutHclust, nTimes=200, k=3, metric='pearson')
summary(bc)
## look at the distribution of agreement scores
hist(bc, breaks=101)
## let heatmap compute a new dendrogram from the agreement
image(bc, col=blueyellow(64), RowSideColors=cols, ColSideColors=cols)
## plot the agreement matrix with the original dendrogram
image(bc, dendrogram=hc, col=blueyellow(64), RowSideColors=cols, ColSideColors=cols)
## bootstrap the results of PAM
pamc <- BootstrapClusterTest(dd, cutPam, nTimes=200, k=3)
image(pamc, dendrogram=hc, col=blueyellow(64), RowSideColors=cols, ColSideColors=cols)
## contrast the behavior when all the data comes from the same group
xx <- matrix(rnorm(100*50, rnorm(100, 0.5)), nrow=100, ncol=50, byrow=FALSE)
hct <- hclust(distanceMatrix(xx, 'pearson'), method='complete')
bct <- BootstrapClusterTest(xx, cutHclust, nTimes=200, k=4, metric='pearson')
summary(bct)
image(bct, dendrogram=hct, col=blueyellow(64), RowSideColors=cols, ColSideColors=cols)
## cleanup
rm(d1, d2, dd, cols, hc, bc, pamc, xx, hct, bct)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.