Top-down clustering (tsvq) is applied to the data with the constraint that
mutual clusters cannot be divided. Within each mutual cluster, tsvq is re-applied to
yield a top-down hybrid in which mutual cluster structure is retained.
hybridHclust(x, themc=NULL, trace=FALSE)
x | A data matrix whose rows are to be clustered
themc | An object representing the mutual clusters in x, typically generated by
trace | Should internal steps be printed as they execute?
A mutual cluster is a set of points that should never be broken (see help
for ‘mutualCluster’ for a more precise definition). hybridHclust uses this idea to
construct a top-down clustering in which mutual clusters are never broken.
This is achieved by temporarily “fusing” together all points in a mutual cluster
so that they have equal coordinates, running tsvq, and then re-running tsvq
within each mutual cluster. The resultant top-down clusterings
are then “stitched” together to form a single top-down clustering.
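The “fusing” step can be illustrated with a minimal base-R sketch. It assumes that fused points are all moved to the centroid of their mutual cluster; the text above only says they are given equal coordinates, so the choice of the centroid, and the names mcs and x.fused, are illustrative assumptions rather than the package's internals. The mutual clusters (3,7) and (1,4) are the ones reported for the small example data set.

```r
# Small example data set from the Examples section
x <- cbind(c(-1.4806, 1.5772, -0.9567, -0.92, -1.9976, -0.2723, -0.3153),
           c(-0.6283, -0.1065, 0.428, -0.7777, -1.2939, -0.7796, 0.012))

# Mutual clusters as row indices (hypothetical representation)
mcs <- list(c(3, 7), c(1, 4))

# Assumption: "fuse" each mutual cluster by moving its points to their centroid
x.fused <- x
for (mc in mcs) {
  centroid <- colMeans(x[mc, , drop = FALSE])
  x.fused[mc, ] <- matrix(centroid, nrow = length(mc), ncol = ncol(x),
                          byrow = TRUE)
}

x.fused  # rows 3 and 7 now coincide, as do rows 1 and 4
```

A top-down method run on x.fused cannot separate points that share coordinates, which is why the mutual clusters survive the first pass.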
Only maximal mutual clusters are constrained to not be broken. Thus if points A, B, C, D are a mutual cluster and points A, B are also a mutual cluster, only the four points will be forbidden from being broken.
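To make the notion of a maximal mutual cluster concrete, the following sketch filters a list of mutual clusters down to those not contained in a larger one. The list representation and the names mcs, is.maximal and maximal are hypothetical, chosen for illustration; this is not how the package stores or filters mutual clusters.

```r
# Points A, B, C, D are coded 1..4; (1,2) is nested inside (1,2,3,4)
mcs <- list(c(1, 2, 3, 4), c(1, 2), c(6, 7))

# A cluster is maximal if no strictly larger cluster contains all its points
is.maximal <- sapply(seq_along(mcs), function(i) {
  !any(sapply(seq_along(mcs), function(j) {
    j != i &&
      length(mcs[[j]]) > length(mcs[[i]]) &&
      all(mcs[[i]] %in% mcs[[j]])
  }))
})

maximal <- mcs[is.maximal]
maximal  # keeps (1,2,3,4) and (6,7); drops the nested (1,2)
```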
Because hybridHclust uses tsvq to build the hierarchical
clusterings, it is implicitly using squared Euclidean distance between rows
of x. In some instances (especially for microarray data), a desirable
distance measure is d(x1,x2)=1-cor(x1,x2), where x1 and x2 are two rows of the
matrix x. This correlation-based distance is equivalent (up to a constant
factor) to squared Euclidean distance once rows have been scaled to have
mean 0 and standard deviation 1. This can be accomplished by pre-processing x
before calling hybridHclust. An example is provided below.
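The equivalence can be checked numerically in base R. For rows of length p scaled with sd() (which divides by p - 1), the squared Euclidean distance equals 2(p - 1)(1 - cor), so the two distances differ only by a constant factor. The simulated matrix and the variable names below are illustrative assumptions.

```r
set.seed(1)
x <- matrix(rnorm(5 * 8), nrow = 5)   # 5 rows to cluster, 8 columns
p <- ncol(x)

# Scale each row to mean 0 and standard deviation 1
x.scaled <- t(scale(t(x)))

# Squared Euclidean distance between scaled rows
d.euclid2 <- as.matrix(dist(x.scaled))^2

# Correlation-based distance between the original rows
d.cor <- 1 - cor(t(x))

# The two agree up to the constant factor 2 * (p - 1)
all.equal(d.euclid2, 2 * (p - 1) * d.cor, check.attributes = FALSE)
```

Because the factor is the same for every pair of rows, it does not change which splits a distance-based clustering prefers.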
A dendrogram in hclust format.
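Since the return value follows the hclust format, the usual tools for hclust objects should in principle apply to it. The sketch below uses stats::hclust on the small example data purely as a stand-in, so that it runs without the hybridHclust package; that the result of hybridHclust(x) can be substituted for hc is an assumption, not something verified here.

```r
x <- cbind(c(-1.4806, 1.5772, -0.9567, -0.92, -1.9976, -0.2723, -0.3153),
           c(-0.6283, -0.1065, 0.428, -0.7777, -1.2939, -0.7796, 0.012))

# Stand-in object: an ordinary hclust tree from base R, not from hybridHclust
hc <- hclust(dist(x))

# Typical operations on an object in hclust format
groups <- cutree(hc, k = 2)   # cut the tree into 2 groups
dend <- as.dendrogram(hc)     # convert for dendrogram-based tools

groups
```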
Hugh Chipman
Chipman, H. and Tibshirani, R. (2006) "Hybrid Hierarchical Clustering with Applications to Microarray Data", Biostatistics, 7, 302-317.
tsvq, “hopach” package
library(hybridHclust)

x <- cbind(c(-1.4806,1.5772,-0.9567,-0.92,-1.9976,-0.2723,-0.3153),
           c( -0.6283,-0.1065,0.428,-0.7777,-1.2939,-0.7796,0.012))
hyb1 <- hybridHclust(x)
par(mfrow=c(1,2))
plot(x, pch = as.character(1:nrow(x)), asp = 1)
plot(hyb1)
# Computing the mutual clusters first and passing them in also works
mc1 <- mutualCluster(x)
mc1
# (3,7) and (1,4) are the two mutual clusters
hyb1 <- hybridHclust(x,mc1)
print('example on sorlie data, may take up to a minute to run')
data(sorlie)
x.scaled <- t(sorlie)
# We take the transpose of "sorlie" because we want to cluster tissue
# samples. Tissue samples are columns of "sorlie" and hybridHclust will
# cluster rows.
for (i in 1:nrow(x.scaled))
  x.scaled[i,] <- (sorlie[,i]-mean(sorlie[,i]))/sd(sorlie[,i])
# Scale the rows of the x.scaled matrix. This means that the squared Euclidean
# distance used by hybridHclust will be equivalent to correlation distance.
hhc1 <- hybridHclust(x.scaled,trace=TRUE)
plot(hhc1,labels=dimnames(x.scaled)[[1]])
print('\n\n run demo(hybridHclust) for a more complete package demonstration')
1 : 3 7
2 : 1 4
[1] "example on sorlie data, may take up to a minute to run"
finding mutual clusters
temporarily fusing together data belonging to mutual clusters
running tsvq with MCs constrained to remain together
234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465
redoing tsvq within each mutual cluster
23
[1] "\n\n run demo(hybridHclust) for a more complete package demonstration"