hybridHclust: Hybrid hierarchical clustering using mutual clusters.


Description

Top-down clustering (tsvq) is applied to the data with the constraint that mutual clusters cannot be divided. Within each mutual cluster, tsvq is re-applied to yield a top-down hybrid in which mutual cluster structure is retained.

Usage

hybridHclust(x, themc=NULL, trace=FALSE)

Arguments

x

A data matrix whose rows are to be clustered

themc

An object representing the mutual clusters in x, typically generated by mutualCluster. If it is not provided, it will be calculated.

trace

Should internal steps be printed as they execute?

Details

A mutual cluster is a set of points that should never be broken (see help for ‘mutualCluster’ for a more precise definition). hybridHclust uses this idea to construct a top-down clustering in which mutual clusters are never broken. This is achieved by temporarily “fusing” together all points in a mutual cluster so that they have equal coordinates, running tsvq, and then re-running tsvq within each mutual cluster. The resultant top-down clusterings are then “stitched” together to form a single top-down clustering.
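The “fusing” step can be pictured with a small base-R sketch. The helper fuse_points is hypothetical and is not part of the package; moving each cluster to its centroid is an assumption, since the text above only says the fused points receive equal coordinates. The data and the two mutual clusters are those used in the Examples section.

```r
# Hypothetical helper (not part of hybridHclust): give every point in a
# mutual cluster the same coordinates. Using the cluster centroid is an
# assumption made for illustration only.
fuse_points <- function(x, clusters) {
  fused <- x
  for (idx in clusters) {
    centroid <- colMeans(x[idx, , drop = FALSE])
    fused[idx, ] <- matrix(centroid, nrow = length(idx), ncol = ncol(x),
                           byrow = TRUE)
  }
  fused
}

x <- cbind(c(-1.4806, 1.5772, -0.9567, -0.92, -1.9976, -0.2723, -0.3153),
           c(-0.6283, -0.1065, 0.428, -0.7777, -1.2939, -0.7796, 0.012))

# Mutual clusters (3,7) and (1,4), written as row-index vectors.
clusters <- list(c(3, 7), c(1, 4))

x_fused <- fuse_points(x, clusters)
```

After fusing, rows 3 and 7 coincide, as do rows 1 and 4, so a top-down split of x_fused has no way to separate them; rows outside any mutual cluster are unchanged.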

Only maximal mutual clusters are constrained to remain unbroken. Thus, if points A, B, C, D form a mutual cluster and points A, B also form a mutual cluster, only the four-point cluster {A, B, C, D} is forbidden from being broken.
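As an illustration of what “maximal” means here, the following base-R sketch filters a list of clusters down to those not strictly contained in another. The helper maximal_clusters and the extra points E and F are hypothetical; how the package itself identifies maximal mutual clusters is not described in this page.

```r
# Hypothetical helper: keep only clusters that are not a strict subset of
# some other cluster in the list.
maximal_clusters <- function(clusters) {
  is_maximal <- vapply(seq_along(clusters), function(i) {
    contained <- vapply(seq_along(clusters), function(j) {
      j != i &&
        length(clusters[[i]]) < length(clusters[[j]]) &&
        all(clusters[[i]] %in% clusters[[j]])
    }, logical(1))
    !any(contained)
  }, logical(1))
  clusters[is_maximal]
}

# {A, B} sits inside {A, B, C, D}; {E, F} is an unrelated cluster.
candidates <- list(c("A", "B", "C", "D"), c("A", "B"), c("E", "F"))
maximal <- maximal_clusters(candidates)
```

Here maximal keeps {A, B, C, D} and {E, F} and drops {A, B}, matching the rule stated above.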

Because hybridHclust uses tsvq to build the hierarchical clusterings, it implicitly uses squared Euclidean distance between rows of x. In some instances (especially for microarray data), a desirable distance measure is d(x1,x2)=1-cor(x1,x2), where x1 and x2 are two rows of the matrix x. This correlation-based distance is equivalent, up to a constant factor, to squared Euclidean distance once each row has been scaled to have mean 0 and standard deviation 1. This can be accomplished by pre-processing x before calling hybridHclust. An example is provided below.
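The relationship between the two distances can be checked numerically with base R alone. The sketch below uses simulated data (an assumption; any numeric matrix with non-constant rows would do) and compares squared Euclidean distance on row-standardised data with 1-cor on the original rows.

```r
set.seed(1)
x <- matrix(rnorm(5 * 8), nrow = 5)  # simulated: 5 rows to cluster, 8 columns

# scale() standardises columns, so transpose, scale, and transpose back
# to give each row mean 0 and standard deviation 1.
x_scaled <- t(scale(t(x)))

sq_euclid <- as.matrix(dist(x_scaled))^2  # squared Euclidean distance between rows
cor_dist <- 1 - cor(t(x))                 # correlation distance between rows

# sd() uses the n - 1 denominator, so each scaled row has sum of squares
# ncol(x) - 1 and the two distances differ by the factor 2 * (ncol(x) - 1).
ratio <- 2 * (ncol(x) - 1)
max_abs_diff <- max(abs(sq_euclid - ratio * cor_dist))
```

max_abs_diff is zero up to floating-point error. A constant factor rescales every distance equally, so it does not change which rows are closer to each other.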

Value

A dendrogram in hclust format.
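Assuming the returned object behaves like any other hclust object, the usual base-R tools should apply to it. The sketch below uses stats::hclust as a stand-in so that it runs without the package; the stand-in tree is not the hybrid clustering itself.

```r
x <- cbind(c(-1.4806, 1.5772, -0.9567, -0.92, -1.9976, -0.2723, -0.3153),
           c(-0.6283, -0.1065, 0.428, -0.7777, -1.2939, -0.7796, 0.012))

# Stand-in object of class "hclust" (bottom-up, complete linkage on squared
# Euclidean distance); replace with the result of hybridHclust(x) in practice.
hc <- hclust(dist(x)^2)

groups <- cutree(hc, k = 2)   # cut the tree into two groups of rows
dend <- as.dendrogram(hc)     # convert for dendrogram-based plotting tools
```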

Author(s)

Hugh Chipman

References

Chipman, H. and Tibshirani, R. (2006) "Hybrid Hierarchical Clustering with Applications to Microarray Data", Biostatistics, 7, 302-317.

See Also

tsvq, “hopach” package

Examples

x <- cbind(c(-1.4806,1.5772,-0.9567,-0.92,-1.9976,-0.2723,-0.3153),
c( -0.6283,-0.1065,0.428,-0.7777,-1.2939,-0.7796,0.012))
hyb1 <- hybridHclust(x)
par(mfrow=c(1,2))
plot(x, pch = as.character(1:nrow(x)), asp = 1)
plot(hyb1)

# Equivalent: compute the mutual clusters first and pass them in
mc1 <- mutualCluster(x)
mc1
# (3,7) and (1,4) are the two mutual clusters
hyb1 <- hybridHclust(x,mc1)

print('example on sorlie data, may take up to a minute to run')
data(sorlie)
x.scaled <- t(sorlie)
# We take the transpose of "sorlie" because we want to cluster tissue
# samples.  Tissue samples are columns of "sorlie" and hybridHclust will
# cluster rows.

for (i in 1:nrow(x.scaled))
  x.scaled[i,] <- (sorlie[,i]-mean(sorlie[,i]))/sd(sorlie[,i])
# Scale the rows of x.scaled matrix.  This will mean that squared Euclidean
# distance used by hybridHclust will be equivalent to correlation distance.

hhc1 <- hybridHclust(x.scaled,trace=TRUE)
plot(hhc1,labels=dimnames(x.scaled)[[1]])

print('\n\n run demo(hybridHclust) for a more complete package demonstration')

Example output

1 : 3 7 
2 : 1 4 
[1] "example on sorlie data, may take up to a minute to run"
finding mutual clusters
temporarily fusing together data belonging to mutual clusters
running tsvq with MCs constrained to remain together
234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465redoing tsvq within each mutual cluster
23[1] "\n\n run demo(hybridHclust) for a more complete package demonstration"

hybridHclust documentation built on May 2, 2019, 7:33 a.m.