This package helps identify the optimal clustering of a data set. It provides a framework for running a wide range of clustering algorithms and validation measures in order to determine the optimal number of clusters (k) in the data. It then provides a function that analyzes the cluster assignments from each algorithm to identify samples that are repeatedly assigned to the same group. We call these groups 'core clusters'; they provide a robust starting point for later class discovery.
Package: COMMUNAL
Type: Package
Version: 1.1
Date: 2015-08-12
License: GPL-2
Imports: clValid, fpc, methods
Depends: R (>= 2.10), cluster
Suggests: RUnit, NMF, ConsensusClusterPlus, rgl
Start with a matrix of data to cluster. Important functions are:
COMMUNAL: to run clustering algorithms once
clusterRange: to run COMMUNAL across increasing subsets of data
getGoodAlgs: to identify robust algorithms from clusterRange
getNonCorrNonMonoMeasures: to identify non-monotonic, non-correlated validity measures from clusterRange
plotRange3D: to pick k
clusterKeys: to identify core clusters
returnCore: to identify core clusters
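The 'core cluster' idea described above can be illustrated with a small base-R sketch. This is not the package's implementation: the toy `assignments` matrix, the variable names, the `thresh` value, and the convention of marking non-core samples with 0 are all assumptions made for illustration, and the sketch assumes cluster labels have already been aligned across algorithms.

```r
## Toy assignments (hypothetical): rows are samples, columns are algorithms.
## Labels are assumed to be already aligned across algorithms.
assignments <- cbind(
  alg1 = c(1, 1, 2, 2, 3, 3),
  alg2 = c(1, 1, 2, 3, 3, 3),
  alg3 = c(1, 2, 2, 2, 3, 3)
)
rownames(assignments) <- paste0("Sample", 1:6)

## For each sample: the most common label, and the percentage of
## algorithms that voted for that label.
majority_label <- apply(assignments, 1,
                        function(x) as.integer(names(which.max(table(x)))))
agreement_pct <- apply(assignments, 1,
                       function(x) 100 * max(table(x)) / length(x))

## Hypothetical threshold: keep only samples on which every algorithm agrees.
## Samples below the threshold are marked 0 (a convention chosen for this sketch).
thresh <- 100
core_sketch <- ifelse(agreement_pct >= thresh, majority_label, 0L)
print(table(core_sketch))
```

Lowering `thresh` in this sketch admits samples on which only a majority of algorithms agree, which trades stricter core membership for larger core clusters.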
Albert Chen, Timothy E Sweeney, Olivier Gevaert
Maintainer: Albert Chen <acc2015@stanford.edu>
## Not run:
## create artificial data set with 3 distinct clusters
set.seed(1)
V1 <- c(abs(rnorm(100, 2)), abs(rnorm(100, 50)), abs(rnorm(100, 140)))
V2 <- c(abs(rnorm(100, 2, 8)), abs(rnorm(100, 55, 4)), abs(rnorm(100, 105, 1)))
data <- t(data.frame(V1, V2))
colnames(data) <- paste("Sample", 1:ncol(data), sep="")
rownames(data) <- paste("Gene", 1:nrow(data), sep="")
## run COMMUNAL
result <- COMMUNAL(data=data, ks=seq(2,5)) # result is a COMMUNAL object
k <- 3 # suppose optimal cluster number is 3
clusters <- result$getClustering(k) # method to extract clusters
mat.key <- clusterKeys(clusters) # get core clusters
examineCounts(mat.key) # help decide agreement.thresh
core <- returnCore(mat.key, agreement.thresh=50) # find 'core' clusters (all algs agree)
table(core) # the 'core' clusters
## Additional arguments are passed down to clValid, NMF, ConsensusClusterPlus
result <- COMMUNAL(data=data, ks=2:5,
                   clus.methods=c("diana", "ccp-hc", "nmf"), reps=20, nruns=2)
## To identify k, use clusterRange and plotRange3D to visualize validation measures
data(BRCA.100) # 533 tissues to cluster, with measurements of 100 genes each
varRange <- c(10,25,50,75,100)
meas <- c("Connectivity", "average.between",
          "ch", "sindex", "avg.silwidth",
          "average.within", "dunn", "widestgap",
          "wb.ratio", "entropy", "dunn2",
          "pearsongamma", "g3", "within.cluster.ss",
          "min.separation", "max.diameter")
BRCA.results <- clusterRange(BRCA.100, ks=2:6, varRange=varRange, validation=meas)
goodMeasures <- getNonCorrNonMonoMeasures(BRCA.results)
goodAlgs <- getGoodAlgs(BRCA.results)
plot.data <- plotRange3D(BRCA.results, goodAlgs=goodAlgs, goodMeasures=goodMeasures)
## End(Not run)
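To give a feel for what a single validation measure looks like across candidate values of k, here is a minimal sketch that uses only the cluster package (listed under Depends above). It is not how COMMUNAL combines algorithms and measures: it runs one algorithm, `pam()`, on the same artificial data as the example above and reports one measure, the average silhouette width. The names `x`, `avg_sil`, and `best_k` are chosen for illustration.

```r
library(cluster)  # provides pam(); listed under Depends for this package

## Same artificial 3-cluster data as in the example above
set.seed(1)
V1 <- c(abs(rnorm(100, 2)), abs(rnorm(100, 50)), abs(rnorm(100, 140)))
V2 <- c(abs(rnorm(100, 2, 8)), abs(rnorm(100, 55, 4)), abs(rnorm(100, 105, 1)))
x <- cbind(V1, V2)  # pam() expects samples in rows, variables in columns

## Average silhouette width for each candidate k
ks <- 2:5
avg_sil <- sapply(ks, function(k) pam(x, k)$silinfo$avg.width)
names(avg_sil) <- ks

## The k with the largest average silhouette width
best_k <- ks[which.max(avg_sil)]
print(round(avg_sil, 3))
print(best_k)
```

A single measure from a single algorithm can be misleading on real data, which is the motivation for comparing several algorithms and several measures with clusterRange and plotRange3D.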