This function runs various (user-specified) clustering algorithms on the data, for each potential number of clusters k. It then runs internal validation measures that quantify the fit of each clustering. The returned object is of class "COMMUNAL", and can be used to identify 'core' clusters in the data. Currently supported clustering algorithms are those in packages "clValid", "NMF", and "ConsensusClusterPlus".
The COMMUNAL algorithm is designed to be run with clusterRange rather than via a direct call to COMMUNAL() (although a direct call may still be useful to some researchers). After running clusterRange, use getGoodAlgs and getNonCorrNonMonoMeasures to get locally optimized clustering algorithms and validity measures.
To determine the optimal number of clusters, use the plotRange3D function.
COMMUNAL(data, ks, clus.methods = c("hierarchical", "kmeans", "diana",
         "som", "sota", "pam", "clara", "agnes"),
         validation = c("Connectivity", "dunn", "wb.ratio", "g3",
         "g2", "pearsongamma", "avg.silwidth", "sindex"),
         dist.metric = "euclidean", aggl.method = "ward",
         neighb.size = 10, seed = NULL, parallel=F, gapBoot=20,
         verbose=F, mc.cores=NULL, ...)
data | The data to cluster (numeric matrix or data frame). The columns are clustered, rows are features. If using cluster method
ks | A numeric vector of integers greater than 1, for the number of clusters to consider. For example, 2:4 tells the function to try clusterings with 2, 3, and 4 clusters.
clus.methods | Character vector of which clustering methods to use. Valid options: "
validation | A character vector of the validation measures to consider. Valid options: "
dist.metric | Which metric to use when calculating the distance matrix. Used by clValid clustering algorithms, and in calculating validation measures. Available choices are "
aggl.method | The agglomeration method to use for "
neighb.size | Numeric value. The neighborhood size used for calculating the
seed | Numeric value. Random seed to use in ConsensusClusterPlus and NMF.
parallel | Allows for parallel computation of the gap statistic bootstraps. WILL NOT WORK ON WINDOWS MACHINES (sorry).
gapBoot | The number of gap statistic bootstraps to perform. This recursively calls COMMUNAL for each bootstrap, though the other validation measures do not have to be calculated for each call.
verbose | Mostly output regarding clustering algorithms and gap statistic.
mc.cores | If null, uses detectCores(). Ignored if parallel=F.
... | Other arguments to pass down to ConsensusClusterPlus, NMF, and clValid.
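The argument requirements above can be checked before calling COMMUNAL(). The following is a minimal sketch in base R, assuming a hypothetical samples-in-rows matrix named raw; the names raw, use.parallel, and n.cores are illustrative and not part of the package. The COMMUNAL() call itself is left commented out because it requires the package to be installed.

```r
## Hypothetical input: 6 samples (rows) x 4 features (columns)
set.seed(1)
raw <- matrix(rnorm(24), nrow = 6,
              dimnames = list(paste0("Sample", 1:6), paste0("Gene", 1:4)))

## COMMUNAL clusters the columns, so put samples in columns, features in rows
data <- t(raw)
stopifnot(is.numeric(data), ncol(data) == 6)

## ks must be integers greater than 1
ks <- 2:4
stopifnot(all(ks > 1), all(ks == round(ks)))

## parallel is documented as not working on Windows; enable it only elsewhere
use.parallel <- .Platform$OS.type != "windows"

## mc.cores = NULL falls back to detectCores(); here one core is left free
## (an assumption for illustration, not a package recommendation)
n.detected <- parallel::detectCores()
n.cores <- if (!use.parallel) {
  NULL
} else if (is.na(n.detected)) {
  1L
} else {
  max(1L, n.detected - 1L)
}

## With the package installed, the call would then look like:
## result <- COMMUNAL(data = data, ks = ks, parallel = use.parallel,
##                    mc.cores = n.cores)
```

Transposing up front avoids silently clustering features instead of samples, which is the easiest mistake to make with this orientation.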
The return object is of class COMMUNAL. The class has a getClustering method to extract a data frame of cluster assignments. Alternatively, functions clusterKeys and returnCore are provided to identify core clusters. See examples below.
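To make the idea of a 'core' cluster concrete, here is a minimal base-R sketch of agreement across algorithms. It is not the implementation of clusterKeys or returnCore. It assumes cluster labels have already been aligned across algorithms, and the names assignments and core_by_agreement, as well as the use of 0 for samples without enough agreement, are hypothetical.

```r
## Hypothetical, already-aligned labels: rows = samples, columns = algorithms
assignments <- rbind(
  S1 = c(1, 1, 1, 1),
  S2 = c(1, 1, 2, 1),
  S3 = c(2, 2, 2, 2),
  S4 = c(1, 2, 3, 4)
)
colnames(assignments) <- c("algA", "algB", "algC", "algD")

## Assign each sample to its most common label when at least `thresh` percent
## of algorithms vote for that label; otherwise mark it 0 (no core cluster).
core_by_agreement <- function(mat, thresh = 50) {
  apply(mat, 1, function(labels) {
    votes <- table(labels)
    top <- votes[which.max(votes)]
    pct <- 100 * as.numeric(top) / length(labels)
    if (pct >= thresh) as.numeric(names(top)) else 0
  })
}

core <- core_by_agreement(assignments, thresh = 50)
table(core)  # sizes of the resulting groups, including unassigned (0)
```

Raising thresh to 100 in this sketch keeps only samples on which every algorithm agrees, so S2 would also become unassigned.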
Albert Chen and Timothy E Sweeney
Maintainer: Albert Chen acc2015@stanford.edu
Class "COMMUNAL". Use functions clusterKeys and returnCore to identify core clusters.
## Not run:
## create artificial data set with 3 distinct clusters
set.seed(1)
V1 = c(abs(rnorm(100, 2)), abs(rnorm(100, 50)), abs(rnorm(100, 140)))
V2 = c(abs(rnorm(100, 2, 8)), abs(rnorm(100, 55, 4)), abs(rnorm(100, 105, 1)))
data <- t(data.frame(V1, V2))
colnames(data) <- paste("Sample", 1:ncol(data), sep="")
rownames(data) <- paste("Gene", 1:nrow(data), sep="")
## run COMMUNAL
result <- COMMUNAL(data=data, ks=seq(2,5)) # result is a COMMUNAL object
k <- 3 # suppose optimal cluster number is 3
clusters <- result$getClustering(k) # method to extract clusters
mat.key <- clusterKeys(clusters) # get core clusters
examineCounts(mat.key) # help decide agreement.thresh
core <- returnCore(mat.key, agreement.thresh=50) # find 'core' clusters (all algs agree)
table(core) # the 'core' cluster sizes
## Note: could try a different value for k to
## see clusters with sub-optimal k
## Can specify clustering methods and validation measures
result <- COMMUNAL(data = data, ks=c(2,3),
                   clus.methods = c("diana", "som", "pam", "kmeans", "ccp-hc", "nmf"),
                   validation=c('pearsongamma', 'avg.silwidth'))
clusters <- result$getClustering(k=3)
mat.key <- clusterKeys(clusters)
examineCounts(mat.key)
core <- returnCore(mat.key, agreement.thresh=50) # find 'core' clusters
table(core) # the 'core' clusters
## Additional arguments are passed down to clValid, NMF, ConsensusClusterPlus
result <- COMMUNAL(data=data, ks=2:5,
                   clus.methods=c("diana", "ccp-hc", "nmf"), reps=20, nruns=2)
## End(Not run)