View source: R/consensus-cluster.R
This function performs consensus clustering for a fixed number of clusters K. The base clustering algorithm is selected with clMethod: hierarchical clustering (the default), k-means, partitioning around medoids, sparse k-means or sparse hierarchical clustering.
consensusCluster(
  data = NULL,
  K = 2,
  B = 100,
  pItem = 0.8,
  clMethod = "hclust",
  dist = "euclidean",
  hclustMethod = "average",
  sparseKmeansPenalty = NULL,
  maxIterKM = 1000
)
data | N x P data matrix (N observations in rows, P variables in columns).
K | Number of clusters.
B | Number of resampling iterations.
pItem | Proportion of items sampled at each iteration.
clMethod | Clustering algorithm. Can be "hclust" for hierarchical clustering, "kmeans" for k-means clustering, "pam" for partitioning around medoids, "sparse-kmeans" for sparse k-means clustering or "sparse-hclust" for sparse hierarchical clustering. Default is "hclust". However, if the data contain at least one covariate that is a factor, the default clustering algorithm is "pam".
dist | Distance used for hierarchical clustering. Can be "pearson" (for 1 - Pearson correlation), "spearman" (for 1 - Spearman correlation), any of the distances provided in stats::dist() (i.e. "euclidean", "maximum", "manhattan", "canberra", "binary" or "minkowski"), or a matrix containing the distances between the observations.
hclustMethod | Hierarchical clustering method. Default is "average". For more details see
sparseKmeansPenalty | If the selected clustering method is "sparse-kmeans", this is the value of the parameter "wbounds" of the "KMeansSparseCluster" function. The default value is the square root of the number of variables.
maxIterKM | Maximum number of iterations for the k-means clustering algorithm.
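To make the "pearson" option of dist concrete, here is a minimal base-R sketch of a 1 - Pearson correlation dissimilarity between observations, followed by average-linkage hierarchical clustering. It is an illustration under assumptions, not the package's internal code: the toy matrix x and the variable names are made up, and computing the correlation between rows is our reading of "distances between the observations".

```r
# Toy data: 6 observations (rows) and 4 variables (columns); values are arbitrary.
set.seed(1)
x <- matrix(rnorm(24), nrow = 6, ncol = 4)

# cor() correlates columns, so transpose to correlate observations (rows).
# The dissimilarity is 1 - Pearson correlation.
d_pearson <- as.dist(1 - cor(t(x), method = "pearson"))

# Average-linkage hierarchical clustering, cut into K = 2 clusters.
hc <- hclust(d_pearson, method = "average")
labels <- cutree(hc, k = 2)
```

Replacing method = "pearson" with method = "spearman" in cor() gives the corresponding rank-based dissimilarity.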
The output is a consensus matrix, that is, a symmetric N x N matrix whose element in position (i,j) is the proportion of times that items i and j have been clustered together.
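The following base-R sketch shows one way such a consensus matrix can be built by resampling, in the spirit of Monti et al. (2003). It is not the package's implementation: consensus_sketch is a hypothetical helper, it uses stats::kmeans as the base algorithm, and it assumes that the proportion for a pair (i,j) is taken over the iterations in which both items were sampled.

```r
# Hypothetical helper, not part of the coca package.
consensus_sketch <- function(x, K = 2, B = 100, pItem = 0.8, maxIter = 1000) {
  N <- nrow(x)
  together <- matrix(0, N, N)  # times i and j fell in the same cluster
  sampled  <- matrix(0, N, N)  # times i and j were both in the subsample
  for (b in seq_len(B)) {
    # Subsample a proportion pItem of the items, without replacement.
    items <- sample(N, ceiling(N * pItem))
    cl <- kmeans(x[items, , drop = FALSE], centers = K,
                 iter.max = maxIter)$cluster
    # Indicator matrix: 1 if two sampled items share a cluster label.
    same <- outer(cl, cl, "==") * 1
    together[items, items] <- together[items, items] + same
    sampled[items, items]  <- sampled[items, items] + 1
  }
  # Entries are NaN for pairs that were never sampled together,
  # which is very unlikely for moderate B and pItem.
  together / sampled
}

# Toy data: two well-separated groups of 20 observations each.
set.seed(42)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 10), ncol = 2))
cm <- consensus_sketch(x, K = 2, B = 50)
```

Pairs of items from the same group should have consensus values close to 1 and pairs from different groups values close to 0; intermediate values indicate pairs whose co-clustering is unstable under resampling.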
Alessandra Cabassi alessandra.cabassi@mrc-bsu.cam.ac.uk
Monti, S., Tamayo, P., Mesirov, J. and Golub, T., 2003. Consensus clustering: a resampling-based method for class discovery and visualization of gene expression microarray data. Machine learning, 52(1-2), pp.91-118.
Witten, D.M. and Tibshirani, R., 2010. A framework for feature selection in clustering. Journal of the American Statistical Association, 105(490), pp.713-726.
library(coca)

# Load one dataset with 300 observations, 2 variables, 6 clusters
data <- as.matrix(read.csv(system.file("extdata", "dataset1.csv",
                                       package = "coca"), row.names = 1))
# Compute consensus clustering with K=5 clusters
cm <- consensusCluster(data, K = 5)
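A common follow-up, sketched here under assumptions, is to derive a single set of cluster labels from a consensus matrix by treating 1 - consensus as a dissimilarity. The matrix cm_toy below is a made-up 4 x 4 example rather than output of consensusCluster, and the choice of average linkage is ours for illustration.

```r
# Hypothetical consensus matrix: items 1-2 and items 3-4 co-cluster often.
cm_toy <- matrix(c(1.0, 0.9, 0.1, 0.0,
                   0.9, 1.0, 0.2, 0.1,
                   0.1, 0.2, 1.0, 0.8,
                   0.0, 0.1, 0.8, 1.0), nrow = 4, byrow = TRUE)

# Basic sanity checks that any consensus matrix should pass.
stopifnot(isSymmetric(cm_toy), all(diag(cm_toy) == 1),
          all(cm_toy >= 0), all(cm_toy <= 1))

# Use 1 - consensus as a dissimilarity and cut an average-linkage tree.
hc <- hclust(as.dist(1 - cm_toy), method = "average")
final_labels <- cutree(hc, k = 2)
```

The same steps could be applied to the matrix cm returned in the example above, with k set to the K used for the consensus clustering.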