consensus: Consensus Cluster Algorithm
In dswatson/M3C: Rigorously test cluster stability



This function implements the consensus cluster algorithm.

consensus(dat, max_k = 3, reps = 100, distance = "euclidean",
  cluster_alg = "hclust", hclust_method = "average", p_item = 0.8,
  p_feature = 1, wts_item = NULL, wts_feature = NULL, seed = NULL,
  parallel = TRUE, check = TRUE)

`dat`	Probe by sample omic data matrix. Data should be filtered and normalized prior to analysis.
`max_k`	Integer specifying the maximum cluster number to evaluate. Default is `max_k = 3`, but a more reasonable rule of thumb is the square root of the sample size.
`reps`	Number of subsamples to draw.
`distance`	Distance metric for clustering. Supports all methods available in `dist` and `vegdist`, as well as those implemented in the `bioDist` package.
`cluster_alg`	Clustering algorithm to implement. Currently supports hierarchical (`"hclust"`), k-means (`"kmeans"`), and k-medoids (`"pam"`).
`hclust_method`	Method to use if `cluster_alg = "hclust"`. See `hclust`.
`p_item`	Proportion of items to include in each subsample.
`p_feature`	Proportion of features to include in each subsample.
`wts_item`	Optional vector of item weights.
`wts_feature`	Optional vector of feature weights.
`seed`	Optional seed for reproducibility.
`parallel`	If a parallel backend is loaded and available, should the function use it? Highly advisable if hardware permits.
`check`	Check for errors in function arguments? This is set to `FALSE` by internal `M3C` functions to cut down on redundant checks, but should generally be `TRUE` when used interactively.

Consensus clustering is a resampling procedure for evaluating cluster stability. On each run of the algorithm, a user-specified proportion of samples (p_item) is drawn and clustered while the remaining samples are held out, to test how often pairs of samples do or do not cluster together. The result is a square, symmetric consensus matrix for each cluster number k. Each cell mat[i, j] holds the proportion of all runs including both samples i and j in which the two were assigned to the same cluster.
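The resampling procedure described above can be sketched in base R. This is only an illustration of the general technique, not the package's internal code: the function name consensus_sketch, the rounding of the subsample size, and the toy data are assumptions made for the example, and it covers only Euclidean distance with average-linkage hclust, without feature subsampling or weights.

```r
# Illustrative sketch only (hypothetical helper, not part of M3C).
# Rows of `dat` are features, columns are samples.
consensus_sketch <- function(dat, max_k = 3, reps = 100, p_item = 0.8,
                             seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- ncol(dat)
  n_sub <- round(p_item * n)  # assumed rounding rule for the subsample size

  # together[[k]][i, j]: runs in which i and j shared a cluster at this k
  together <- replicate(max_k, matrix(0, n, n), simplify = FALSE)
  # sampled[i, j]: runs in which i and j were both drawn
  sampled <- matrix(0, n, n)

  for (r in seq_len(reps)) {
    idx <- sample.int(n, n_sub)
    hc <- hclust(dist(t(dat[, idx, drop = FALSE])), method = "average")
    sampled[idx, idx] <- sampled[idx, idx] + 1
    for (k in 2:max_k) {
      cl <- cutree(hc, k = k)
      together[[k]][idx, idx] <- together[[k]][idx, idx] + outer(cl, cl, "==")
    }
  }

  # First element stays NULL, mirroring the documented return shape.
  # A pair that was never drawn together would yield NaN here.
  out <- vector("list", max_k)
  for (k in 2:max_k) out[[k]] <- together[[k]] / sampled
  out
}

# Toy data: 50 features, two well-separated groups of 10 samples each
set.seed(1)
dat <- cbind(matrix(rnorm(50 * 10, mean = 0), 50, 10),
             matrix(rnorm(50 * 10, mean = 3), 50, 10))
res <- consensus_sketch(dat, max_k = 3, reps = 50, seed = 42)
```

Dividing by the co-sampling count rather than by reps is what makes each cell a proportion of the runs that included both samples, as in the description above.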

A list with max_k elements, the first of which is NULL. Elements two through max_k are consensus matrices corresponding to cluster numbers k = 2 through max_k.
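Because the first element is NULL, the list index lines up with the cluster number k. The snippet below uses a small hand-built mock with the documented shape (the object res and its values are made up for illustration, not real output) to show one way of indexing it.

```r
# Hypothetical mock of the return value for max_k = 3:
# element 1 is NULL, elements 2 and 3 are consensus matrices.
m2 <- matrix(c(1, 1, 0,
               1, 1, 0,
               0, 0, 1), nrow = 3, byrow = TRUE)
res <- list(NULL, m2, diag(3))

ks <- seq_along(res)[-1]   # cluster numbers 2, ..., max_k
cm_k2 <- res[[2]]          # consensus matrix for k = 2
pair_12 <- cm_k2[1, 2]     # proportion of shared runs in which 1 and 2 co-clustered

# Keep only the matrices and label them by k, skipping the NULL slot
mats <- res[ks]
names(mats) <- paste0("k", ks)
```

Iterating over seq_along(res)[-1] rather than over the whole list avoids touching the NULL placeholder.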

Monti, S., Tamayo, P., Mesirov, J., & Golub, T. (2003). Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data. Machine Learning, 52: 91-118.

ConsensusClusterPlus, M3C