consensus: Consensus Cluster Algorithm


Description

This function implements the consensus cluster algorithm.

Usage

consensus(dat, max_k = 3, reps = 100, distance = "euclidean",
  cluster_alg = "hclust", hclust_method = "average", p_item = 0.8,
  p_feature = 1, wts_item = NULL, wts_feature = NULL, seed = NULL,
  parallel = TRUE, check = TRUE)

Arguments

dat

Probe-by-sample omic data matrix (probes in rows, samples in columns). Data should be filtered and normalized prior to analysis.

max_k

Integer specifying the maximum number of clusters to evaluate. The default is max_k = 3, but a more reasonable rule of thumb is the square root of the sample size.

reps

Number of subsamples to draw.

distance

Distance metric for clustering. Supports all methods available in dist (stats) and vegdist (vegan), as well as those implemented in the bioDist package.

cluster_alg

Clustering algorithm to implement. Currently supports hierarchical ("hclust"), k-means ("kmeans"), and k-medoids ("pam").

hclust_method

Method to use if cluster_alg = "hclust". See hclust.

p_item

Proportion of items (samples) to include in each subsample.

p_feature

Proportion of features (probes) to include in each subsample.

wts_item

Optional vector of item weights.

wts_feature

Optional vector of feature weights.

seed

Optional seed for reproducibility.

parallel

If a parallel backend is loaded and available, should the function use it? Highly advisable if hardware permits.

check

Check for errors in function arguments? This is set to FALSE by internal M3C functions to cut down on redundant checks, but should generally be TRUE when used interactively.
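The sketch below walks through a single subsampling run with base R and the cluster package, to show how max_k, p_item, p_feature, wts_item, distance and cluster_alg relate to one another. It does not call consensus() and is not the package's internal code. It assumes that samples are the columns of dat, that subsample sizes are rounded, that item weights act as sampling probabilities, and that distances are computed between samples by transposing; none of these details is stated above.

```r
# Illustrative sketch of one subsampling run (not the consensus() internals).
# Assumptions: samples are columns of 'dat', subsample sizes are rounded,
# and item weights are used as sampling probabilities.
library(cluster)  # provides pam(); a recommended package distributed with R

set.seed(1)
dat <- matrix(rnorm(200 * 25), nrow = 200, ncol = 25)  # simulated probe-by-sample data

# max_k: rule of thumb of roughly sqrt(sample size), but at least 2
max_k <- max(2L, floor(sqrt(ncol(dat))))

# p_item / p_feature: draw samples (columns) and probes (rows) without replacement
p_item <- 0.8
p_feature <- 1
wts_item <- NULL  # NULL means uniform sampling; a numeric vector is passed as prob=
items <- sample(ncol(dat), size = round(p_item * ncol(dat)), prob = wts_item)
features <- sample(nrow(dat), size = round(p_feature * nrow(dat)))
sub <- dat[features, items, drop = FALSE]

# distance: stats::dist() measures distances between rows,
# so transpose to put samples in rows
d <- dist(t(sub), method = "euclidean")

# cluster_alg: cluster labels for k = max_k from each supported algorithm
k <- max_k
lab_hclust <- cutree(hclust(d, method = "average"), k = k)
lab_kmeans <- kmeans(t(sub), centers = k, nstart = 10)$cluster
lab_pam <- pam(d, k = k, cluster.only = TRUE)
```

With 25 samples, the rule of thumb gives max_k = 5, and p_item = 0.8 keeps 20 samples per subsample. Each algorithm returns one label per retained sample.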

Details

Consensus clustering is a resampling procedure for evaluating cluster stability. On each run of the algorithm, a user-specified proportion of samples is held out, and the remaining samples are clustered to test how often each pair of samples does or does not cluster together. The result is a square, symmetric consensus matrix for each number of clusters k. Each cell mat[i, j] is the proportion of all runs including both samples i and j in which the two were clustered together.
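As an illustration of the definition above, here is a minimal base R sketch that builds the consensus matrix for one value of k using average-linkage hclust on Euclidean distances. The helper name consensus_matrix() is hypothetical and is not part of the package; it only counts, for each pair of samples, how often they were subsampled together and how often they then landed in the same cluster.

```r
# Hypothetical helper: consensus matrix for a single k (illustration only).
# Assumes samples are the columns of 'dat'.
consensus_matrix <- function(dat, k, reps = 100, p_item = 0.8) {
  n <- ncol(dat)
  together <- matrix(0, n, n)  # runs in which i and j fell in the same cluster
  sampled  <- matrix(0, n, n)  # runs in which i and j were both subsampled
  for (r in seq_len(reps)) {
    idx <- sample(n, size = round(p_item * n))
    d <- dist(t(dat[, idx, drop = FALSE]))
    labels <- cutree(hclust(d, method = "average"), k = k)
    same <- outer(labels, labels, "==")  # TRUE where two samples share a cluster
    together[idx, idx] <- together[idx, idx] + same
    sampled[idx, idx]  <- sampled[idx, idx] + 1
  }
  # mat[i, j] = co-clustered runs / co-sampled runs; pairs never co-sampled get 0
  ifelse(sampled > 0, together / sampled, 0)
}

set.seed(1)
dat <- matrix(rnorm(500 * 12), nrow = 500, ncol = 12)
cm <- consensus_matrix(dat, k = 3, reps = 50)
```

The matrix is symmetric by construction, and its diagonal is 1 for every sample that was drawn at least once, because a sample always clusters with itself.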

Value

A list with max_k elements, the first of which is NULL. Elements two through max_k are consensus matrices corresponding to cluster numbers k = 2 through max_k.
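A returned list of this shape can be post-processed element by element, skipping the NULL first slot. The sketch below uses a toy stand-in for the return value, not real output, and derives final labels for each k by hierarchically clustering 1 - consensus as a distance. That follows the approach of Monti et al. and ConsensusClusterPlus; it is an assumption that this matches how M3C uses these matrices.

```r
# Toy stand-in for the documented return value (NOT real output):
# element 1 is NULL, element 2 is a 6 x 6 consensus matrix for k = 2
# with two clean blocks of three samples each.
cc <- vector("list", 2)  # every element starts as NULL
cm <- 0.9 * kronecker(diag(2), matrix(1, 3, 3)) + 0.05
diag(cm) <- 1
cc[[2]] <- cm

# For k = 2..max_k, treat 1 - consensus as a distance and cut the tree at k
final_labels <- lapply(2:length(cc), function(k) {
  cutree(hclust(as.dist(1 - cc[[k]]), method = "average"), k = k)
})
final_labels[[1]]  # labels for k = 2
```

Samples that nearly always co-cluster sit at distance close to 0, so each block of the toy matrix is recovered as one cluster.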

References

Monti, S., Tamayo, P., Mesirov, J., & Golub, T. (2003). Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data. Machine Learning, 52: 91-118.

See Also

ConsensusClusterPlus, M3C

Examples

mat <- matrix(rnorm(1000 * 12), nrow = 1000, ncol = 12)
cc <- consensus(mat, max_k = 4)

dswatson/cc_testr documentation built on May 23, 2019, 7:34 a.m.