
ConsensusClustering

ConsensusClustering is a revised implementation of the consensus clustering methodology for class discovery and cluster validation, based on Monti et al. (2003), "Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data". The method finds a consensus assignment across multiple runs of a clustering algorithm, which allows the stability of the discovered clusters to be assessed and validated empirically. It was designed to identify robust clusters in genomic data, but it is applicable to any unsupervised learning task.

Description of Consensus Clustering

The development of Consensus Clustering is driven by the need to determine both the number of clusters in a dataset and how consistently samples are assigned to the same cluster across repeated runs. The application is aimed at cancer genomics, where subclasses of disease are clinically relevant for treatment, but it can easily be applied elsewhere. The method takes random subsamples of the data points, applies a clustering algorithm to partition each subsample into k groups, and then computes a consensus across all iterations to produce a final, robust cluster assignment. This is repeated for different values of k, and the Consensus Clustering methodology also provides tools for assessing the optimal number of clusters.
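The resampling procedure described above can be illustrated with a minimal sketch. It is written in plain Python rather than through the package's R interface, which this README does not show, so every name here (`kmeans`, `consensus_matrix`, `n_resamples`, `frac`) is hypothetical. A tiny k-means stands in for whichever clustering algorithm is actually used. Following the idea in Monti et al. (2003), each entry of the consensus matrix is the fraction of subsamples containing both points in which the two points landed in the same cluster.

```python
import random


def kmeans(points, k, rng, n_iter=20):
    """Tiny Lloyd's k-means (a stand-in clusterer); returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if it has none.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def consensus_matrix(points, k, n_resamples=50, frac=0.8, seed=0):
    """Fraction of co-sampled runs in which each pair of points shared a cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]  # times i and j shared a cluster
    sampled = [[0] * n for _ in range(n)]   # times i and j were both subsampled
    size = max(k, int(frac * n))
    for _ in range(n_resamples):
        # Draw a random subsample without replacement and cluster it.
        idx = rng.sample(range(n), size)
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    # Pairs that were never subsampled together get a consensus of 0.0.
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Two well-separated toy groups of four points each (made-up data).
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    for row in consensus_matrix(data, k=2):
        print(" ".join(f"{value:.2f}" for value in row))
```

Entries close to 1 or 0 indicate pairs that are consistently grouped together or apart, while intermediate values indicate unstable assignments. Repeating the call for several values of k and comparing how cleanly the entries separate is the basis for choosing the number of clusters.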

More specifically, Consensus Clustering was inspired by ConsensusClusterPlus (Wilkerson 2010), with major renovations to the design and application of the implementation, which include but are not limited to:

See more in the vignettes.



mpru/ConsensusClustering documentation built on May 9, 2019, 5:54 a.m.