M3C: Monte Carlo Consensus Clustering

Genome-wide data are used to stratify patients into classes using class discovery algorithms. However, we have observed systematic bias in current state-of-the-art methods. This bias arises because reference distributions are not considered when selecting the number of classes (K). As a solution, we developed Monte Carlo consensus clustering (M3C), a consensus clustering-based algorithm with a hypothesis testing framework. M3C uses a multi-core enabled Monte Carlo simulation to generate null distributions along the range of K; these are used to calculate p values, which in turn are used to select K. P values beyond the limits of the simulation are estimated using a beta distribution. M3C can also quantify structural relationships between clusters, and it uses spectral clustering to deal with non-Gaussian and imbalanced structures.
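The p value calculation described above can be illustrated with a minimal Python sketch. It is not the M3C implementation (M3C is an R package); the function names, the +1 correction in the empirical p value, and the use of method-of-moments estimates for the beta parameters are all assumptions made for illustration. The input is a stability score for the real data and a list of scores from simulated reference data sets, where a lower score means a more stable clustering.

```python
def empirical_p_value(real_score, reference_scores):
    """Empirical p value: share of reference scores at least as low as the real one.

    The +1 in numerator and denominator is an assumed correction that keeps
    the p value above zero when no reference score beats the real score.
    """
    hits = sum(1 for ref in reference_scores if ref <= real_score)
    return (hits + 1) / (len(reference_scores) + 1)


def beta_moments(reference_scores):
    """Method-of-moments estimates (alpha, beta) for a beta distribution.

    Assumes scores lie in (0, 1) and have non-zero variance. The fitted
    distribution's lower tail at the real score would give an extrapolated
    p value when the simulation has too few iterations to resolve it.
    """
    n = len(reference_scores)
    mean = sum(reference_scores) / n
    var = sum((x - mean) ** 2 for x in reference_scores) / (n - 1)
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


# Hypothetical reference scores standing in for Monte Carlo output.
reference = [0.5, 0.6, 0.7, 0.05]
p = empirical_p_value(0.1, reference)
alpha, beta = beta_moments([0.2, 0.4, 0.6])
```

With only a few reference scores the smallest attainable empirical p value is 1 / (B + 1) for B simulations, which is why a fitted distribution is needed to estimate smaller values.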


- M3C calculates the consensus rate, a measure of the stability of samples, which is quantified for each K using the PAC score
- Generation of a reference PAC distribution using a multi-core Monte Carlo simulation
- Reference generation preserves the gene-gene correlation structure of the data
- The relative cluster stability index (RCSI) and empirical p values are used instead of delta K
- Extrapolated p values are calculated by fitting a beta distribution
- Increased accuracy compared with other methods, verified using simulations
- Controls for the null hypothesis K = 1
- Removes systematic bias
- Ability to investigate structural relationships using hierarchical clustering of medoids and sigclust
- Inner algorithms are PAM, K-means, and spectral clustering
- Automatic reordering of the expression matrix and annotation data to speed up downstream analysis
- Plotting code using ggplot2 for publication-quality outputs
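As a rough illustration of the PAC score and the RCSI named in the list above, the following Python sketch computes both from toy inputs. It is an assumption-laden sketch rather than the package's code: the function names are hypothetical, the ambiguity window (0.1, 0.9) is a commonly used choice that may differ from the package's settings, and the RCSI is taken here to be the log10 ratio of the mean reference PAC to the real PAC.

```python
import math


def pac_score(consensus, lower=0.1, upper=0.9):
    """Proportion of ambiguous clustering (PAC) for one value of K.

    `consensus` is a symmetric matrix (list of lists) whose entry [i][j] is
    the fraction of resampling runs in which samples i and j were clustered
    together. Entries strictly between `lower` and `upper` count as
    ambiguous; the window bounds are assumed defaults.
    """
    n = len(consensus)
    # Use each sample pair once: upper triangle without the diagonal.
    values = [consensus[i][j] for i in range(n) for j in range(i + 1, n)]
    ambiguous = sum(1 for v in values if lower < v < upper)
    return ambiguous / len(values)


def rcsi(real_pac, reference_pacs):
    """Relative cluster stability index, assumed to be
    log10(mean reference PAC / real PAC). Higher means the real data are
    more stable than the reference. Requires real_pac > 0.
    """
    mean_ref = sum(reference_pacs) / len(reference_pacs)
    return math.log10(mean_ref / real_pac)


# Toy consensus matrix: two clean pairs plus one ambiguous pair (0.5).
toy = [
    [1.0, 1.0, 0.0, 0.5],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.5, 0.0, 1.0, 1.0],
]
score = pac_score(toy)  # one ambiguous pair out of six
```

A PAC of 0 means every pair of samples was either always or never clustered together, so a real PAC of exactly 0 would need special handling before the RCSI ratio is taken.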

crj32/M3C documentation built on July 10, 2018, 8:03 p.m.