M3C: Monte Carlo Reference-based Consensus Clustering
In crj32/M3C: Monte Carlo Reference-based Consensus Clustering


View source: R/M3C.R

This is the M3C core function, a reference-based consensus clustering algorithm. The basic idea is to use a multi-core enabled Monte Carlo simulation to generate a null distribution of stability scores. The Monte Carlo simulations maintain the feature correlation structure of the input data. The real stability scores are then compared against this null distribution, and an empirical p value is calculated for every value of K to test the null hypothesis K=1. We derive the Relative Cluster Stability Index (RCSI) as a metric for selecting K, which is based on a comparison against the reference mean. A fast alternative is also included, which adds a penalty term to prevent overestimation of K; we call this regularised consensus clustering.
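The comparison of a real score against the Monte Carlo reference scores can be sketched in base R as below. This is only an illustration for a single value of K: the numbers are made up, `empirical_p` is a hypothetical helper, and the +1 correction is a common convention that is assumed here rather than taken from the M3C source.

```r
# Hypothetical scores for one value of K; with a PAC-like score, lower means more stable.
set.seed(1)
ref_scores <- runif(25, min = 0.3, max = 0.6)  # stand-in for 25 reference (null) scores
real_score <- 0.12                             # stand-in for the score on the real data

# Empirical p value: share of reference scores at least as low as the real
# score, with a +1 correction so the estimate is never exactly zero.
# (Assumed estimator for illustration, not necessarily the one M3C uses.)
empirical_p <- function(real, ref) {
  (sum(ref <= real) + 1) / (length(ref) + 1)
}

p_value <- empirical_p(real_score, ref_scores)
print(p_value)
```

A small p value here would count as evidence against the null hypothesis K=1 for that K.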

M3C(mydata, cores = 1, iters = 25, maxK = 10, pItem = 0.8,
  des = NULL, ref_method = c("reverse-pca", "chol"), repsref = 100,
  repsreal = 100, clusteralg = c("pam", "km", "spectral", "hc"),
  pacx1 = 0.1, pacx2 = 0.9, seed = 123, objective = "entropy",
  removeplots = FALSE, silent = FALSE, fsize = 18, method = 1,
  lambdadefault = 0.1, tunelambda = TRUE, lseq = seq(0.02, 0.1, by =
  0.02), lthick = 2, dotsize = 3)
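A minimal call might look like the following sketch. The simulated data and the small `iters` and `maxK` values are assumptions chosen to keep the run short, and the call is guarded so the snippet still runs when the M3C package is not installed. Only arguments listed in the usage above are passed.

```r
# Simulated input: 50 features (rows) by 20 samples (columns),
# with the last 10 samples shifted to form a second group.
set.seed(123)
mydata <- matrix(rnorm(50 * 20), nrow = 50, ncol = 20)
mydata[, 11:20] <- mydata[, 11:20] + 3
colnames(mydata) <- paste0("sample", 1:20)
rownames(mydata) <- paste0("feature", 1:50)
mydata <- as.data.frame(mydata)

# Run M3C only if the package is available.
if (requireNamespace("M3C", quietly = TRUE)) {
  res <- M3C::M3C(mydata, cores = 1, iters = 5, maxK = 4, seed = 123,
                  removeplots = TRUE, silent = TRUE)
  # Inspect the top-level structure of the returned list.
  str(res, max.level = 1)
}
```

For a real analysis, `iters`, `repsref` and `repsreal` would normally be set within the recommended ranges given in the arguments.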

`mydata`	Data frame or matrix: contains the data, with samples as columns and features as rows
`cores`	Numerical value: how many cores to split the Monte Carlo simulation over
`iters`	Numerical value: how many Monte Carlo iterations to perform (default: 25, recommended: 5-100)
`maxK`	Numerical value: the maximum number of clusters to test for, K (default: 10)
`pItem`	Numerical value: the fraction of points to resample each iteration (default: 0.8)
`des`	Data frame: contains annotation data for the input data for automatic reordering
`ref_method`	Character string: which reference method to use ("reverse-pca" or "chol")
`repsref`	Numerical value: how many resampling reps to use for reference (default: 100, recommended: 100-250)
`repsreal`	Numerical value: how many resampling reps to use for real data (default: 100, recommended: 100-250)
`clusteralg`	String: which inner clustering algorithm to use ("pam", "km", "spectral" or "hc"; default: PAM)
`pacx1`	Numerical value: the 1st x co-ordinate for calculating the PAC score from the CDF (default: 0.1)
`pacx2`	Numerical value: the 2nd x co-ordinate for calculating the PAC score from the CDF (default: 0.9)
`seed`	Numerical value: specifies seed, set to NULL for different results each time
`objective`	Character string: whether to use the 'PAC' or 'entropy' objective function (default: entropy)
`removeplots`	Logical flag: whether to remove all plots from view
`silent`	Logical flag: whether to remove messages or not
`fsize`	Numerical value: determines the font size of the ggplot2 plots
`method`	Numerical value: 1 refers to the Monte Carlo simulation method, 2 to regularised consensus clustering
`lambdadefault`	Numerical value: the lambda value to use when not tuning (default: 0.1)
`tunelambda`	Logical flag: whether to tune lambda or not
`lseq`	Numerical vector: vector of lambda values to tune over (default: seq(0.02,0.1,by=0.02))
`lthick`	Numerical value: determines the line thickness of the ggplot2 plot
`dotsize`	Numerical value: determines the dotsize of the ggplot2 plot
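To make the role of `pacx1` and `pacx2` concrete, the sketch below computes a PAC (proportion of ambiguous clustering) score from a toy consensus matrix as CDF(x2) - CDF(x1) over the pairwise consensus values. The matrix values and the `pac_score` helper are hypothetical, and this is the commonly used definition of PAC, assumed here rather than copied from the M3C source.

```r
# Toy 4x4 consensus matrix (hypothetical values): entry [i, j] is the fraction
# of resampling runs in which samples i and j were placed in the same cluster.
consensus <- matrix(c(
  1.0, 0.9, 0.5, 0.0,
  0.9, 1.0, 0.1, 0.0,
  0.5, 0.1, 1.0, 1.0,
  0.0, 0.0, 1.0, 1.0
), nrow = 4, byrow = TRUE)

# PAC: share of sample pairs whose consensus value lies in (x1, x2],
# i.e. pairs that are neither clearly together nor clearly apart.
pac_score <- function(cm, x1 = 0.1, x2 = 0.9) {
  vals <- cm[upper.tri(cm)]  # each pair once, diagonal excluded
  cdf <- ecdf(vals)          # empirical CDF of the consensus values
  cdf(x2) - cdf(x1)
}

pac <- pac_score(consensus)
print(pac)
```

A lower PAC indicates a cleaner consensus matrix, with most pairs either almost always or almost never clustered together.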

A list containing: 1) the stability results, 2) all the output data (another list), and 3) the reference stability scores (see the vignette for details on how to access them easily)