consensusClustering: The consensusClustering function

Description

Runs Consensus Clustering for class discovery and clustering validation.

Usage

consensusClustering(dataMatrix, K = 2:3, nIters = 30, propSamples = 0.8,
  clusterAlgorithm = "Kmeans", verbose = TRUE, seed = NULL,
  saveResults = FALSE, pathOutput = "", finalLinkage = "average",
  PACLowerLim = 0.1, PACUpperLim = 0.9, plotHeatmaps = c("both",
  "consensus", "data", "no"), plotSave = c("no", "pdf", "bmp", "png", "ps"),
  showDendrogram = TRUE, showSamplesNames = TRUE,
  showFeaturesNames = TRUE, plotCDF = TRUE, plotTracking = TRUE,
  consensusStats = TRUE, consensusStatsPlots = TRUE)

Arguments

dataMatrix

matrix or data frame with data to cluster, samples/items in the columns and features in the rows.

K

vector of integers giving the numbers of clusters to evaluate. It can be of length 1 and does not need to consist of consecutive integers. For example, any of K = 4, K = 2:5 or K = c(5, 10, 15) would work.

nIters

number of iterations (bootstrap samples).

propSamples

proportion of items to sample in each bootstrap sample.

clusterAlgorithm

algorithm used to perform the clustering; for the moment, only K-means ("Kmeans") is available.

verbose

logical, print progress messages to screen. During the bootstrap iterations, a report to monitor the progress is created in pathOutput.

seed

numerical value used to set the random seed for reproducible results. The doRNG package is used to guarantee reproducible results even when running in parallel.

saveResults

logical indicating if the output should be saved as an .rds file in the directory pathOutput.

pathOutput

directory for output files and the iteration progress report; defaults to the current working directory.

finalLinkage

hierarchical linkage method used to produce a final classification from the consensus indexes generated by the bootstrap samples.

PACLowerLim

lower limit for the interval of ambiguous clustering used for calculating PAC score, belongs to the interval (0, 1).

PACUpperLim

upper limit for the interval of ambiguous clustering used for calculating PAC score, belongs to the interval (0, 1).

plotHeatmaps

character string indicating which heatmaps should be produced: "consensus" (only heatmap of the consensus indexes), "data" (only heatmap of input data set), "both" (default), or "no" (no plot is produced).

plotSave

character string indicating the format in which the plots are saved as files in the directory pathOutput. The default is "no": the plots are not saved, but printed to the screen. Other options are "pdf", "bmp", "png" and "ps".

showDendrogram

logical indicating if dendrograms should be plotted in the heatmaps (defaults to TRUE).

showSamplesNames

logical indicating if sample names should be displayed in the plots (defaults to TRUE).

showFeaturesNames

logical indicating if feature names should be displayed in the plots (defaults to TRUE).

plotCDF

logical indicating if the plots of the Cumulative Distribution Function (CDF) of the consensus indexes and of the relative change in the area under the CDF should be produced. The second plot is not produced if length(K) == 1, since there is no comparison to be made. If plotCDF == TRUE, a vector with the area under the CDF curve for each K is returned.

plotTracking

logical indicating if the plot with the tracking of samples through different values of K should be produced. No tracking plot is produced if length(K) == 1, since there is no tracking to be done.

consensusStats

logical indicating if consensus statistics should be computed.

consensusStatsPlots

logical indicating if plots of consensus statistics should be produced (only considered if consensusStats == TRUE).
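
To make PACLowerLim, PACUpperLim and the area under the CDF more concrete, the following base R sketch computes both quantities from a toy consensus matrix. It assumes the usual definitions: the PAC score as CDF(upper) - CDF(lower) over the pairwise consensus indexes, and the area as the integral of the empirical CDF over [0, 1]. The helper names pacScore and cdfArea are hypothetical and not part of the package, whose internal computation may differ.

```r
# Toy consensus matrix for 4 samples (symmetric, entries in [0, 1]).
consensus <- matrix(c(1.00, 0.95, 0.05, 0.00,
                      0.95, 1.00, 0.50, 0.20,
                      0.05, 0.50, 1.00, 1.00,
                      0.00, 0.20, 1.00, 1.00), nrow = 4, byrow = TRUE)

# Hypothetical helper: proportion of ambiguous clustering,
# i.e. CDF(upper) - CDF(lower) over the pairwise consensus indexes.
pacScore <- function(consensus, lower = 0.1, upper = 0.9) {
  idx <- consensus[lower.tri(consensus)]  # count each pair of samples once
  Fn <- ecdf(idx)
  Fn(upper) - Fn(lower)
}

# Hypothetical helper: area under the empirical CDF on [0, 1].
# The empirical CDF is a step function, so the integral is a sum of rectangles.
cdfArea <- function(consensus) {
  idx <- consensus[lower.tri(consensus)]
  Fn <- ecdf(idx)
  xs <- sort(unique(c(0, idx, 1)))
  sum(diff(xs) * Fn(head(xs, -1)))
}

pac  <- pacScore(consensus, lower = 0.1, upper = 0.9)
area <- cdfArea(consensus)
```

A low PAC means few pairs of samples have intermediate consensus values, which is the pattern expected when the chosen K gives stable clusters.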

Details

Consensus Clustering is a revised implementation of the methodology for class discovery and clustering validation proposed by Monti et al. (2003). The method finds a consensus assignment across multiple runs of a clustering approach, allowing one to assess and validate the stability of the discovered clusters empirically. It was designed to identify robust clusters in genomic data, but it is applicable to any unsupervised learning task.
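
The resampling idea can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the function name consensusSketch is hypothetical, stats::kmeans (with nstart = 10) stands in for the clustering step, and items are drawn without replacement in each iteration. The last line mirrors the role of finalLinkage by cutting a hierarchical clustering built on 1 - consensus.

```r
# Hypothetical sketch: consensus matrix for a single number of clusters k.
# dataMatrix has samples in columns and features in rows.
consensusSketch <- function(dataMatrix, k, nIters = 30, propSamples = 0.8) {
  n <- ncol(dataMatrix)
  together <- matrix(0, n, n)  # times a pair was placed in the same cluster
  sampled  <- matrix(0, n, n)  # times a pair was drawn in the same subsample
  for (i in seq_len(nIters)) {
    idx <- sample(n, size = round(propSamples * n))
    # kmeans clusters rows, so transpose to put samples in rows
    cl <- kmeans(t(dataMatrix[, idx, drop = FALSE]),
                 centers = k, nstart = 10)$cluster
    sampled[idx, idx]  <- sampled[idx, idx] + 1
    together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
  }
  # Consensus index: co-clustering count over co-sampling count.
  consensus <- together / pmax(sampled, 1)  # never co-sampled pairs stay at 0
  diag(consensus) <- 1
  consensus
}

set.seed(1)
# Two well-separated groups of 5 samples each, measured on 10 features.
mat <- cbind(matrix(rnorm(10 * 5, mean = 0), 10, 5),
             matrix(rnorm(10 * 5, mean = 10), 10, 5))
consensus <- consensusSketch(mat, k = 2, nIters = 20)

# Final classification from the consensus indexes (average linkage).
finalClasses <- cutree(hclust(as.dist(1 - consensus), method = "average"), k = 2)
```

Pairs of samples that are consistently clustered together across subsamples get a consensus index close to 1, and pairs that are consistently separated get a value close to 0.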

This function is parallelized under the unifying paradigm, so it automatically detects clusters or cores registered by the user beforehand, and runs sequentially if no parallel capabilities are available. If a seed is provided, reproducible results are guaranteed when running in parallel, through the use of the doRNG package.
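
As an illustration of registering workers beforehand, the sketch below assumes that the parallel backend is the foreach framework (on which doRNG is built) and uses doParallel as the adaptor. The source does not name the backend, so this is one plausible setup rather than the documented one. The call to consensusClustering is left commented out so that the snippet runs without the package installed.

```r
# Assumption: the function picks up whatever backend is registered for foreach.
library(doParallel)

registerDoParallel(cores = 2)           # register two workers before the call
nWorkers <- foreach::getDoParWorkers()  # number of workers foreach will use

mat <- matrix(rnorm(10 * 6), 10, 6)
# With a seed, repeated parallel runs are expected to give identical results:
# result <- consensusClustering(mat, K = 2:3, nIters = 30, seed = 123)

stopImplicitCluster()                   # release the workers when done
```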

Value

A list with the results of the consensus clustering. The first elements of the list correspond to each value of K evaluated, each one containing:

The following elements of the list returned are:

The returned list may also include the following elements if the corresponding arguments were set to TRUE:

Examples

mat <- matrix(rnorm(10*6), 10, 6)
result <- consensusClustering(mat, nIters = 5)

mpru/ConsensusClustering documentation built on May 9, 2019, 5:54 a.m.