makeConsensus: Find sets of samples that stay together across clusterings
In clusterExperiment: Compare Clusterings for Single-Cell Sequencing

Description Usage Arguments Details Value Examples

Find sets of samples that stay together across clusterings in order to define a new clustering vector.

## S4 method for signature 'matrix'
makeConsensus(
  x,
  proportion,
  clusterFunction = "hierarchical01",
  minSize = 5,
  propUnassigned = 0.5,
  whenUnassign = c("before", "after"),
  clusterArgs = NULL
)

## S4 method for signature 'ClusterExperiment'
makeConsensus(
  x,
  whichClusters,
  eraseOld = FALSE,
  clusterLabel = "makeConsensus",
  ...
)

`x`	a matrix with samples on the rows and different clusterings on the columns or `ClusterExperiment` object.
`proportion`	The proportion of times that two sets of samples should be together in order to be grouped into a cluster (if <1, passed to mainClustering via alpha = 1 - proportion)
`clusterFunction`	the clustering function to use (passed to `mainClustering`); currently must be of type '01' and accept as input matrices of type "cat" (see details of ?ClusterFunction).
`minSize`	minimum size required for a set of samples to be considered in a cluster because of shared clustering, passed to `mainClustering`
`propUnassigned`	samples with greater than this proportion of assignments equal to '-1' are assigned a '-1' cluster value as a last step (only if proportion < 1)
`whenUnassign`	(provided for back compatibility with previous versions). Must be one of "before" or "after", indicating at what point are samples with a proportion of assignments of -1 greater than `propUnassigned` forced to have a '-1' value. If "before", then these samples are removed and not used for clustering. If "after", these samples are included in the clustering step, but then the cluster values they receive are assigned a '-1. These choices may result in different clusterings, because if these samples are included in the clustering (i.e. `whenUnassign="after"`, then these samples may affect the cluster assignments of other samples. The default is currently "before", but previous to version 2.5.4, there was no such option and the code internally set to "after", so for reproducibility with older results, users may need to set this option.
`clusterArgs`	list of arguments to be passed to the call to `mainClustering` that is used to cluster the proportion overlap between samples.
`whichClusters`	argument that can be either numeric or character vector indicating the clusterings to be used. See details of `getClusterIndex`.
`eraseOld`	logical. Only relevant if input `x` is of class `ClusterExperiment`. If TRUE, will erase existing workflow results (clusterMany as well as mergeClusters and makeConsensus). If FALSE, existing workflow results will have "`_i`" added to the clusterTypes value, where `i` is one more than the largest such existing workflow clusterTypes.
`clusterLabel`	a string used to describe the type of clustering. By default it is equal to "makeConsensus", to indicate that this clustering is the result of a call to makeConsensus. However, a more informative label can be set (see vignette).
`...`	arguments to be passed on to the method for signature `matrix,missing`.

This function was previously called combineMany (versions <= 2.0.0). combineMany is still available, but is considered defunct and users should update their code accordingly.

The function tries to find a consensus cluster across many different clusterings of the same samples. It does so by creating a nSamples x nSamples matrix of the percentage of co-occurance of each sample and then calling mainClustering to cluster the co-occurance matrix. The function assumes that '-1' labels indicate clusters that are not assigned to a cluster. Co-occurance with the unassigned cluster is treated differently than other clusters. The percent co-occurance is taken only with respect to those clusterings where both samples were assigned. Then samples with more than propUnassigned values that are '-1' across all of the clusterings are assigned a '-1' regardless of their cluster assignment.

The method calls mainClustering on the proportion matrix with clusterFunction as the 01 clustering algorithm, alpha=1-proportion, minSize=minSize, and evalClusterMethod=c("average"). See help of mainClustering for more details.

If x is a matrix, a list with values

clustering vector of cluster assignments, with "-1" implying unassigned
percentageShared a nSample x nSample matrix of the percent co-occurance across clusters used to find the final clusters. Percentage is out of those not '-1'
noUnassignedCorrection a vector of cluster assignments before samples were converted to '-1' because had >propUnassigned '-1' values (i.e. the direct output of the mainClustering output.)

If x is a ClusterExperiment, a ClusterExperiment object, with an added clustering of clusterTypes equal to makeConsensus and the percentageShared matrix stored in the coClustering slot.

data(simData)

cl <- clusterMany(simData,nReducedDims=c(5,10,50),  reduceMethod="PCA",
clusterFunction="pam", ks=2:4, findBestK=c(FALSE), removeSil=TRUE,
makeMissingDiss=TRUE, subsample=FALSE)

#make names shorter for plotting
clMat <- clusterMatrix(cl)
colnames(clMat) <- gsub("TRUE", "T", colnames(clMat))
colnames(clMat) <- gsub("FALSE", "F", colnames(clMat))
colnames(clMat) <- gsub("k=NA,", "", colnames(clMat))

#require 100% agreement -- very strict
clCommon100 <- makeConsensus(clMat, proportion=1, minSize=10)

#require 70% agreement based on clustering of overlap
clCommon70 <- makeConsensus(clMat, proportion=0.7, minSize=10)

oldpar <- par(no.readonly = TRUE)
par(mar=c(1.1, 12.1, 1.1, 1.1))
plotClusters(cbind("70%Similarity"=clCommon70, clMat,
"100%Similarity"=clCommon100), axisLine=-2)

#method for ClusterExperiment object
clCommon <- makeConsensus(cl, whichClusters="workflow", proportion=0.7,
minSize=10)
plotClusters(clCommon)
par(oldpar)