Find sets of samples that stay together across clusterings

Description

Find sets of samples that stay together across clusterings in order to define a new clustering vector.

Usage

## S4 method for signature 'matrix,missing'
combineMany(x, whichClusters, proportion = 1,
  clusterFunction = "hierarchical01", propUnassigned = 0.5, minSize = 5)

## S4 method for signature 'ClusterExperiment,numeric'
combineMany(x, whichClusters,
  eraseOld = FALSE, clusterLabel = "combineMany", ...)

## S4 method for signature 'ClusterExperiment,character'
combineMany(x, whichClusters, ...)

## S4 method for signature 'ClusterExperiment,missing'
combineMany(x, whichClusters, ...)

Arguments

x

a matrix or ClusterExperiment object.

whichClusters

a numeric or character vector that specifies which clusterings to compare (missing if x is a matrix)

proportion

The proportion of times that two sets of samples should be together in order to be grouped into a cluster (if <1, passed to clusterD via alpha = 1 - proportion)

clusterFunction

the clustering to use (passed to clusterD); currently must be of type '01'.

propUnassigned

samples with greater than this proportion of assignments equal to '-1' are assigned a '-1' cluster value as a last step (only if proportion < 1)

minSize

minimum size required for a set of samples to be considered in a cluster because of shared clustering, passed to clusterD

eraseOld

logical. Only relevant if input x is of class ClusterExperiment. If TRUE, will erase existing workflow results (clusterMany as well as mergeClusters and combineMany). If FALSE, existing workflow results will have "_i" added to the clusterTypes value, where i is one more than the largest such existing workflow clusterTypes.

clusterLabel

a string used to describe the type of clustering. By default it is equal to "combineMany", to indicate that this clustering is the result of a call to combineMany. However, a more informative label can be set (see vignette).

...

arguments to be passed on to the method for signature matrix,missing.

Details

The function tries to find a consensus clustering across many different clusterings of the same samples. It does so by creating an nSamples x nSamples matrix of the percentage of co-occurrence of each pair of samples and then calling clusterD to cluster the co-occurrence matrix. The function assumes that '-1' labels indicate samples that are not assigned to a cluster. Co-occurrence with the unassigned cluster is treated differently than co-occurrence with other clusters: the percent co-occurrence is taken only with respect to those clusterings where both samples were assigned. Then samples with more than a proportion propUnassigned of their values equal to '-1' across all of the clusterings are assigned a '-1', regardless of their cluster assignment.
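The co-occurrence calculation described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper name coOccurrence and the toy matrix are hypothetical, and returning NA for a pair of samples that are never assigned in the same clustering is an assumption made for the sketch.

```r
# Toy input: 5 samples (rows) x 3 clusterings (columns); -1 means unassigned.
clMat <- cbind(
  c(1,  1, 2, 2, -1),
  c(1,  1, 1, 2,  2),
  c(1, -1, 2, 2,  2)
)

# Hypothetical helper (not part of clusterExperiment): proportion of
# clusterings in which two samples share a cluster, counting only the
# clusterings where both samples were assigned.
coOccurrence <- function(clMat) {
  n <- nrow(clMat)
  shared <- matrix(NA_real_, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      both <- clMat[i, ] != -1 & clMat[j, ] != -1
      if (any(both)) {
        shared[i, j] <- mean(clMat[i, both] == clMat[j, both])
      }
      # Assumption: pairs never assigned together stay NA.
    }
  }
  shared
}

percentageShared <- coOccurrence(clMat)
```

In this toy example, samples 1 and 2 agree in both clusterings where sample 2 is assigned, so their entry is 1, while samples 1 and 3 agree in only one of three clusterings, giving 1/3.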

The method calls clusterD on the proportion matrix with clusterFunction as the 01 clustering algorithm, alpha=1-proportion, minSize=minSize, and evalClusterMethod=c("average"). See help of clusterD for more details.

Value

If x is a matrix, a list with values

  • clustering vector of cluster assignments, with "-1" implying unassigned

  • percentageShared an nSample x nSample matrix of the percent co-occurrence across the clusterings, used to find the final clusters. The percentage is taken out of those clusterings where neither sample is '-1'

  • noUnassignedCorrection a vector of cluster assignments before samples were converted to '-1' because they had more than propUnassigned '-1' values (i.e. the direct output of clusterD)

If x is a ClusterExperiment, a ClusterExperiment object, with an added clustering of clusterTypes equal to combineMany and the percentageShared matrix stored in the coClustering slot.
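For the matrix method, the relationship between the noUnassignedCorrection and clustering elements can be sketched as follows. This is a minimal illustration of the propUnassigned rule as described in this page, assuming made-up values; the vector standing in for the clusterD output is invented and the code is not taken from the package.

```r
# Toy input: 4 samples (rows) x 3 clusterings (columns); -1 means unassigned.
clMat <- cbind(
  c(1, -1,  2, 2),
  c(1, -1,  1, 2),
  c(1,  2, -1, 2)
)

# Made-up stand-in for the direct clusterD output (assumption).
noUnassignedCorrection <- c(1, 1, 2, 2)
propUnassigned <- 0.5

# Proportion of clusterings in which each sample is unassigned.
propMinusOne <- rowMeans(clMat == -1)

# Samples exceeding the threshold are set to -1 as a last step.
clustering <- noUnassignedCorrection
clustering[propMinusOne > propUnassigned] <- -1
```

Here sample 2 is unassigned in two of three clusterings, which exceeds 0.5, so it becomes '-1'; sample 3 is unassigned in only one of three and keeps its cluster.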

Examples

data(simData)

cl <- clusterMany(simData,nPCADims=c(5,10,50),  dimReduce="PCA",
clusterFunction="pam", ks=2:4, findBestK=c(FALSE), removeSil=TRUE,
subsample=FALSE)

#make names shorter for plotting
clMat <- clusterMatrix(cl)
colnames(clMat) <- gsub("TRUE", "T", colnames(clMat))
colnames(clMat) <- gsub("FALSE", "F", colnames(clMat))
colnames(clMat) <- gsub("k=NA,", "", colnames(clMat))

#require 100% agreement -- very strict
clCommon100 <- combineMany(clMat, proportion=1, minSize=10)

#require 70% agreement based on clustering of overlap
clCommon70 <- combineMany(clMat, proportion=0.7, minSize=10)

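#save only the settable graphical parameters so they can be restored without warnings
oldpar <- par(no.readonly = TRUE)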
par(mar=c(1.1, 12.1, 1.1, 1.1))
plotClusters(cbind("70%Similarity"=clCommon70$clustering, clMat,
"100%Similarity"=clCommon100$clustering), axisLine=-2)

#method for ClusterExperiment object
clCommon <- combineMany(cl, whichClusters="workflow", proportion=0.7,
minSize=10)
plotClusters(clCommon)
par(oldpar)