Resampling-based Sequential Ensemble Clustering

Description

Implementation of the RSEC algorithm (Resampling-based Sequential Ensemble Clustering) for single-cell sequencing data. This is a wrapper function around the existing clusterExperiment workflow; it runs the steps of that workflow and returns the RSEC output.

Usage

## S4 method for signature 'matrix'
RSEC(x, isCount = FALSE, transFun = NULL,
  dimReduce = "PCA", nVarDims = NA, nPCADims = c(50), k0s = 4:15,
  clusterFunction = c("tight", "hierarchical01"), alphas = c(0.1, 0.2, 0.3),
  betas = 0.9, minSizes = 1, combineProportion = 0.7,
  combineMinSize = 5, dendroReduce = "mad", dendroNDims = 1000,
  mergeMethod = "adjP", mergeCutoff = 0.05, verbose = FALSE,
  clusterDArgs = NULL, subsampleArgs = NULL, seqArgs = NULL, ncores = 1,
  random.seed = NULL, run = TRUE)

## S4 method for signature 'SummarizedExperiment'
RSEC(x, ...)

## S4 method for signature 'ClusterExperiment'
RSEC(x, eraseOld = FALSE, ...)

Arguments

x

the data on which to run the clustering. Can be: a matrix (with genes in rows), a list of datasets over which the clusterings should be run, a SummarizedExperiment object, or a ClusterExperiment object.

isCount

logical. Whether the data are in counts, in which case the default transFun argument is set as log2(x+1). This is simply a convenience to the user, and can be overridden by giving an explicit function to transFun.
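
As a minimal sketch of the default transformation described above, the following base-R snippet applies log2(x+1) to a small count matrix. The matrix `counts` and the helper name `log2p1` are made up for illustration and are not part of the package.

```r
# Hypothetical toy count matrix: genes in rows, samples in columns.
counts <- matrix(c(0, 1, 3, 7, 15, 31), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))

# The default transformation implied by isCount = TRUE, per the text above.
# (log2p1 is an illustrative name, not a clusterExperiment function.)
log2p1 <- function(x) log2(x + 1)

transformed <- log2p1(counts)
print(transformed)
```

A different function of the same shape (one matrix in, one matrix out) is the kind of object one would supply to transFun to override this default.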

transFun

function. A function used to transform the input data matrix before clustering.

dimReduce

character. A character string identifying what type of dimensionality reduction to perform before clustering. Options are "none", "PCA", "var", "cv", and "mad". See transform for more details.

nVarDims

vector of the number of the most variable features to keep (when "var", "cv", or "mad" is identified in dimReduce). If NA is included, then the full dataset will also be included.
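
To illustrate what keeping the most variable features can mean, here is a rough base-R sketch. It is not the package's implementation: the helper `topVariableFeatures` is hypothetical, and the definition of "cv" as standard deviation divided by mean is an assumption.

```r
# Hypothetical helper (not part of clusterExperiment): keep the n rows
# (features) with the largest value of a chosen variability statistic.
topVariableFeatures <- function(x, n, stat = c("var", "cv", "mad")) {
  stat <- match.arg(stat)
  score <- switch(stat,
    var = apply(x, 1, var),
    cv  = apply(x, 1, sd) / rowMeans(x),  # assumed definition of "cv"
    mad = apply(x, 1, mad)
  )
  # Rank features from most to least variable and keep the top n.
  keep <- order(score, decreasing = TRUE)[seq_len(min(n, nrow(x)))]
  x[keep, , drop = FALSE]
}

# Made-up data: three features with increasing variability.
toy <- rbind(flat = c(5, 5, 5, 5),
             mild = c(4, 5, 6, 5),
             wild = c(0, 10, 0, 10))

reduced <- topVariableFeatures(toy, n = 2, stat = "var")
print(rownames(reduced))
```

Passing several values in nVarDims corresponds to repeating this kind of selection once per value, so that each reduced dataset is clustered separately.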

nPCADims

vector of the number of PCs to use (when 'PCA' is identified in dimReduce). If NA is included, then the full dataset will also be included.

k0s

the k0 parameter for sequential clustering (see seqCluster)

clusterFunction

function used for the clustering. Note that, unlike in clusterSingle, this must be a character vector of pre-defined clustering techniques provided by clusterSingle, and cannot be a user-defined function. Current functions are "tight", "hierarchical01", "hierarchicalK", and "pam".

alphas

values of alpha to be tried. Only used for clusterFunctions of type '01' (either 'tight' or 'hierarchical01'). Determines tightness required in creating clusters from the dissimilarity matrix. Takes on values in [0,1]. See clusterD.

betas

values of beta to be tried in sequential steps. Only used for sequential=TRUE. Determines the similarity between two clusters required in order to deem the cluster stable. Takes on values in [0,1]. See seqCluster.

minSizes

the minimum size required for a cluster (in clusterD). Clusters smaller than this are not kept and samples are left unassigned.

combineProportion

passed to proportion in combineMany

combineMinSize

passed to minSize in combineMany

dendroReduce

passed to dimReduce in makeDendrogram

dendroNDims

passed to ndims in makeDendrogram

mergeMethod

passed to mergeMethod in mergeClusters

mergeCutoff

passed to cutoff in mergeClusters

verbose

logical. If TRUE it will print informative messages.

clusterDArgs

list of additional arguments to be passed to clusterD.

subsampleArgs

list of arguments to be passed to subsampleClustering.

seqArgs

list of additional arguments to be passed to seqCluster.

ncores

the number of threads to use.

random.seed

a value to set seed before each run of clusterSingle (so that all of the runs are run on the same subsample of the data). Note, if 'random.seed' is set, argument 'ncores' should NOT be passed via subsampleArgs; instead set the argument 'ncores' of clusterMany directly (which is preferred for improving speed anyway).

run

logical. If FALSE, doesn't run clustering, but just returns matrix of parameters that will be run, for the purpose of inspection by user (with rownames equal to the names of the resulting column names of clMat object that would be returned if run=TRUE). Even if run=FALSE, however, the function will create the dimensionality reductions of the data indicated by the user input.
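
To give a sense of the kind of parameter matrix that run=FALSE lets you inspect, the sketch below crosses the default values from the Usage section using base R's expand.grid. This is only an illustration: the actual function may build, prune, or name the combinations differently, and the object name `paramGrid` is made up.

```r
# Cross the default parameter values shown in the Usage section.
# stringsAsFactors = FALSE keeps clusterFunction as plain character values.
paramGrid <- expand.grid(
  nPCADims = 50,
  k0s = 4:15,
  clusterFunction = c("tight", "hierarchical01"),
  alphas = c(0.1, 0.2, 0.3),
  betas = 0.9,
  minSizes = 1,
  stringsAsFactors = FALSE
)

# 12 values of k0s x 2 cluster functions x 3 alphas = 72 combinations.
print(nrow(paramGrid))
print(head(paramGrid, 3))
```

Counting combinations this way before a run is a cheap check on how many clusterings a given set of arguments implies.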

...

For signature list, arguments to be passed on to mclapply (if ncores>1). For all the other signatures, arguments to be passed to the method for signature list.

eraseOld

logical. Only relevant if input x is of class ClusterExperiment. If TRUE, will erase existing workflow results (clusterMany as well as mergeClusters and combineMany). If FALSE, existing workflow results will have "_i" added to the clusterTypes value, where i is one more than the largest such existing workflow clusterTypes.