RSEC: Resampling-based Sequential Ensemble Clustering


Description

Implementation of the RSEC algorithm (Resampling-based Sequential Ensemble Clustering) for single-cell sequencing data. This is a wrapper function around the existing clusterExperiment workflow that produces the output of RSEC.

Usage

## S4 method for signature 'SummarizedExperiment'
RSEC(x, ...)

## S4 method for signature 'data.frame'
RSEC(x, ...)

## S4 method for signature 'ClusterExperiment'
RSEC(x, eraseOld = FALSE, rerunClusterMany = FALSE, ...)

## S4 method for signature 'matrixOrHDF5'
RSEC(x, ...)

## S4 method for signature 'SingleCellExperiment'
RSEC(
  x,
  isCount = FALSE,
  transFun = NULL,
  reduceMethod = "PCA",
  nFilterDims = defaultNDims(x, reduceMethod, type = "filterStats"),
  nReducedDims = defaultNDims(x, reduceMethod, type = "reducedDims"),
  k0s = 4:15,
  subsample = TRUE,
  sequential = TRUE,
  clusterFunction = "hierarchical01",
  alphas = c(0.1, 0.2, 0.3),
  betas = 0.9,
  minSizes = 1,
  makeMissingDiss = if (ncol(x) < 1000) TRUE else FALSE,
  consensusProportion = 0.7,
  consensusMinSize,
  dendroReduce,
  dendroNDims,
  mergeMethod = "adjP",
  mergeCutoff,
  mergeLogFCcutoff,
  mergeDEMethod = if (isCount) "limma-voom" else "limma",
  verbose = FALSE,
  parameterWarnings = FALSE,
  mainClusterArgs = NULL,
  subsampleArgs = NULL,
  seqArgs = NULL,
  consensusArgs = NULL,
  whichAssay = 1,
  ncores = 1,
  random.seed = NULL,
  stopOnErrors = FALSE,
  run = TRUE
)

Arguments

x

the data on which to run the clustering. Can be an object of any of the following classes: matrix (with genes in rows), SummarizedExperiment, SingleCellExperiment or ClusterExperiment.

...

For signature matrix, arguments to be passed on to mclapply (if ncores>1). For all the other signatures, arguments to be passed to the method for signature matrix.

eraseOld

logical. Only relevant if input x is of class ClusterExperiment. If TRUE, will erase existing workflow results (clusterMany as well as mergeClusters and makeConsensus). If FALSE, existing workflow results will have "_i" added to the clusterTypes value, where i is one more than the largest such existing workflow clusterTypes.

rerunClusterMany

logical. If the object is a ClusterExperiment object, determines whether to rerun the clusterMany step. Setting this to FALSE is useful if you want to try different parameters for combining clusters after the clusterMany step, without incurring the computational cost of the clusterMany step again.

isCount

if transFun=NULL, then isCount=TRUE will set the transformation to function(x){log2(x+1)}, and isCount=FALSE will set it to the identity function(x){x}. Ignored if transFun is not NULL. If object is of class ClusterExperiment, the stored transformation will be used and giving this parameter will result in an error.

transFun

a transformation function to be applied to the data. If the transformation applied to the data creates an error or NA values, then the function will throw an error. If object is of class ClusterExperiment, the stored transformation will be used and giving this parameter will result in an error.
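The two default transformations described in the isCount entry can be written out directly in base R. The sketch below is illustrative only: the names count_transform and identity_transform and the toy matrix are assumptions made for this example, not objects exported by the package. A custom function of the same shape (matrix in, matrix out) is the kind of value one would supply as transFun.

```r
# Transformation implied by isCount = TRUE (as stated in the isCount entry)
count_transform <- function(x) { log2(x + 1) }
# Transformation implied by isCount = FALSE (identity)
identity_transform <- function(x) { x }

# Toy count matrix with genes in rows and samples in columns (made-up values)
counts <- matrix(c(0, 1, 3, 7), nrow = 2,
                 dimnames = list(c("gene1", "gene2"), c("cell1", "cell2")))

logged <- count_transform(counts)
unchanged <- identity_transform(counts)

# A transformation that introduces NA values would make RSEC throw an error,
# so it is worth checking a custom transFun on the data beforehand
stopifnot(!anyNA(logged))
print(logged)
```
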

reduceMethod

A character identifying what type of dimensionality reduction to perform before clustering. Options are 1) "none", 2) one of listBuiltInReducedDims() or listBuiltInFilterStats(), or 3) stored filtering or reducedDim values in the object.

nFilterDims

vector of the number of the most variable features to keep (when "var", "abscv", or "mad" is identified in reduceMethod).

nReducedDims

vector of the number of dimensions to use (when reduceMethod gives a dimensionality reduction method).

k0s

the k0 parameter for sequential clustering (see seqCluster)

subsample

logical as to whether to subsample via subsampleClustering. If TRUE, clustering in the mainClustering step is done on the co-occurrence between clusterings in the subsampled clustering results. If FALSE, the mainClustering step will be run directly on x/diss.

sequential

logical whether to use the sequential strategy (see details of seqCluster). Can be used in combination with subsample=TRUE or FALSE.

clusterFunction

function used for the clustering. This must be either 1) a character vector of built-in clustering techniques, or 2) a named list of ClusterFunction objects. Current functions can be found by typing listBuiltInFunctions() into the command-line.

alphas

values of alpha to be tried. Only used for clusterFunctions of type '01'. Determines tightness required in creating clusters from the dissimilarity matrix. Takes on values in [0,1]. See documentation of ClusterFunction.

betas

values of beta to be tried in sequential steps. Only used for sequential=TRUE. Determines the similarity between two clusters required in order to deem the cluster stable. Takes on values in [0,1]. See documentation of seqCluster.

minSizes

the minimum size required for a cluster (in the mainClustering step). Clusters smaller than this are not kept and their samples are left unassigned.

makeMissingDiss

logical. Whether to calculate any distance matrices that are needed when the input is not "diss". If TRUE, then when a clustering function calls for an inputType "diss", but the given matrix is of type "X", the function will calculate a distance matrix. A dissimilarity matrix will also be calculated if a post-processing argument like findBestK or removeSil is chosen, since these rely on calculating silhouette widths from distances.

consensusProportion

passed to proportion in makeConsensus

consensusMinSize

passed to minSize in makeConsensus

dendroReduce

passed to reduceMethod in makeDendrogram

dendroNDims

passed to nDims in makeDendrogram

mergeMethod

passed to mergeMethod in mergeClusters

mergeCutoff

passed to cutoff in mergeClusters

mergeLogFCcutoff

passed to logFCcutoff in mergeClusters

mergeDEMethod

passed to DEMethod argument in mergeClusters. By default, unless otherwise chosen by the user, if isCount=TRUE, then mergeDEMethod="limma-voom", otherwise mergeDEMethod="limma". These choices are for speed considerations and the user may want to try mergeDEMethod="edgeR" on smaller datasets of counts.

verbose

logical. If TRUE it will print informative messages.

parameterWarnings

logical, as to whether warnings and comments from checking the validity of the parameter combinations should be printed.

mainClusterArgs

list of arguments to be passed for the mainClustering step, see help pages of mainClustering.

subsampleArgs

list of arguments to be passed to the subsampling step (if subsample=TRUE), see help pages of subsampleClustering.

seqArgs

list of arguments to be passed to seqCluster.

consensusArgs

list of additional arguments to be passed to makeConsensus

whichAssay

numeric or character specifying which assay to use. See assay for details.

ncores

the number of threads

random.seed

a value to set seed before each run of clusterSingle (so that all of the runs are run on the same subsample of the data). Note, if 'random.seed' is set, argument 'ncores' should NOT be passed via subsampleArgs; instead set the argument 'ncores' of clusterMany directly (which is preferred for improving speed anyway).

stopOnErrors

logical. If FALSE, if RSEC hits an error after the clusterMany step, it will return the results up to that point, rather than generating a stop error. The text of error will be printed as a NOTE. This allows the user to get the results to that point, so as to not have to rerun the computationally heavy earlier steps. The TRUE option is only provided for debugging purposes.

run

logical. If FALSE, doesn't run the clustering, but just returns the matrix of parameters that would be run, so that the user can inspect it (its rownames equal the column names of the clMat object that would be returned if run=TRUE). Even if run=FALSE, however, the function will create the dimensionality reductions of the data indicated by the user input.
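As a rough illustration of the run argument, the sketch below requests the parameter combinations without running any clustering. It assumes that the clusterExperiment package is installed and that its bundled simData example data (which provides the count matrix simCount) is available; the parameter values are arbitrary choices for the example. Because the exact structure of the returned object is not spelled out here, the sketch only inspects it with str() rather than relying on particular element names.

```r
# Sketch only: assumes the clusterExperiment package is installed
if (requireNamespace("clusterExperiment", quietly = TRUE)) {
  library(clusterExperiment)
  data(simData)  # example data shipped with the package; provides simCount

  # run = FALSE skips the clustering and returns the parameter combinations
  params <- RSEC(simCount, isCount = TRUE, reduceMethod = "none",
                 k0s = 4:6, alphas = c(0.1, 0.2), betas = 0.9,
                 run = FALSE)

  # Inspect what would be run, without assuming a particular structure
  str(params, max.level = 1)
}
```
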

Details

Note that the argument isCount is mainly used when the input is a matrix or SingleCellExperiment Class and passed to clusterMany to set the transformation function of the data. However, if RSEC is being re-called on an existing ClusterExperiment object, it does not reset the transformation; in this case the only impact it will have is in setting the default value for DEMethod for mergeClusters step, but ONLY if mergeClusters hasn't already been calculated. To set arguments that allow you to recalculate the non-null probabilities of the hierarchy see mergeClusters.
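The re-calling pattern described above might look like the following sketch. It assumes the clusterExperiment package and its bundled simData example data (providing simCount); the parameter values and the object names rsecOut and rsecOut2 are arbitrary choices for illustration. The second call deliberately omits isCount and transFun, since supplying them for a ClusterExperiment object results in an error.

```r
# Sketch only: assumes the clusterExperiment package is installed
if (requireNamespace("clusterExperiment", quietly = TRUE)) {
  library(clusterExperiment)
  data(simData)  # provides the count matrix simCount

  # First pass: full RSEC workflow on the count matrix (can take a while)
  rsecOut <- RSEC(simCount, isCount = TRUE, reduceMethod = "none",
                  k0s = 4:6, alphas = 0.1, betas = 0.9,
                  consensusProportion = 0.7,
                  random.seed = 176201)

  # Second pass: keep the stored clusterMany results and change only a
  # parameter of the later steps; the stored transformation is reused
  rsecOut2 <- RSEC(rsecOut, rerunClusterMany = FALSE,
                   consensusProportion = 0.5)
}
```

With eraseOld = FALSE (the default), the earlier workflow results are kept in the object with "_i" appended to their clusterTypes values, so both sets of results can be compared afterwards.
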

Value

A ClusterExperiment object is returned containing all of the clusterings from the steps of RSEC.
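A minimal sketch of running RSEC and looking at the returned object is given below. It assumes the clusterExperiment package and its bundled simData example data (providing simCount); the reduced parameter grid is an arbitrary choice to keep the run short, not a recommendation. The accessors clusterTypes, primaryClusterNamed and clusterMatrix come from the same package.

```r
# Sketch only: assumes the clusterExperiment package is installed
if (requireNamespace("clusterExperiment", quietly = TRUE)) {
  library(clusterExperiment)
  data(simData)  # provides the count matrix simCount

  rsecOut <- RSEC(simCount, isCount = TRUE, reduceMethod = "none",
                  k0s = 4:6, alphas = c(0.1, 0.2), betas = 0.9,
                  random.seed = 176201)

  # Which workflow step produced each stored clustering
  print(table(clusterTypes(rsecOut)))

  # Cluster sizes for the primary (final) clustering
  print(table(primaryClusterNamed(rsecOut)))

  # Samples-by-clusterings matrix of cluster assignments
  print(dim(clusterMatrix(rsecOut)))
}
```
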


clusterExperiment documentation built on Feb. 11, 2021, 2 a.m.