clusterSingle: General wrapper method to cluster the data

Description Usage Arguments Details Value See Also Examples

Description

Given input data, this function will find clusters, based on a single specification of parameters.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
## S4 method for signature 'SummarizedExperiment'
clusterSingle(inputMatrix, ...)

## S4 method for signature 'ClusterExperiment'
clusterSingle(inputMatrix, ...)

## S4 method for signature 'SingleCellExperiment'
clusterSingle(
  inputMatrix,
  reduceMethod = "none",
  nDims = defaultNDims(inputMatrix, reduceMethod),
  whichAssay = 1,
  ...
)

## S4 method for signature 'matrixOrHDF5OrNULL'
clusterSingle(
  inputMatrix,
  inputType = "X",
  subsample = FALSE,
  sequential = FALSE,
  distFunction = NA,
  mainClusterArgs = NULL,
  subsampleArgs = NULL,
  seqArgs = NULL,
  isCount = FALSE,
  transFun = NULL,
  reduceMethod = "none",
  nDims = defaultNDims(inputMatrix, reduceMethod),
  makeMissingDiss = if (ncol(inputMatrix) < 1000) TRUE else FALSE,
  clusterLabel = "clusterSingle",
  saveSubsamplingMatrix = FALSE,
  checkDiss = FALSE,
  warnings = TRUE
)

Arguments

inputMatrix

numerical matrix on which to run the clustering or a SummarizedExperiment, SingleCellExperiment, or ClusterExperiment object.

...

arguments to be passed on to the method for signature matrix.

reduceMethod

character A character identifying what type of dimensionality reduction to perform before clustering. Options are 1) "none", 2) one of listBuiltInReducedDims() or listBuiltInFitlerStats OR 3) stored filtering or reducedDim values in the object.

nDims

integer An integer identifying how many dimensions to reduce to in the reduction specified by reduceMethod. Defaults to output of defaultNDims

whichAssay

numeric or character specifying which assay to use. See assay for details.

inputType

a character vector defining what type of input is given in the inputMatrix argument. Must consist of values "diss","X", or "cat" (see details). "X" and "cat" should be indicate matrices with features in the row and samples in the column; "cat" corresponds to the features being numerical integers corresponding to categories, while "X" are continuous valued features. "diss" corresponds to an inputMatrix that is a NxN dissimilarity matrix. "cat" is largely used internally for clustering of sets of clusterings.

subsample

logical as to whether to subsample via subsampleClustering. If TRUE, clustering in mainClustering step is done on the co-occurance between clusterings in the subsampled clustering results. If FALSE, the mainClustering step will be run directly on x/diss

sequential

logical whether to use the sequential strategy (see details of seqCluster). Can be used in combination with subsample=TRUE or FALSE.

distFunction

a distance function to be applied to inputMatrix. Only relevant if inputType="X". See details of clusterSingle for the required format of the distance function.

mainClusterArgs

list of arguments to be passed for the mainClustering step, see help pages of mainClustering.

subsampleArgs

list of arguments to be passed to the subsampling step (if subsample=TRUE), see help pages of subsampleClustering.

seqArgs

list of arguments to be passed to seqCluster.

isCount

if transFun=NULL, then isCount=TRUE will determine the transformation as defined by function(x){log2(x+1)}, and isCount=FALSE will give a transformation function function(x){x}. Ignored if transFun=NULL. If object is of class ClusterExperiment, the stored transformation will be used and giving this parameter will result in an error.

transFun

a transformation function to be applied to the data. If the transformation applied to the data creates an error or NA values, then the function will throw an error. If object is of class ClusterExperiment, the stored transformation will be used and giving this parameter will result in an error.

makeMissingDiss

logical. Whether to calculate necessary distance matrices needed when input is not "diss". If TRUE, then when a clustering function calls for a inputType "diss", but the given matrix is of type "X", the function will calculate a distance function. A dissimilarity matrix will also be calculated if a post-processing argument like findBestK or removeSil is chosen, since these rely on calcualting silhouette widths from distances.

clusterLabel

a string used to describe the clustering. By default it is equal to "clusterSingle", to indicate that this clustering is the result of a call to clusterSingle.

saveSubsamplingMatrix

logical. If TRUE, the co-clustering matrix resulting from subsampling is returned in the coClustering slot (and replaces any existing coClustering object in the slot coClustering if input object is a ClusterExperiment object.)

checkDiss

logical. Whether to check whether the dissimilarities matrices are valid (whether given by the user or calculated because makeMissingDiss=TRUE).

warnings

logical. Whether to print out the many possible warnings and messages regarding checking the internal consistency of the parameters.

Details

clusterSingle is an 'expert-oriented' function, intended to be used when a user wants to run a single clustering and/or have a great deal of control over the clustering parameters. Most users will find clusterMany more relevant. However, clusterMany makes certain assumptions about the intention of certain combinations of parameters that might not match the user's intent; similarly clusterMany does not directly take a dissimilarity matrix but only a matrix of values x (though a user can define a distance function to be applied to x in clusterMany).

Unlike clusterMany, most of the relevant arguments for the actual clustering algorithms in clusterSingle are passed to the relevant steps via the arguments mainClusterArgs, subsampleArgs, and seqArgs. These arguments should be named lists with parameters that match the corresponding functions: mainClustering,subsampleClustering, and seqCluster. These three functions are not meant to be called by the user, but rather accessed via calls to clusterSingle. But the user can look at the help files of those functions for more information regarding the parameters that they take.

Only certain combinations of parameters are possible for certain choices of sequential and subsample. These restrictions are documented below.

To provide a distance matrix via the argument distFunction, the function must be defined to take the distance of the rows of a matrix (internally, the function will call distFunction(t(x)). This is to be compatible with the input for the dist function. as.matrix will be performed on the output of distFunction, so if the object returned has a as.matrix method that will convert the output into a symmetric matrix of distances, this is fine (for example the class dist for objects returned by dist have such a method). If distFunction=NA, then a default distance will be calculated based on the type of clustering algorithm of clusterFunction. For type "K" the default is to take dist as the distance function. For type "01", the default is to take the (1-cor(x))/2.

Value

A ClusterExperiment object if inputType is of type "X".

If input was not of type "X", then the result is a list with values

See Also

clusterMany to compare multiple choices of parameters, and mainClustering,subsampleClustering, and seqCluster for the underlying functions called by clusterSingle.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
data(simData)

## Not run: 
#following code takes some time.
#use clusterSingle to do sequential clustering
#(same as example in seqCluster only using clusterSingle ...)
set.seed(44261)
clustSeqHier_v2 <- clusterSingle(simData,
     sequential=TRUE, subsample=TRUE, 
     subsampleArgs=list(resamp.n=100, samp.p=0.7,
     clusterFunction="kmeans", clusterArgs=list(nstart=10)),
     seqArgs=list(beta=0.8, k0=5), mainClusterArgs=list(minSize=5,
     clusterFunction="hierarchical01",clusterArgs=list(alpha=0.1)))

## End(Not run)

#use clusterSingle to do just clustering k=3 with no subsampling
clustObject <- clusterSingle(simData,
    subsample=FALSE, sequential=FALSE,
    mainClusterArgs=list(clusterFunction="pam", clusterArgs=list(k=3)))
#compare to standard pam
pamOut<-cluster::pam(t(simData),k=3,cluster.only=TRUE)
all(pamOut==primaryCluster(clustObject))

clusterExperiment documentation built on Feb. 11, 2021, 2 a.m.