mainClustering: Cluster distance matrix from subsampling
In epurdom/clusterCells: Compare Clusterings for Single-Cell Sequencing

Description

Given input data, this function will try to find the clusters based on the given ClusterFunction object.

Usage

## S4 method for signature 'character'
mainClustering(clusterFunction, ...)

## S4 method for signature 'ClusterFunction'
mainClustering(
  clusterFunction,
  inputMatrix,
  inputType,
  clusterArgs = NULL,
  minSize = 1,
  orderBy = c("size", "best"),
  format = c("vector", "list"),
  returnData = FALSE,
  warnings = TRUE,
  ...
)

## S4 method for signature 'ClusterFunction'
getPostProcessingArgs(clusterFunction)

Arguments

`clusterFunction`	a `ClusterFunction` object that defines the clustering routine. See `ClusterFunction` for the required format of user-defined clustering routines. Alternatively, give a character value naming one of the clustering routines provided in the package; call `listBuiltInFunctions()` to see the built-in routines. If `clusterFunction` is missing, the default is "pam".
`...`	arguments passed to the post-processing steps of the clustering. The available post-processing arguments for a `ClusterFunction` object depend on its algorithm type and can be found by calling `getPostProcessingArgs`. See Details below.
`inputMatrix`	numerical matrix on which to run the clustering or a `SummarizedExperiment`, `SingleCellExperiment`, or `ClusterExperiment` object.
`inputType`	a character vector defining what type of input is given in the `inputMatrix` argument. Must consist of values "diss", "X", or "cat" (see details). "X" and "cat" indicate matrices with features in the rows and samples in the columns; "cat" corresponds to features that are integers representing categories, while "X" corresponds to continuous-valued features. "diss" corresponds to an `inputMatrix` that is an NxN dissimilarity matrix. "cat" is largely used internally for clustering sets of clusterings.
`clusterArgs`	arguments to be passed directly to the `clusterFUN` slot of the `ClusterFunction` object
`minSize`	the minimum number of samples in a cluster. Clusters found below this size will be discarded and samples in the cluster will be given a cluster assignment of "-1" to indicate that they were not clustered.
`orderBy`	how to order the clusters (either by size or by maximum alpha value). If orderBy="size", the clusters are renumbered by cluster size instead of by the internal ordering of the `clusterFUN` defined in the `ClusterFunction` object (an internal ordering is only possible if slot `outputType` of the `ClusterFunction` is `"list"`).
`format`	whether to return a list of the indices in each cluster or a vector of cluster assignments. The list format is mainly for compatibility with the sequential part of the clustering.
`returnData`	logical as to whether to return the `diss` or `x` matrix in the output. If `FALSE` only the clustering vector is returned.
`warnings`	logical as to whether to give a warning when arguments are supplied that do not match the clustering choices given. If `FALSE`, inapplicable arguments are ignored without warning.
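
The effect of `minSize` and `orderBy="size"` on a vector of cluster labels can be illustrated in base R. This is only a sketch of the behaviour described above, not the package's implementation; `postProcessSketch` is a hypothetical helper, and how ties in cluster size are broken is an assumption.

```r
# Hypothetical helper (not part of the package): drops clusters smaller than
# minSize and renumbers the remaining clusters from largest to smallest.
postProcessSketch <- function(cl, minSize = 1) {
  sizes <- table(cl)
  # Labels of the clusters that are large enough to keep
  keep <- names(sizes)[sizes >= minSize]
  # Largest cluster first (assumption: ties keep their original label order)
  ordered <- keep[order(-as.integer(sizes[keep]))]
  # New label = rank of the cluster by size; dropped clusters become NA
  out <- match(as.character(cl), ordered)
  # Samples from discarded clusters are marked as unclustered
  out[is.na(out)] <- -1L
  out
}

cl <- c(3, 3, 3, 3, 1, 1, 2, 2, 2)
res <- postProcessSketch(cl, minSize = 3)
print(res)
```

Here the cluster with only two samples falls below `minSize` and is reported as -1, while the two remaining clusters are renumbered so that the largest one is cluster 1.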

Details

mainClustering is not meant to be called by the user. It is exported only so that its arguments can be documented clearly; these arguments can be passed via the argument mainClusterArgs in functions like clusterSingle and clusterMany.

Post-processing Arguments: Currently only algorithms of type 'K' have defined post-processing. Specifically:

`findBestK`	logical, whether to find the best K based on average silhouette width (only used if clusterFunction is of type "K").
`kRange`	vector of integers to try for k values if findBestK=TRUE. If k is given in clusterArgs, then the default is k-2 to k+20, subject to those values being greater than 2; if not, the default is 2:20. Note that the default values depend on the input k, so running with different choices of k and findBestK=TRUE can give different answers unless kRange is set to be the same.
`removeSil`	logical as to whether to remove the assignment of a sample to a cluster when the sample's silhouette value is less than silCutoff.
`silCutoff`	cutoff on the minimum silhouette width for a sample to be included in a cluster (only used if removeSil=TRUE).

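The post-processing options for type 'K' algorithms can be sketched with `pam()` and `silhouette()` from the `cluster` package, without calling mainClustering itself. The toy data, the variable names, and the exact selection rule are assumptions made for illustration; the package's internals may differ. Note that `dist()` expects samples in rows, unlike the features-by-samples layout used for `inputType="X"`.

```r
library(cluster)  # provides pam() and silhouette()

set.seed(1)
# Toy data (assumption): three well-separated groups of 10 samples, samples in rows
x <- rbind(matrix(rnorm(20, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 5, sd = 0.3), ncol = 2),
           matrix(rnorm(20, mean = 10, sd = 0.3), ncol = 2))
d <- dist(x)

# findBestK: try each k in kRange and keep the one with the largest
# average silhouette width
kRange <- 2:6
avgSil <- sapply(kRange, function(k) {
  fit <- pam(d, k = k)
  mean(silhouette(fit$clustering, d)[, "sil_width"])
})
bestK <- kRange[which.max(avgSil)]

# removeSil: unassign samples whose silhouette width is below silCutoff
fit <- pam(d, k = bestK)
sil <- silhouette(fit$clustering, d)[, "sil_width"]
silCutoff <- 0
cl <- fit$clustering
cl[sil < silCutoff] <- -1
print(bestK)
print(table(cl))
```

Because the candidate values of k determine which silhouette averages are compared, fixing `kRange` explicitly keeps results comparable across runs that start from different values of k.
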
Value

If returnData=FALSE, mainClustering returns a vector of cluster assignments (if format="vector") or a list of indices for each cluster (if format="list"). Clusters smaller than minSize are removed. If returnData=TRUE, then mainClustering returns a list with the following elements:

`results`	The clusterings of each sample.
`inputMatrix`	The input matrix given to argument inputMatrix. Useful if the input is the result of subsampling, in which case the input is the set of clusterings found over subsampling.
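
The two return formats carry the same information, as the following base R sketch shows. It does not call the package; the variable names are hypothetical, and dropping unassigned (-1) samples from the list form is an assumption.

```r
# Vector format: one cluster label per sample, -1 meaning "not clustered"
cl <- c(1, 1, 2, -1, 2, 1)

# List format: one element per cluster, holding the indices of its samples
# (assumption: unassigned samples do not appear in any element)
assigned <- cl > 0
asList <- unname(split(seq_along(cl)[assigned], cl[assigned]))

# Converting back: start with everything unassigned, then fill in each cluster
asVector <- rep(-1, length(cl))
for (i in seq_along(asList)) {
  asVector[asList[[i]]] <- i
}
print(asList)
print(asVector)
```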

Examples

data(simData)
cl1<-mainClustering(inputMatrix=simData, inputType="X", 
    clusterFunction="pam",clusterArgs=list(k=3))
#supply a dissimilarity, use algorithm type "01"
diss<-as.matrix(dist(t(simData),method="manhattan"))
cl2<-mainClustering(diss, inputType="diss", clusterFunction="hierarchical01",
    clusterArgs=list(alpha=.1))
cl3<-mainClustering(inputMatrix=diss, inputType="diss", clusterFunction="pam",
    clusterArgs=list(k=3))

# run hierarchical method for finding blocks, with method of evaluating
# coherence of block set to evalClusterMethod="average", and the hierarchical
# clustering using single linkage:
# (clustering function requires type 'diss'),
clustSubHier <- mainClustering(diss, inputType="diss",
    clusterFunction="hierarchical01", minSize=5,
    clusterArgs=list(alpha=0.1,evalClusterMethod="average", method="single"))

#post-process results of pam -- must pass diss for silhouette calculation
clustSubPamK <- mainClustering(simData, inputType="X", clusterFunction="pam", 
    silCutoff=0, minSize=5, diss=diss, removeSil=TRUE, clusterArgs=list(k=3))
clustSubPamBestK <- mainClustering(simData, inputType="X", clusterFunction="pam", silCutoff=0,
    minSize=5, diss=diss, removeSil=TRUE, findBestK=TRUE, kRange=2:10)

# note that passing the wrong arguments for an algorithm results in warnings
# (which can be turned off with warnings=FALSE)
clustSubTight_test <- mainClustering(diss, inputType="diss", 
   clusterFunction="tight", 
   clusterArgs=list(alpha=0.1), minSize=5, removeSil=TRUE)
clustSubTight_test2 <- mainClustering(diss, inputType="diss",
   clusterFunction="tight",
   clusterArgs=list(alpha=0.1,evalClusterMethod="average"))

epurdom/clusterCells documentation built on April 28, 2024, 8:14 p.m.