mainClustering: Cluster distance matrix from subsampling

mainClustering R Documentation

Cluster distance matrix from subsampling

Description

Given input data, this function finds clusters using the supplied ClusterFunction object.

Usage

## S4 method for signature 'character'
mainClustering(clusterFunction, ...)

## S4 method for signature 'ClusterFunction'
mainClustering(
  clusterFunction,
  inputMatrix,
  inputType,
  clusterArgs = NULL,
  minSize = 1,
  orderBy = c("size", "best"),
  format = c("vector", "list"),
  returnData = FALSE,
  warnings = TRUE,
  ...
)

## S4 method for signature 'ClusterFunction'
getPostProcessingArgs(clusterFunction)

Arguments

clusterFunction

a ClusterFunction object that defines the clustering routine. See ClusterFunction for the required format of user-defined clustering routines. A character value can be given instead to select one of the clustering routines provided in the package. Type listBuiltInFunctions() at the command prompt to see the built-in clustering routines. If clusterFunction is missing, the default is "pam".

...

arguments passed to the post-processing steps of the clustering. The available post-processing arguments for a ClusterFunction object depend on its algorithm type and can be found by calling getPostProcessingArgs. See details below for documentation.

inputMatrix

numerical matrix on which to run the clustering or a SummarizedExperiment, SingleCellExperiment, or ClusterExperiment object.

inputType

a character vector defining what type of input is given in the inputMatrix argument. Must consist of the values "diss", "X", or "cat" (see details). "X" and "cat" indicate matrices with features in the rows and samples in the columns; for "cat" the features are integers that encode categories, while for "X" the features are continuous-valued. "diss" corresponds to an inputMatrix that is an NxN dissimilarity matrix. "cat" is largely used internally for clustering of sets of clusterings.
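The difference between the "X" and "diss" layouts can be sketched with base R alone. The toy matrix and its dimension names below are made up for illustration; only the orientation (features in rows, samples in columns, NxN dissimilarity between samples) comes from the description above.

```r
# Toy data for inputType = "X": 4 features (rows) by 6 samples (columns).
set.seed(1)
x <- matrix(rnorm(4 * 6), nrow = 4, ncol = 6,
            dimnames = list(paste0("feature", 1:4), paste0("sample", 1:6)))

# Matching input for inputType = "diss": an N x N dissimilarity between samples.
# dist() compares rows, so transpose first to put the samples in rows.
diss <- as.matrix(dist(t(x), method = "euclidean"))

dim(x)     # features by samples
dim(diss)  # samples by samples
```

Forgetting the transpose would give a feature-by-feature matrix instead, which has the wrong dimension for clustering samples.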

clusterArgs

arguments to be passed directly to the clusterFUN slot of the ClusterFunction object

minSize

the minimum number of samples in a cluster. Clusters found below this size will be discarded and samples in the cluster will be given a cluster assignment of "-1" to indicate that they were not clustered.
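The effect of minSize on a vector of assignments can be illustrated with a few lines of base R. This is a minimal sketch of the behavior described above, not the package's own code; drop_small_clusters is a hypothetical helper name.

```r
# Hypothetical helper (not part of the package): mark every cluster with fewer
# than min_size samples as unassigned by giving its samples the label -1.
drop_small_clusters <- function(assignments, min_size = 1) {
  sizes <- table(assignments)
  too_small <- names(sizes)[sizes < min_size]
  assignments[as.character(assignments) %in% too_small] <- -1
  assignments
}

cl <- c(1, 1, 1, 1, 2, 2, 2, 3, 3, 4)
filtered <- drop_small_clusters(cl, min_size = 3)
filtered  # clusters 3 and 4 have fewer than 3 samples and become -1
```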

orderBy

how to order the clusters (either by size or by maximum alpha value). If orderBy="size", the clusters are renumbered by their size instead of by the internal ordering of the clusterFUN defined in the ClusterFunction object (an internal ordering is only possible if slot outputType of the ClusterFunction is "list").
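Renumbering by size can be pictured as follows. This is a sketch only: renumber_by_size is a hypothetical helper, and it assumes that the largest cluster receives label 1 and that unassigned samples (-1) are left alone, neither of which is stated above.

```r
# Hypothetical helper (not part of the package): relabel clusters so that
# label 1 is the largest cluster, label 2 the next largest, and so on.
renumber_by_size <- function(assignments) {
  clustered <- assignments > 0   # assumption: -1 (unassigned) is left untouched
  sizes <- sort(table(assignments[clustered]), decreasing = TRUE)
  new_labels <- seq_along(sizes)
  names(new_labels) <- names(sizes)
  out <- assignments
  out[clustered] <- new_labels[as.character(assignments[clustered])]
  out
}

cl <- c(1, 1, 2, 2, 2, 2, 3, 3, 3, -1)
reordered <- renumber_by_size(cl)
reordered  # old cluster 2 (4 samples) becomes 1, old 3 becomes 2, old 1 becomes 3
```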

format

whether to return a list of the sample indices in each cluster or a vector of cluster assignments. The list format is mainly for compatibility with the sequential clustering step.
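The two formats carry the same information, as this base R sketch shows. The example vector is made up, and dropping unassigned samples (-1) from the list is an assumption made for illustration rather than documented behavior.

```r
# Vector format: one cluster label per sample, -1 for unassigned samples.
cl_vector <- c(2, 1, 1, -1, 2, 1)

# List format: one element per cluster, holding the indices of its samples.
clustered <- which(cl_vector > 0)
cl_list <- split(clustered, cl_vector[clustered])
cl_list

# Converting back: start with everything unassigned, then fill in each cluster.
back <- rep(-1, length(cl_vector))
for (k in seq_along(cl_list)) {
  back[cl_list[[k]]] <- k
}
```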

returnData

logical as to whether to return the diss or x matrix in the output. If FALSE only the clustering vector is returned.

warnings

logical as to whether to give a warning when arguments are supplied that do not match the chosen clustering options. If FALSE, inapplicable arguments are ignored without warning.

Details

mainClustering is not meant to be called directly by the user. It is exported only so that its arguments can be clearly documented; these arguments can be passed via the argument mainClusterArgs in functions like clusterSingle and clusterMany.

Post-processing Arguments: Currently, post-processing of the clustering is defined only for type 'K' algorithms. Specifically:

  • "findBestK": logical, whether to find the best K based on average silhouette width (only used if clusterFunction is of type "K").

  • "kRange": vector of integers to try for k values if findBestK=TRUE. If k is given in clusterArgs, then the default is k-2 to k+20, subject to those values being greater than 2; if not, the default is 2:20. Note that the default values depend on the input k, so running with different choices of k and findBestK=TRUE can give different answers unless kRange is set to be the same.

  • "removeSil": logical as to whether to remove the assignment of a sample to a cluster when the sample's silhouette value is less than silCutoff.

  • "silCutoff": cutoff on the minimum silhouette width for a sample to be included in a cluster (only used if removeSil=TRUE).
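The idea behind these post-processing options can be sketched outside the package with pam() and silhouette() from the cluster package. This is an illustration under stated assumptions, not the package's implementation: the simulated data, the variable names (k_range, sil_cutoff, best_k) and the choice of 2:6 as the range are all made up for the example.

```r
library(cluster)  # provides pam() and silhouette()

# Simulated data: 2 features (rows) by 30 samples (columns), three tight groups.
set.seed(42)
centers <- cbind(c(0, 0), c(10, 0), c(0, 10))   # one column per group center
x <- centers[, rep(1:3, each = 10)] + matrix(rnorm(2 * 30, sd = 0.5), nrow = 2)
d <- dist(t(x))                                  # dissimilarity between samples

# findBestK idea: try each k and keep the one with the largest average
# silhouette width.
k_range <- 2:6
avg_sil <- sapply(k_range, function(k) {
  fit <- pam(d, k = k, diss = TRUE)
  summary(silhouette(fit$clustering, d))$avg.width
})
best_k <- k_range[which.max(avg_sil)]

# removeSil idea: samples whose silhouette width is below the cutoff lose
# their cluster assignment and are marked -1.
fit <- pam(d, k = best_k, diss = TRUE)
sil <- silhouette(fit$clustering, d)[, "sil_width"]
sil_cutoff <- 0
assignments <- fit$clustering
assignments[sil < sil_cutoff] <- -1
```

Because the silhouette is computed from dissimilarities between samples, a dissimilarity matrix has to be available for this step even when the clustering itself was run on the feature matrix.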

Value

If returnData=FALSE, mainClustering returns a vector of cluster assignments (if format="vector") or a list of indices for each cluster (if format="list"). Clusters smaller than minSize are removed. If returnData=TRUE, then mainClustering returns a list with the following elements:

  • results: The clusterings of each sample.

  • inputMatrix: The input matrix given to argument inputMatrix. Useful if the input is the result of subsampling, in which case the input is the set of clusterings found over subsampling.

Examples

data(simData)
cl1<-mainClustering(inputMatrix=simData, inputType="X", 
    clusterFunction="pam",clusterArgs=list(k=3))
#supply a dissimilarity, use algorithm type "01"
diss<-as.matrix(dist(t(simData),method="manhattan"))
cl2<-mainClustering(diss, inputType="diss", clusterFunction="hierarchical01",
    clusterArgs=list(alpha=.1))
cl3<-mainClustering(inputMatrix=diss, inputType="diss", clusterFunction="pam",
    clusterArgs=list(k=3))

# run hierarchical method for finding blocks, with method of evaluating
# coherence of block set to evalClusterMethod="average", and the hierarchical
# clustering using single linkage
# (clustering function requires type 'diss'):
clustSubHier <- mainClustering(diss, inputType="diss",
    clusterFunction="hierarchical01", minSize=5,
    clusterArgs=list(alpha=0.1,evalClusterMethod="average", method="single"))

#post-process results of pam -- must pass diss for silhouette calculation
clustSubPamK <- mainClustering(simData, inputType="X", clusterFunction="pam", 
    silCutoff=0, minSize=5, diss=diss, removeSil=TRUE, clusterArgs=list(k=3))
clustSubPamBestK <- mainClustering(simData, inputType="X", clusterFunction="pam", silCutoff=0,
    minSize=5, diss=diss, removeSil=TRUE, findBestK=TRUE, kRange=2:10)

# note that passing the wrong arguments for an algorithm results in warnings
# (which can be turned off with warnings=FALSE)
clustSubTight_test <- mainClustering(diss, inputType="diss", 
   clusterFunction="tight", 
   clusterArgs=list(alpha=0.1), minSize=5, removeSil=TRUE)
clustSubTight_test2 <- mainClustering(diss, inputType="diss",
   clusterFunction="tight",
   clusterArgs=list(alpha=0.1,evalClusterMethod="average"))

epurdom/clusterCells documentation built on April 28, 2024, 8:14 p.m.