mainClustering: Cluster distance matrix from subsampling


Description

Given input data, this function will try to find the clusters based on the given ClusterFunction object.

Usage

## S4 method for signature 'character'
mainClustering(clusterFunction, ...)

## S4 method for signature 'ClusterFunction'
mainClustering(clusterFunction, x = NULL,
  diss = NULL, distFunction = NA, clusterArgs = NULL, minSize = 1,
  orderBy = c("size", "best"), format = c("vector", "list"),
  checkArgs = TRUE, checkDiss = TRUE, returnData = FALSE, ...)

## S4 method for signature 'ClusterFunction'
getPostProcessingArgs(clusterFunction)

Arguments

clusterFunction

a ClusterFunction object that defines the clustering routine. See ClusterFunction for the required format of user-defined clustering routines. Users can also pass a character value to clusterFunction to select one of the clustering routines provided in the package. Call listBuiltInFunctions() at the command prompt to see the built-in clustering routines. If clusterFunction is missing, the default is "pam".

...

arguments passed to the post-processing steps of the clustering. The available post-processing arguments for a ClusterFunction object depend on its algorithm type and can be found by calling getPostProcessingArgs. See Details below for documentation.

x

p x n data matrix on which to run the clustering (samples in columns).

diss

n x n matrix of dissimilarities between the samples on which to run the clustering.

distFunction

a distance function to be applied to x. Only relevant if the input is only x (a matrix of data) and diss=NULL. See the details of clusterSingle for the required format of the distance function.

clusterArgs

a list of arguments to be passed directly to the clusterFUN slot of the ClusterFunction object.

minSize

the minimum number of samples in a cluster. Clusters found below this size will be discarded and samples in the cluster will be given a cluster assignment of "-1" to indicate that they were not clustered.

orderBy

how to order the clusters (either by size or by maximum alpha value). If orderBy="size", the clusters are renumbered by cluster size instead of by the internal ordering of the clusterFUN defined in the ClusterFunction object (an internal ordering is only possible if slot outputType of the ClusterFunction is "list").

format

whether to return a list of the sample indices in each cluster or a vector of cluster assignments. The list format is mainly for compatibility with the sequential part.

checkArgs

logical; whether to give a warning if arguments are supplied that do not match the clustering choices given. If FALSE, inapplicable arguments are ignored without warning.

checkDiss

logical; whether to check that the input diss is valid.

returnData

logical as to whether to return the diss or x matrix in the output. If FALSE only the clustering vector is returned.
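The shapes expected for x and diss can be illustrated with a minimal base R sketch. The toy matrix below is hypothetical (it is not simData), and the code only shows how a p x n data matrix relates to an n x n dissimilarity matrix; it is not the package's internal computation.

```r
# Hypothetical toy data: p = 4 features (rows), n = 6 samples (columns),
# matching the "samples in columns" convention described for x.
set.seed(1)
x <- matrix(rnorm(4 * 6), nrow = 4, ncol = 6)

# dist() computes distances between rows, so transpose x to compare samples.
diss <- as.matrix(dist(t(x)))

# The same idea with Manhattan distance instead of the Euclidean default.
dissManhattan <- as.matrix(dist(t(x), method = "manhattan"))

dim(diss)          # n x n
isSymmetric(diss)  # a valid dissimilarity matrix is symmetric
```

A matrix built this way is square, symmetric, and has a zero diagonal, which are the kind of properties a validity check such as checkDiss would plausibly look for (an assumption; the exact checks are not listed here).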

Details

mainClustering is not meant to be called by the user. It is exported only so that its arguments can be clearly documented; these arguments can be passed via the argument mainClusterArgs in functions like clusterSingle and clusterMany.

Post-processing Arguments: For post-processing the clustering, currently only type 'K' algorithms have a defined post-processing. Specifically

Value

mainClustering returns a vector of cluster assignments (if format="vector") or a list of indices for each cluster (if format="list"). Clusters smaller than minSize are removed.
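The relationship between the two return formats and the effect of minSize can be sketched in base R. The assignment vector and the variable names below are hypothetical, and the code only mimics the documented behavior (small clusters set to -1, list of indices per cluster); it is not the package's implementation.

```r
# Hypothetical cluster assignments for 8 samples.
clusterVector <- c(1, 1, 2, 2, 2, 3, 1, 2)
minSize <- 2

# Find clusters with fewer than minSize samples and mark their samples as -1.
sizes <- table(clusterVector)
tooSmall <- names(sizes)[sizes < minSize]
filtered <- clusterVector
filtered[as.character(filtered) %in% tooSmall] <- -1

# "list" style: one element per remaining cluster, holding sample indices.
clusterList <- split(seq_along(filtered), filtered)
clusterList <- clusterList[names(clusterList) != "-1"]

filtered     # 1 1 2 2 2 -1 1 2
clusterList  # cluster "1": 1 2 7; cluster "2": 3 4 5 8
```

Here cluster 3 contains a single sample, so with minSize = 2 that sample is left unclustered in the vector form and does not appear in any element of the list form.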

Examples

library(clusterExperiment)
data(simData)
cl1 <- mainClustering(x=simData, clusterFunction="pam", clusterArgs=list(k=3))
cl2 <- mainClustering(simData, clusterFunction="hierarchical01", clusterArgs=list(alpha=.1))
cl3 <- mainClustering(simData, clusterFunction="tight", clusterArgs=list(alpha=.1))

# change the distance to Manhattan distance
cl4 <- mainClustering(simData, clusterFunction="pam", clusterArgs=list(k=3),
  distFunction=function(x){dist(x, method="manhattan")})

# run the hierarchical method for finding blocks, with the method of evaluating
# coherence of a block set to evalClusterMethod="average", and the hierarchical
# clustering using single linkage:
clustSubHier <- mainClustering(simData, clusterFunction="hierarchical01",
  minSize=5, clusterArgs=list(alpha=0.1, evalClusterMethod="average", method="single"))

# run the tight method
clustSubTight <- mainClustering(simData, clusterFunction="tight",
  clusterArgs=list(alpha=0.1), minSize=5)

# two twists on pam, using post-processing arguments
clustSubPamK <- mainClustering(simData, clusterFunction="pam", silCutoff=0,
  minSize=5, removeSil=TRUE, clusterArgs=list(k=3))
clustSubPamBestK <- mainClustering(simData, clusterFunction="pam", silCutoff=0,
  minSize=5, removeSil=TRUE, findBestK=TRUE, kRange=2:10)

# note that passing the wrong arguments for an algorithm results in warnings
# (which can be turned off with checkArgs=FALSE)
clustSubTight_test <- mainClustering(simData, clusterFunction="tight",
  clusterArgs=list(alpha=0.1), minSize=5, removeSil=TRUE)
clustSubTight_test2 <- mainClustering(simData, clusterFunction="tight",
  clusterArgs=list(alpha=0.1, evalClusterMethod="average"))

Bioconductor-mirror/clusterExperiment documentation built on Aug. 2, 2017, 4:28 p.m.