subsampleClustering: Cluster subsamples of the data

Description

Given input data, this function subsamples the samples, clusters each subsample, and returns an n x n matrix of co-occurrence probabilities, i.e. how often each pair of samples was clustered together.

Usage

## S4 method for signature 'character'
subsampleClustering(clusterFunction, ...)

## S4 method for signature 'ClusterFunction'
subsampleClustering(clusterFunction, x = NULL,
  diss = NULL, distFunction = NA, clusterArgs = NULL,
  classifyMethod = c("All", "InSample", "OutOfSample"), resamp.num = 100,
  samp.p = 0.7, ncores = 1, checkArgs = TRUE, checkDiss = TRUE, ...)

Arguments

clusterFunction

a ClusterFunction object that defines the clustering routine. See ClusterFunction for the required format of user-defined clustering routines. Alternatively, a character value naming one of the clustering routines provided in the package. Call listBuiltInFunctions() at the command prompt to see the built-in clustering routines. If clusterFunction is missing, the default is "pam".

...

arguments passed to mclapply (if ncores>1).

x

the data on which to run the clustering (samples in columns).

diss

a dissimilarity matrix on which to run the clustering.

distFunction

a distance function to be applied to x. Only relevant if the only input is x (a matrix of data) and diss=NULL. See the details of clusterSingle for the required format of the distance function.

clusterArgs

a list of arguments to be passed to the function defined in the clusterFunction slot of the ClusterFunction object. For any given ClusterFunction object, use the function requiredArgs to get a list of the arguments that object requires.

classifyMethod

method for determining which samples are used in calculating the co-occurrence matrix. "All" = all samples, "OutOfSample" = those not in the subsample, and "InSample" = those in the subsample. See Details for an explanation.

resamp.num

the number of subsamples to draw.

samp.p

the proportion of samples to sample for each subsample.

ncores

integer giving the number of cores. If ncores>1, mclapply will be called.

checkArgs

logical. Whether to give a warning if arguments are supplied that do not apply to the chosen clustering options. If FALSE, inapplicable arguments are ignored without warning.

checkDiss

logical. Whether to check that the input diss is valid.
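
As an illustration of how resamp.num, samp.p, ncores and ... relate, the following base R sketch draws the index sets for the subsamples. It is not the package's internal code: the helper name drawSubsamples and the rounding rule used for the subsample size are assumptions made for the example.

```r
# Hypothetical helper (not part of clusterExperiment): draw the index sets
# for `resamp.num` subsamples, each holding a fraction `samp.p` of n samples.
drawSubsamples <- function(n, resamp.num = 100, samp.p = 0.7, ncores = 1, ...) {
  size <- round(samp.p * n)  # assumed rounding rule
  drawOne <- function(i) sort(sample.int(n, size = size))
  if (ncores > 1) {
    # Extra arguments in ... are forwarded to mclapply, mirroring the
    # documented role of the ... argument.
    parallel::mclapply(seq_len(resamp.num), drawOne, mc.cores = ncores, ...)
  } else {
    lapply(seq_len(resamp.num), drawOne)
  }
}

set.seed(1)
idx <- drawSubsamples(n = 20, resamp.num = 5, samp.p = 0.7)
length(idx)   # one index vector per subsample
lengths(idx)  # number of samples drawn in each subsample
```

Note that mclapply relies on forking, so ncores > 1 is generally only useful on Unix-like systems.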

Details

subsampleClustering is not usually called directly by the user. It is exported only so that its arguments can be clearly documented; these arguments can be passed via the argument subsampleArgs in functions like clusterSingle and clusterMany.

classifyMethod: The choice of "All" or "OutOfSample" for classifyMethod requires the classification of arbitrary samples, not originally in the clustering, to clusters; this is done via the classifyFUN provided in the ClusterFunction object. If the ClusterFunction object does not have such a function to define how to classify samples that were not in the subsample that created the clustering, then classifyMethod must be "InSample". Note that if "All" is chosen, all samples are classified into clusters via the classifyFUN, not just those that are out-of-sample; depending on the classification function, the in-sample samples may therefore be assigned to different clusters than their original assignment by the clustering. If you do not choose "All", it is possible to get NAs in the resulting S matrix (particularly when not enough subsamples are taken), which can cause errors if you then pass the resulting D=1-S matrix to mainClustering. For this reason the default is "All".
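
The "All" scheme described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: kmeans stands in for the clustering routine, and classifyNearest is a hypothetical nearest-centroid classifier playing the role of classifyFUN. The toy data and all variable names are invented for the example.

```r
# Illustration only: co-occurrence under an "All"-style scheme.
set.seed(42)
# Toy data with samples in columns (as for the x argument): 2 features x 30 samples
x <- cbind(matrix(rnorm(30, mean = 0), nrow = 2),
           matrix(rnorm(30, mean = 5), nrow = 2))
n <- ncol(x)
resamp.num <- 20
samp.p <- 0.7

# Hypothetical classifier: assign every column of `dat` to the closest
# centroid (centroids are the rows of `centers`).
classifyNearest <- function(dat, centers) {
  apply(dat, 2, function(s) which.min(colSums((t(centers) - s)^2)))
}

together <- matrix(0, n, n)
for (b in seq_len(resamp.num)) {
  inSample <- sample.int(n, size = round(samp.p * n))
  fit <- kmeans(t(x[, inSample, drop = FALSE]), centers = 2, nstart = 5)
  labels <- classifyNearest(x, fit$centers)  # classify all n samples
  together <- together + outer(labels, labels, "==")
}
S <- together / resamp.num  # every pair is counted in every subsample
D <- 1 - S                  # dissimilarity free of NAs
```

Because every sample receives a label in every subsample, the denominator is resamp.num for all pairs and S contains no missing values.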

Value

An n x n matrix of co-occurrences, i.e. a symmetric matrix whose [i,j] entry is the proportion of subsamples in which the ith and jth samples were clustered into the same cluster. The proportion is taken only over those subsamples in which the ith and jth samples were both assigned to a cluster. If classifyMethod=="All", this is all subsamples for all i,j pairs. But if classifyMethod=="InSample" or classifyMethod=="OutOfSample", the proportion is taken only over those subsamples in which the ith and jth samples were both in-sample or both out-of-sample, respectively.
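
The following base R sketch illustrates the "InSample" version of this calculation, including how NAs arise for pairs that were never subsampled together. It is an assumption-laden illustration rather than the package's code: kmeans stands in for the clustering routine, and the toy data and variable names (sameCluster, bothAssigned) are invented for the example. The number of subsamples is kept deliberately small to force missing entries.

```r
# Illustration only: "InSample"-style co-occurrence with explicit denominators.
set.seed(7)
# Toy data with samples in columns: 2 features x 20 samples
x <- cbind(matrix(rnorm(20, mean = 0), nrow = 2),
           matrix(rnorm(20, mean = 5), nrow = 2))
n <- ncol(x)
resamp.num <- 3  # deliberately too few subsamples
samp.p <- 0.3

sameCluster  <- matrix(0, n, n)  # times i and j shared a cluster
bothAssigned <- matrix(0, n, n)  # times i and j were both in the subsample
for (b in seq_len(resamp.num)) {
  inSample <- sample.int(n, size = round(samp.p * n))
  fit <- kmeans(t(x[, inSample, drop = FALSE]), centers = 2, nstart = 5)
  labels <- rep(NA_integer_, n)
  labels[inSample] <- fit$cluster  # only in-sample samples get a label
  assigned <- !is.na(labels)
  bothAssigned <- bothAssigned + outer(assigned, assigned, "&")
  same <- outer(labels, labels, "==")
  same[is.na(same)] <- FALSE
  sameCluster <- sameCluster + same
}
S <- sameCluster / bothAssigned
S[bothAssigned == 0] <- NA  # pairs never assigned together have no estimate
anyNA(S)                    # check whether missing entries occurred
```

With more subsamples or a larger samp.p, fewer pairs are left without an estimate.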

Examples

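library(clusterExperiment)
data(simData)
# cluster 100 subsamples, each containing 70% of the samples, with kmeans
coOccur <- subsampleClustering(clusterFunction="kmeans", x=simData,
  clusterArgs=list(k=3, nstart=10), resamp.num=100, samp.p=0.7)

# visualize the resulting co-occurrence matrix
plotHeatmap(coOccur)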

Bioconductor-mirror/clusterExperiment documentation built on Aug. 2, 2017, 4:28 p.m.