subsampleClustering: Cluster subsamples of the data

subsampleClustering    R Documentation

Cluster subsamples of the data

Description

Given input data, this function repeatedly subsamples the samples, clusters each subsample, and returns an n x n matrix giving, for each pair of samples, the proportion of subsamples in which the two were clustered together (their co-occurrence).

Usage

## S4 method for signature 'character'
subsampleClustering(clusterFunction, ...)

## S4 method for signature 'ClusterFunction'
subsampleClustering(
  clusterFunction,
  inputMatrix,
  inputType,
  clusterArgs = NULL,
  classifyMethod = c("All", "InSample", "OutOfSample"),
  resamp.num = 100,
  samp.p = 0.7,
  ncores = 1,
  warnings = TRUE,
  ...
)

Arguments

clusterFunction

a ClusterFunction object that defines the clustering routine. See ClusterFunction for the required format of user-defined clustering routines. The user can also give a character value to the argument clusterFunction to indicate the use of one of the clustering routines provided in the package. Call listBuiltInFunctions() at the command prompt to see the built-in clustering routines. If clusterFunction is missing, the default is set to "pam".

...

arguments passed to mclapply (if ncores>1).

inputMatrix

numerical matrix on which to run the clustering or a SummarizedExperiment, SingleCellExperiment, or ClusterExperiment object.

inputType

a character vector defining what type of input is given in the inputMatrix argument. Must consist of the values "diss", "X", or "cat" (see details). "X" and "cat" indicate matrices with features in the rows and samples in the columns; "cat" corresponds to features that are integers coding categories, while "X" corresponds to continuous-valued features. "diss" corresponds to an inputMatrix that is an N x N dissimilarity matrix. "cat" is largely used internally for clustering of sets of clusterings.

clusterArgs

a list of parameter arguments to be passed to the function defined in the clusterFunction slot of the ClusterFunction object. For any given ClusterFunction object, use function requiredArgs to get a list of required arguments for the object.

classifyMethod

method for determining which samples should be used in calculating the co-occurrence matrix. "All" = all samples, "OutOfSample" = those not subsampled, and "InSample" = those in the subsample. See details for explanation.

resamp.num

the number of subsamples to draw.

samp.p

the proportion of samples to sample for each subsample.

ncores

integer giving the number of cores. If ncores>1, mclapply will be called.

warnings

logical indicating whether to give a warning if arguments are given that do not match the clustering choices made. If FALSE, inapplicable arguments will be ignored without warning.
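
As a rough illustration of how resamp.num and samp.p interact, the following base R sketch draws the index sets for the subsamples. It assumes that each subsample is drawn without replacement and contains round(samp.p * n) samples; the variable names n, subsampleSize and subsampleIdx are made up for this example, and the package's internal code may differ.

```r
set.seed(1)
n <- 20            # number of samples (columns of an "X"-type inputMatrix)
resamp.num <- 100  # number of subsamples to draw
samp.p <- 0.7      # proportion of samples in each subsample

# Assumption: sampling is without replacement and the subsample size is
# round(samp.p * n); this is a sketch, not the package's implementation.
subsampleSize <- round(samp.p * n)

# One column of sample indices per subsample.
subsampleIdx <- replicate(resamp.num, sample(n, size = subsampleSize))

dim(subsampleIdx)  # subsampleSize rows, resamp.num columns
```

Each column of subsampleIdx would then be clustered separately, which is the step that can be distributed over cores when ncores > 1.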

Details

subsampleClustering is not usually called directly by the user. It is exported only so that its arguments can be clearly documented; these arguments can be passed via the argument subsampleArgs in functions like clusterSingle and clusterMany.

classifyMethod: The choice of "All" or "OutOfSample" for classifyMethod requires classifying into clusters arbitrary samples that were not originally in the clustering; this is done via the classifyFUN provided in the ClusterFunction object. If the ClusterFunction object does not have such a function, which defines how to assign to a cluster those samples not in the subsample that created the clustering, then classifyMethod must be "InSample". Note that if "All" is chosen, all samples will be classified into clusters via the classifyFUN, not just those that are out-of-sample; depending on the classification function, this could give the in-sample samples cluster assignments that differ from their original assignment by the clustering. If you do not choose "All", it is possible to get NAs in the resulting S matrix (particularly when not enough subsamples are taken), which can cause errors if you then pass the resulting D=1-S matrix to mainClustering. For this reason the default is "All".
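
To make the three classifyMethod options concrete, here is a minimal base R sketch for a single subsample. It uses stats::kmeans as the clustering and a nearest-centroid rule as a stand-in for classifyFUN; the helper classifyNearest and the toy data are hypothetical and are not part of the package.

```r
set.seed(2)
# Toy "X"-type input: 2 features in rows, 30 samples in columns.
x <- cbind(matrix(rnorm(30, mean = 0), nrow = 2),
           matrix(rnorm(30, mean = 5), nrow = 2))
n <- ncol(x)
inSample <- sort(sample(n, size = round(0.7 * n)))

# Cluster only the subsampled columns (kmeans expects samples in rows).
fit <- kmeans(t(x[, inSample]), centers = 2, nstart = 10)

# Hypothetical stand-in for classifyFUN: assign each column of newX to the
# centroid with the smallest squared Euclidean distance.
classifyNearest <- function(newX, centers) {
  d <- apply(centers, 1, function(ctr) colSums((newX - ctr)^2))
  max.col(-d, ties.method = "first")
}

# "All": every sample is classified, including the in-sample ones.
labelsAll <- classifyNearest(x, fit$centers)

# "InSample": only the subsampled samples get a label, taken from the fit.
labelsIn <- rep(NA_integer_, n)
labelsIn[inSample] <- fit$cluster

# "OutOfSample": only the samples left out of the subsample get a label.
labelsOut <- labelsAll
labelsOut[inSample] <- NA
```

Only labelsAll is free of NAs. Under the other two options a pair of samples contributes to the co-occurrence estimate only in subsamples where both members have a label, which is why NAs can appear when few subsamples are drawn.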

Value

An n x n matrix of co-occurrences, i.e. a symmetric matrix whose [i,j] entry equals the proportion of subsamples in which the ith and jth samples were clustered into the same cluster. The proportion is taken only over those subsamples in which the ith and jth samples were both assigned to a cluster. If classifyMethod=="All", this is all subsamples for all i,j pairs. But if classifyMethod=="InSample" or classifyMethod=="OutOfSample", then the proportion is taken only over those subsamples in which the ith and jth samples were both in sample or both out of sample, respectively.
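
The ratio described above can be sketched in base R as follows. The function coOccurrence and its input layout (an n x B matrix of cluster labels, with NA where a sample was not assigned in a given subsample) are assumptions made for this illustration, not the package's internal representation.

```r
# labels: n x B matrix; labels[i, b] is the cluster of sample i in subsample b,
# or NA if sample i was not assigned to a cluster in that subsample.
coOccurrence <- function(labels) {
  n <- nrow(labels)
  together <- matrix(0, n, n)  # number of subsamples where i and j share a cluster
  bothSeen <- matrix(0, n, n)  # number of subsamples where i and j are both assigned
  for (b in seq_len(ncol(labels))) {
    lab <- labels[, b]
    seen <- !is.na(lab)
    same <- outer(lab, lab, "==")
    same[is.na(same)] <- FALSE   # pairs with a missing label never count as together
    together <- together + same
    bothSeen <- bothSeen + outer(seen, seen, "&")
  }
  S <- together / bothSeen
  S[bothSeen == 0] <- NA         # pair never jointly assigned: undefined
  S
}

# Four samples, three subsamples.
labels <- cbind(c(1, 1, 2, NA),
                c(1, 2, 2, NA),
                c(NA, 1, 1, 2))
S <- coOccurrence(labels)
S[1, 2]  # 0.5: together in 1 of the 2 subsamples where both were assigned
S[1, 4]  # NA: samples 1 and 4 were never assigned in the same subsample
```

With classifyMethod=="All" the labels matrix would contain no NAs, so every entry of bothSeen would equal the number of subsamples and no NA could arise.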

Examples

## Not run: 
# takes a bit of time, not run on checks:
library(clusterExperiment)
data(simData)
coOccur <- subsampleClustering(inputMatrix=simData, inputType="X",
    clusterFunction="kmeans",
    clusterArgs=list(k=3, nstart=10), resamp.num=100, samp.p=0.7)

# visualize the resulting co-occurrence matrix
plotHeatmap(coOccur)

## End(Not run)

epurdom/clusterExperiment documentation built on Oct. 12, 2022, 5:27 a.m.