quickSubCluster: Quick and dirty subclustering
In MarioniLab/scran: Methods for Single-Cell RNA-Seq Data Analysis

quickSubCluster

R Documentation

Quick and dirty subclustering

Description

Performs a quick subclustering for all cells within each group.

Usage

quickSubCluster(x, ...)

## S4 method for signature 'ANY'
quickSubCluster(x, normalize = TRUE, ...)

## S4 method for signature 'SummarizedExperiment'
quickSubCluster(x, ...)

## S4 method for signature 'SingleCellExperiment'
quickSubCluster(
  x,
  groups,
  normalize = TRUE,
  restricted = NULL,
  prepFUN = NULL,
  min.ncells = 50,
  clusterFUN = NULL,
  BLUSPARAM = NNGraphParam(),
  format = "%s.%s",
  assay.type = "counts",
  simplify = FALSE
)

Arguments

`x`	A matrix of counts or log-normalized expression values (if `normalize=FALSE`), where each row corresponds to a gene and each column corresponds to a cell. Alternatively, a SummarizedExperiment or SingleCellExperiment object containing such a matrix.
`...`	For the generic, further arguments to pass to specific methods. For the ANY and SummarizedExperiment methods, further arguments to pass to the SingleCellExperiment method.
`normalize`	Logical scalar indicating whether each subset of `x` should be log-transformed prior to further analysis.
`groups`	A vector of group assignments for all cells, usually corresponding to cluster identities.
`restricted`	Character vector containing the subset of groups in `groups` to be subclustered. By default, all unique groups in `groups` are used for subclustering, but this can be restricted to specific groups of interest to save compute time.
`prepFUN`	A function that accepts a single SingleCellExperiment object and returns another SingleCellExperiment containing any additional elements required for clustering (e.g., PCA results).
`min.ncells`	An integer scalar specifying the minimum number of cells in a group to be considered for subclustering.
`clusterFUN`	A function that accepts a single SingleCellExperiment object and returns a vector of cluster assignments for each cell in that object.
`BLUSPARAM`	A BlusterParam object that is used to specify the clustering via `clusterRows`. Only used when `clusterFUN=NULL`.
`format`	A string to be passed to `sprintf`, specifying how the subclusters should be named with respect to the parent level in `groups` and the level returned by `clusterFUN`.
`assay.type`	String or integer scalar specifying the relevant assay.
`simplify`	Logical scalar indicating whether just the subcluster assignments should be returned.

Details

quickSubCluster is a simple convenience function that loops over all levels of groups to perform subclustering. It subsets x to retain all cells in one level and then runs prepFUN and clusterFUN to cluster them. Levels with fewer than min.ncells are not subclustered and have "subcluster" set to the name of the level.

The distinction between prepFUN and clusterFUN is that the former's calculations are preserved in the output. For example, we would put the PCA in prepFUN so that the PCs are returned in the reducedDims for later use. In contrast, clusterFUN is only used to obtain the subcluster assignments so any intermediate objects are lost.

By default, prepFUN will run modelGeneVar, take the top 10 clusterFUN will then perform clustering on the PC matrix with clusterRows and BLUSPARAM. Either or both of these functions can be replaced with custom functions.

The default behavior of this function is the same as running quickCluster on each subset with default parameters except for min.size=0.

Value

By default, a named List of SingleCellExperiment objects. Each object corresponds to a level of groups and contains a "subcluster" column metadata field with the subcluster identities for each cell. The metadata of the List also contains index, a list of integer vectors specifying the cells in x in each returned SingleCellExperiment object; and subcluster, a character vector of subcluster identities (see next). If restricted is not NULL, only the specified groups in restricted will be present in the output List.

If simplify=TRUE, the character vector of subcluster identities is returned. This is of length equal to ncol(x) and each entry follows the format defined in format. The only exceptions are if the number of cells in the parent cluster is less than min.cells, or parent cluster is not listed in a non-NULL value of restricted. In such cases, the parent cluster's name is used instead.

Author(s)

Aaron Lun

Examples

library(scuttle)
sce <- mockSCE(ncells=200)

# Lowering min.size for this small demo:
clusters <- quickCluster(sce, min.size=50)

# Getting subclusters:
out <- quickSubCluster(sce, clusters)

# Defining custom prep functions:
out2 <- quickSubCluster(sce, clusters, 
    prepFUN=function(x) {
        dec <- modelGeneVarWithSpikes(x, "Spikes")
        top <- getTopHVGs(dec, prop=0.2)
        scater::runPCA(x, subset_row=top, ncomponents=25)
    }
)

# Defining custom cluster functions:
out3 <- quickSubCluster(sce, clusters, 
    clusterFUN=function(x) {
        kmeans(reducedDim(x, "PCA"), sqrt(ncol(x)))$cluster
    }
)

MarioniLab/scran documentation built on Sept. 7, 2024, 6:25 a.m.