quickCorrect: Quickly perform batch correction
In LTLA/batchelor: Single-Cell Batch Correction Methods

quickCorrect

R Documentation

Quickly perform batch correction

Description

Quickly perform batch correction by intersecting the gene features, normalizing and log-transforming, modelling per-gene variances and identifying highly variable genes, and then applying the correction algorithm of choice.

Usage

quickCorrect(
  ...,
  batch = NULL,
  restrict = NULL,
  correct.all = TRUE,
  assay.type = "counts",
  PARAM = FastMnnParam(),
  multi.norm.args = list(),
  precomputed = NULL,
  model.var.args = list(),
  hvg.args = list(n = 5000)
)

Arguments

`...`	One or more matrix-like objects containing single-cell gene expression matrices. Alternatively, one or more SingleCellExperiment objects can be supplied. If multiple objects are supplied, each object is assumed to contain all and only cells from a single batch. Objects of different types can be mixed together. If a single object is supplied, `batch` should also be specified.
`batch`	A factor specifying the batch of origin for each cell if only one batch is supplied in `...`. This will be ignored if two or more batches are supplied.
`restrict`	A list of length equal to the number of objects in `...`. Each entry of the list corresponds to one batch and specifies the cells to use when computing the correction.
`correct.all`	A logical scalar indicating whether to return corrected expression values for all genes, even if `subset.row` is set. Used to ensure that the output is of the same dimensionality as the input.
`assay.type`	A string or integer scalar specifying the assay to use for correction. Only used for SingleCellExperiment inputs.
`PARAM`	A BatchelorParam object specifying the batch correction method to dispatch to, and the parameters with which it should be run. ClassicMnnParam will dispatch to `mnnCorrect`; FastMnnParam will dispatch to `fastMNN`; RescaleParam will dispatch to `rescaleBatches`; and RegressParam will dispatch to `regressBatches`.
`multi.norm.args`	Named list of further arguments to pass to `multiBatchNorm`.
`precomputed`	List of DataFrames containing precomputed variance modelling results. This should be a list of the same length as the number of entries in `...`, and each should have the same row names as the corresponding entry of `...`.
`model.var.args`	Named list of further arguments to pass to `modelGeneVar`.
`hvg.args`	Named list of further arguments to pass to `getTopHVGs`. By default, we take the top 5000 genes with the highest variances.

Details

This function wraps the sequence of typical steps required to obtain corrected expression values.'

Intersecting each batch to the universe of common features with intersectRows.
Applying normalization and log-transformation to the batches with multiBatchNorm.
Modelling the per-gene variance with modelGeneVar. If precomputed is supplied, the precomputed results are used instead.
Identifying highly variable genes with getTopHVGs. These genes will be used in the correction, though corrected values for all genes can be returned by setting correct.all=TRUE.
Applying the batch correction algorithm of choice with batchCorrect, as specified by PARAM.

The default of correct.all=TRUE differs from that of other functions. This is because the subsetting to HVGs is done internally here, and we avoid surprises by returning results for all genes in the input object(s). In contrast, the other functions require explicit subsetting via subset.row= and it is expected that users will set correct.all= if all genes are desired.

Value

A list containing:

dec, a DataFrame containing the combined variance modelling results across all batches.
hvgs, a character vector of the genes selected to use in the correction.
corrected, a SingleCellExperiment containing the corrected values across all cells in all batches.

Author(s)

Aaron Lun

Examples

d1 <- matrix(rnbinom(50000, mu=10, size=1), ncol=100)
sce1 <- SingleCellExperiment(list(counts=d1))
sizeFactors(sce1) <- runif(ncol(d1))
rownames(sce1) <- paste0("GENE", 1:500)

d2 <- matrix(rnbinom(20000, mu=50, size=1), ncol=40)
sce2 <- SingleCellExperiment(list(counts=d2))
sizeFactors(sce2) <- runif(ncol(d2))
rownames(sce2) <- paste0("GENE", 201:700)

# Fire and forget:
set.seed(1000)
output <- quickCorrect(sce1, sce2) 
output$corrected

LTLA/batchelor documentation built on March 29, 2025, 11:54 p.m.