Fit a linear model to each gene to regress out uninteresting factors of variation, returning a matrix of residuals.
One or more log-expression matrices where genes correspond to rows and cells correspond to columns.
Alternatively, one or more SingleCellExperiment objects can be supplied, each containing a log-expression matrix in one of its assays.
If multiple objects are supplied, each object is assumed to contain all and only cells from a single batch.
If a single object is supplied, it is assumed to contain cells from all batches, so batch should also be specified.
Alternatively, one or more lists of matrices or SingleCellExperiments can be provided;
this is flattened as if the objects inside each list were passed directly to the function.
A factor specifying the batch of origin for all cells when only a single object is supplied.
A numeric design matrix with number of rows equal to the total number of cells,
specifying the experimental factors to remove.
Each row corresponds to a cell, in the order in which the cells were supplied.
Integer vector specifying the coefficients of design to retain in the residuals.
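A list of length equal to the number of supplied objects, where each entry specifies the cells in the corresponding batch to use when computing the model coefficients.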
A vector specifying which features to use for correction.
Logical scalar indicating whether corrected expression values should be computed for genes not in the feature subset.
A string or integer scalar specifying the assay containing the log-expression values. Only used for SingleCellExperiment inputs.
Numeric scalar specifying the number of dimensions to use for PCA. If NA, no PCA is performed.
A BiocSingularParam object specifying the algorithm to use for PCA.
Logical scalar indicating whether to defer centering/scaling during the PCA.
A BiocParallelParam object specifying whether the PCA should be parallelized.
This function fits a linear model to the log-expression values for each gene and returns the residuals.
By default, the model is parameterized as a one-way layout with the batch of origin,
so the residuals represent the expression values after correcting for the batch effect.
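To make the default one-way layout concrete, the sketch below fits the batch-only model to a toy genes-by-cells matrix with base R's model.matrix() and lm.fit(). This is an illustration of the idea, not batchelor's internal code. It checks that, for this design, the residuals are simply each gene's values centered within each batch.

```r
# Toy log-expression matrix: 3 genes (rows) x 6 cells (columns).
set.seed(1)
y <- matrix(rnorm(18, mean = 5), nrow = 3)
batch <- factor(rep(c("A", "B"), each = 3))

# One-way layout on the batch: one coefficient (the batch mean) per batch.
X <- model.matrix(~ 0 + batch)

# lm.fit() treats each column of a matrix response as a separate fit,
# so transpose to cells x genes, fit every gene at once, then transpose back.
fit <- lm.fit(X, t(y))
resid <- t(fit$residuals)

# For a one-way layout, the residuals are the per-batch centered values.
batch_means <- sapply(levels(batch), function(b) rowMeans(y[, batch == b, drop = FALSE]))
manual <- y - batch_means[, as.integer(batch)]
```

Because every gene is fitted separately against the same design, one lm.fit() call on the transposed matrix handles all genes at once.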
The novelty of this function is that it returns the corrected values as a ResidualMatrix.
This avoids explicitly computing the residuals, which would result in a loss of sparsity or similar problems.
Rather, residuals are either computed as needed or are never explicitly computed at all (e.g., during matrix multiplication).
This means that regressBatches is faster and lighter than naive regression, which would compute and store every residual explicitly.
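The following sketch shows why matrix multiplication never needs the residuals themselves. It assumes the residual matrix can be written as R = Y - Y Q Q^T, where Y is genes x cells and Q is an orthonormal basis of the design's column space from qr(). Under that assumption, R v = Y v - (Y Q)(Q^T v), which touches Y only through products. The helper residual_times() is hypothetical and is not part of the ResidualMatrix API.

```r
# Hypothetical helper: computes R %*% v with R = Y - Y Q Q^T (the regression
# residuals), without ever forming R itself.
residual_times <- function(Y, Q, v) {
  YQ <- Y %*% Q                        # genes x p; cheap when p is small
  Y %*% v - YQ %*% crossprod(Q, v)     # Y v - (Y Q)(Q^T v)
}

set.seed(2)
Y <- matrix(rpois(5 * 8, lambda = 2), nrow = 5)   # genes x cells, small counts
batch <- factor(rep(c("A", "B"), each = 4))
X <- model.matrix(~ 0 + batch)
Q <- qr.Q(qr(X))                                  # orthonormal basis of the design
v <- matrix(rnorm(8), ncol = 1)

deferred <- residual_times(Y, Q, v)
explicit <- (Y - Y %*% Q %*% t(Q)) %*% v          # materializes the dense residuals
```

If Y were sparse, only the small products Y %*% Q and Y %*% v would be needed, so the sparsity of Y is never lost.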
More complex designs should be explicitly specified with the
design argument, e.g., to regress out a covariate.
This can be any full-column-rank matrix, typically constructed with model.matrix.
If design is specified with a single object, batch is ignored.
If design is specified with multiple objects, regression is applied to the matrix obtained by
cbind-ing all of those objects together; this means that the first few rows of
design correspond to the cells from the first object, then the next rows correspond to the second object and so on.
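As an illustration of the row-ordering rule, the sketch below uses two hypothetical batch matrices (b1, b2) and a hypothetical continuous covariate. The design is built in the same order as cbind(b1, b2), its full column rank is checked, and then base R's lm.fit() is used as a stand-in for the actual fit.

```r
set.seed(3)
b1 <- matrix(rnorm(4 * 5, mean = 5), nrow = 4)   # hypothetical batch 1: 4 genes x 5 cells
b2 <- matrix(rnorm(4 * 3, mean = 6), nrow = 4)   # hypothetical batch 2: 4 genes x 3 cells
combined <- cbind(b1, b2)                        # cells of b1 first, then cells of b2

# Design rows must follow the same order: b1's cells, then b2's cells.
batch <- factor(rep(c("b1", "b2"), c(ncol(b1), ncol(b2))))
covariate <- rnorm(ncol(combined))               # hypothetical nuisance covariate
design <- model.matrix(~ 0 + batch + covariate)
stopifnot(qr(design)$rank == ncol(design))       # full column rank is required

# Residuals (genes x cells) after regressing out batch and the covariate.
resid <- t(lm.fit(design, t(combined))$residuals)
```

Least-squares residuals are orthogonal to every design column, which gives a simple sanity check on the result.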
Like rescaleBatches, this function assumes that the batch effect is orthogonal to the interesting factors of variation.
For example, each batch is assumed to have the same composition of cell types.
The same reasoning applies to any uninteresting factors specified in
design, including continuous variables.
For example, if one were to use this function to regress out cell cycle, the assumption is that all cell types are similarly distributed across cell cycle phases.
If this is not true, the correction will not only be incomplete but can introduce spurious differences.
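The toy example below shows what happens when the orthogonality assumption fails. It uses a single hypothetical gene whose expression depends only on cell type, so there is no batch effect at all. Cell-type composition differs between the two batches. Regressing out batch then shrinks the true cell-type difference and creates a spurious difference between cells of the same type in different batches.

```r
# One gene with no batch effect: type 2 cells are 10 units higher than type 1.
celltype <- factor(c(1, 1, 1, 2,   1, 2, 2, 2))
batch    <- factor(c("A", "A", "A", "A", "B", "B", "B", "B"))
expr <- ifelse(celltype == "2", 10, 0)

# Regress out batch under the (violated) same-composition assumption.
design <- model.matrix(~ 0 + batch)
resid <- lm.fit(design, expr)$residuals

before <- unname(diff(tapply(expr, celltype, mean)))   # true difference: 10
after  <- unname(diff(tapply(resid, celltype, mean)))  # shrunk to 7.5

# Cells 1 and 5 are both type 1, but now differ by 5 units.
spurious <- resid[1] - resid[5]
```

Batch A is mostly type 1 and batch B is mostly type 2, so the batch means absorb part of the cell-type signal.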
See ?"batchelor-restrict" for a description of how the model fit can be restricted to a subset of cells.
Specifically, this function will compute the model coefficients using only the specified subset of cells.
The regression will then be applied to all cells in each batch.
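The sketch below illustrates the restricted fit. The coefficients are estimated only from a hypothetical subset of cells (the use vector, for example control cells) and then subtracted from every cell. It uses base R's lm.fit() to show the idea and does not reproduce batchelor's implementation.

```r
set.seed(4)
y <- matrix(rnorm(3 * 10, mean = 5), nrow = 3)   # 3 genes x 10 cells
batch <- factor(rep(c("A", "B"), each = 5))
design <- model.matrix(~ 0 + batch)

# Hypothetical restriction: the first three cells of each batch.
use <- c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE)

# Estimate coefficients (p x genes) from the restricted cells only...
beta <- lm.fit(design[use, , drop = FALSE], t(y[, use, drop = FALSE]))$coefficients

# ...then remove the fitted values from all cells.
resid <- y - t(design %*% beta)
```

Only the restricted cells are guaranteed to be centered within each batch. Every other cell is shifted by the same per-batch amount.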
If set, the d option will perform a PCA on the residual matrix.
This is provided for convenience as efficiently executing a PCA on a ResidualMatrix is not always intuitive.
(Specifically, BiocSingularParam objects must be set up with
deferred=TRUE for best performance.)
The PCA-related arguments (the BiocSingularParam object, deferred and BPPARAM) only have an effect when d is set to a non-NA value.
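To show the shape of what d produces, the sketch below runs an ordinary PCA on explicitly computed residuals using base R's prcomp(). It is a stand-in for the BiocSingular-based PCA, which can avoid materializing the residuals. Each batch has zero mean residuals for every gene, so the per-batch sums of the PC scores are also zero.

```r
set.seed(5)
y <- matrix(rnorm(20 * 12, mean = 5), nrow = 20)  # 20 genes x 12 cells
batch <- factor(rep(c("A", "B", "C"), each = 4))
y[, batch == "B"] <- y[, batch == "B"] + 2        # artificial shift in batch B

X <- model.matrix(~ 0 + batch)
resid <- t(lm.fit(X, t(y))$residuals)             # genes x cells

d <- 3
pcs <- prcomp(t(resid), rank. = d)$x              # cells x d matrix of PC scores
```

The result has one row per cell and d columns, which matches how a reduced-dimension result is stored per cell.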
All genes are used by default.
If a subset of genes is specified, residuals are only returned for that subset, and if
d is set, only the genes in the subset are used to perform the PCA.
If correct.all=TRUE, residuals are returned for all genes but only the subset is used for the PCA.
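Each gene is regressed independently, so restricting to a gene subset changes which rows are returned but not their values. The sketch below checks this with a hypothetical index vector genes_used. It then mimics the correct.all=TRUE behaviour by keeping all rows of the residuals while running the PCA on the subset rows only.

```r
set.seed(6)
y <- matrix(rnorm(6 * 8, mean = 5), nrow = 6)     # 6 genes x 8 cells
batch <- factor(rep(c("A", "B"), each = 4))
X <- model.matrix(~ 0 + batch)
genes_used <- c(2, 4, 5)                          # hypothetical gene subset

resid_all <- t(lm.fit(X, t(y))$residuals)                              # every gene
resid_sub <- t(lm.fit(X, t(y[genes_used, , drop = FALSE]))$residuals)  # subset only

# correct.all=TRUE analogue: return resid_all, but use only the subset for the PCA.
pcs <- prcomp(t(resid_all[genes_used, , drop = FALSE]), rank. = 2)$x
```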
A SingleCellExperiment object containing the corrected values as an assay.
This contains the computed residuals for each gene (row) in each cell (column) in each batch.
A batch field is present in the column data, specifying the batch of origin for each cell.
Cells in the output object are always ordered in the same manner as they were supplied.
For a single input object, cells will be reported in the same order as they are arranged in that object.
In cases with multiple input objects, the cell identities are simply concatenated from successive objects,
i.e., all cells from the first object (in their provided order), then all cells from the second object, and so on.
If d is not NA, a PCA is performed on the residual matrix
and an additional corrected field is present in the
reducedDims of the output object.
rescaleBatches, for another approach to regressing out the batch effect.
The ResidualMatrix class, for the class of the residual matrix.