View source: R/regressBatches.R

regressBatches | R Documentation |

Fit a linear model to each gene to regress out uninteresting factors of variation, returning a matrix of residuals.

```
regressBatches(
...,
batch = NULL,
design = NULL,
keep = NULL,
restrict = NULL,
subset.row = NULL,
correct.all = FALSE,
d = NA,
deferred = TRUE,
assay.type = "logcounts",
BSPARAM = IrlbaParam(),
BPPARAM = SerialParam()
)
```

`...` |
One or more log-expression matrices where genes correspond to rows and cells correspond to columns.
Alternatively, one or more SingleCellExperiment objects can be supplied containing a log-expression matrix in the If multiple objects are supplied, each object is assumed to contain all and only cells from a single batch.
If a single object is supplied, it is assumed to contain cells from all batches, so Alternatively, one or more lists of matrices or SingleCellExperiments can be provided;
this is flattened as if the objects inside each list were passed directly to |

`batch` |
A vector or factor specifying the batch of origin for all cells when only a single object is supplied in |

`design` |
A numeric design matrix with number of rows equal to the total number of cells,
specifying the experimental factors to remove.
Each row corresponds to a cell in the order supplied in |

`keep` |
Integer vector specifying the coefficients of |

`restrict` |
A list of length equal to the number of objects in |

`subset.row` |
A vector specifying which features to use for correction. |

`correct.all` |
Logical scalar indicating whether corrected expression values should be computed for genes not in |

`d` |
Numeric scalar specifying the number of dimensions to use for PCA via |

`deferred` |
Logical scalar indicating whether to defer centering/scaling, see |

`assay.type` |
A string or integer scalar specifying the assay containing the log-expression values. Only used for SingleCellExperiment inputs. |

`BSPARAM` |
A BiocSingularParam object specifying the algorithm to use for PCA in |

`BPPARAM` |
A BiocParallelParam object specifying whether the PCA should be parallelized. |

This function fits a linear model to the log-expression values for each gene and returns the residuals.
By default, the model is parameterized as a one-way layout with the batch of origin,
so the residuals represent the expression values after correcting for the batch effect.
The novelty of this function is that it returns a ResidualMatrix as the `"corrected"` assay.
This avoids explicitly computing the residuals, which would result in a loss of sparsity or similar problems.
Rather, residuals are either computed as needed or are never explicitly computed at all (e.g., during matrix multiplication).
This means that `regressBatches` is faster and more memory-efficient than naive regression or even `rescaleBatches`.
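
The idea of never forming the residuals can be illustrated in base R. The following is a minimal sketch under stated assumptions, not the package's implementation: the simulated matrix `Y`, the batch sizes and the helper names are all hypothetical. It shows that residuals from a one-way layout equal the batch-centered values, and that a product with the residual matrix can be obtained from the original matrix and the design alone.

```r
set.seed(100)
ngenes <- 20
batch <- factor(rep(c("A", "B"), c(30, 40)))
ncells <- length(batch)

# Hypothetical log-expression matrix (genes in rows, cells in columns),
# with an artificial shift added to every cell in batch B.
Y <- matrix(rnorm(ngenes * ncells), nrow = ngenes)
Y[, batch == "B"] <- Y[, batch == "B"] + 2

# One-way layout: one indicator column per batch.
X <- model.matrix(~ 0 + batch)
Q <- qr.Q(qr(X))  # orthonormal basis for the column space of X

# Explicit residuals from regressing each gene on the design.
resid.explicit <- Y - (Y %*% Q) %*% t(Q)

# For a one-way layout this is the same as subtracting the batch means.
batch.means <- sapply(levels(batch), function(b) {
    rowMeans(Y[, batch == b, drop = FALSE])
})
resid.means <- Y - batch.means[, as.integer(batch)]

# Deferred multiplication: the product of the residuals with V is
# computed without ever constructing the residual matrix itself.
V <- matrix(rnorm(ncells * 3), ncol = 3)
prod.deferred <- Y %*% V - (Y %*% Q) %*% (t(Q) %*% V)
prod.explicit <- resid.explicit %*% V
```

Here `prod.deferred` only touches `Y` through ordinary matrix products, so any sparsity in `Y` would be preserved during the computation.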

More complex designs should be explicitly specified with the `design` argument, e.g., to regress out a covariate.
This can be any full-column-rank matrix, typically constructed with `model.matrix`.
If `design` is specified with a single object in `...`, `batch` is ignored.
If `design` is specified with multiple objects, regression is applied to the matrix obtained by `cbind`ing all of those objects together; this means that the first few rows of `design` correspond to the cells from the first object, then the next rows correspond to the second object and so on.
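
As a rough sketch of how such a design might be assembled for two input objects, assuming hypothetical batch sizes of 50 and 30 cells and a made-up continuous covariate, the rows of the design follow the order in which the objects would be combined:

```r
set.seed(42)

# Hypothetical setup: two input objects with 50 and 30 cells, respectively.
ncells <- c(first = 50, second = 30)
batch <- factor(rep(names(ncells), ncells), levels = names(ncells))

# Hypothetical continuous covariate, one value per cell in the same order.
covariate <- rnorm(sum(ncells))

# Design matrix with an intercept, a batch term and the covariate.
design <- model.matrix(~ batch + covariate)

# The design must have full column rank.
is.full.rank <- qr(design)$rank == ncol(design)

# Rows 1-50 describe cells of the first object, rows 51-80 the second.
rows.first <- which(batch == "first")
rows.second <- which(batch == "second")

# With batchelor installed and log-expression matrices B1 (50 cells) and
# B2 (30 cells), the design would then be passed along as:
# out <- regressBatches(B1, B2, design = design)
```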

Like `rescaleBatches`, this function assumes that the batch effect is orthogonal to the interesting factors of variation.
For example, each batch is assumed to have the same composition of cell types.
The same reasoning applies to any uninteresting factors specified in `design`, including continuous variables.
For example, if one were to use this function to regress out cell cycle, the assumption is that all cell types are similarly distributed across cell cycle phases.
If this is not true, the correction will not only be incomplete but can introduce spurious differences.
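
A small worked example may make the last point concrete. This is a toy sketch with hypothetical values: a single gene, two cell types, no batch effect at all, but a different cell type composition in each batch. Centering by batch then separates cells of the same type.

```r
# One gene with no batch effect: type X cells have a value of 5 and
# type Y cells a value of 0. Batch b1 is 80% type X, batch b2 is 20%.
celltype <- c(rep(c("X", "Y"), c(8, 2)), rep(c("X", "Y"), c(2, 8)))
batch <- rep(c("b1", "b2"), each = 10)
expr <- ifelse(celltype == "X", 5, 0)

# Residuals of a one-way layout on batch: subtract each batch's mean.
resid <- expr - ave(expr, batch)

# Type X cells were identical before the regression...
x.b1 <- unique(resid[celltype == "X" & batch == "b1"])
x.b2 <- unique(resid[celltype == "X" & batch == "b2"])

# ...but now differ between batches by this amount.
spurious <- x.b2 - x.b1
```

The batch means are 4 and 1, so type X cells end up at 1 in the first batch and 4 in the second, a difference that did not exist in the input.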

See `?"batchelor-restrict"` for a description of the `restrict` argument.
Specifically, this function will compute the model coefficients using only the specified subset of cells.
The regression will then be applied to all cells in each batch.
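
For the default one-way layout, the behaviour described above amounts to estimating each batch mean from the restricted cells and subtracting it from every cell in that batch. The sketch below illustrates this in base R with a hypothetical restriction vector `keep`; it is an illustration of the concept rather than the package's own code.

```r
set.seed(1)
batch <- factor(rep(c("A", "B"), each = 20))
Y <- matrix(rnorm(5 * 40), nrow = 5)  # 5 genes x 40 cells

# Hypothetical restriction: only the first 10 cells of each batch are
# used to estimate the coefficients.
keep <- rep(rep(c(TRUE, FALSE), each = 10), 2)

# Coefficients (here, batch means) estimated from the restricted cells.
coefs <- sapply(levels(batch), function(b) {
    rowMeans(Y[, keep & batch == b, drop = FALSE])
})

# The fitted values are then removed from all cells, restricted or not.
resid <- Y - coefs[, as.integer(batch)]

# Residuals average to zero over the restricted cells of each batch,
# but not necessarily over the remaining cells.
restricted.means.A <- rowMeans(resid[, keep & batch == "A"])
restricted.means.B <- rowMeans(resid[, keep & batch == "B"])
```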

If set, the `d` option will perform a PCA via `multiBatchPCA`.
This is provided for convenience as efficiently executing a PCA on a ResidualMatrix is not always intuitive.
(Specifically, BiocSingularParam objects must be set up with `deferred=TRUE` for best performance.)
The arguments `BSPARAM`, `deferred` and `BPPARAM` only have an effect when `d` is set to a non-`NA` value.

All genes are used with the default setting of `subset.row=NULL`.
If a subset of genes is specified, residuals are only returned for that subset.
Similarly, if `d` is set, only the genes in the subset are used to perform the PCA.
If additionally `correct.all=TRUE`, residuals are returned for all genes but only the subset is used for the PCA.
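
The interplay between the gene subset and the PCA can be sketched in base R. This assumes simulated data and uses plain `prcomp` as a stand-in for `multiBatchPCA`, which may differ in its details; the variables `subset.row` and `d` simply mirror the argument names.

```r
set.seed(2)
ngenes <- 100
batch <- factor(rep(c("A", "B"), each = 25))
Y <- matrix(rnorm(ngenes * length(batch)), nrow = ngenes)

subset.row <- 1:20  # hypothetical genes of interest
d <- 5              # number of principal components to keep

# Residuals for all genes, analogous to the correct.all=TRUE case.
Q <- qr.Q(qr(model.matrix(~ 0 + batch)))
resid.all <- Y - (Y %*% Q) %*% t(Q)

# Residuals for the subset only, analogous to the default case.
resid.sub <- resid.all[subset.row, , drop = FALSE]

# The PCA uses only the subset in either case. prcomp() expects
# observations (cells) in rows, hence the transposition.
pcs <- prcomp(t(resid.sub), rank. = d)$x
```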

A SingleCellExperiment object containing the `corrected` assay.
This contains the computed residuals for each gene (row) in each cell (column) in each batch.
A `batch` field is present in the column data, specifying the batch of origin for each cell.

Cells in the output object are always ordered in the same manner as supplied in `...`.
For a single input object, cells will be reported in the same order as they are arranged in that object.
In cases with multiple input objects, the cell identities are simply concatenated from successive objects,
i.e., all cells from the first object (in their provided order), then all cells from the second object, and so on.

If `d` is not `NA`, a PCA is performed on the residual matrix via `multiBatchPCA`,
and an additional `corrected` field is present in the `reducedDims` of the output object.

Aaron Lun

`rescaleBatches`, for another approach to regressing out the batch effect.

The ResidualMatrix class, for the class of the residual matrix.

`applyMultiSCE`, to apply this across multiple `altExps`.

```
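library(batchelor)

# Simulate counts for 1000 genes in two batches of 50 cells each.
# Drawing 1000 * 50 values keeps each gene's mean aligned with its row.
means <- 2^rgamma(1000, 2, 1)
A1 <- matrix(rpois(50000, lambda=means), ncol=50) # Batch 1
A2 <- matrix(rpois(50000, lambda=means*runif(1000, 0, 2)), ncol=50) # Batch 2

# Log-transform the counts and regress out the batch effect.
B1 <- log2(A1 + 1)
B2 <- log2(A2 + 1)
out <- regressBatches(B1, B2)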
```
