It is possible to compute the correction using only a subset of cells in each batch, and then extrapolate that correction to all other cells. This may be desirable in experimental designs where a control set of cells from the same source population was run on different batches. Any difference in the controls must be artificial in origin and can be directly removed without making further biological assumptions. Similarly, if certain cells are known to be of a batch-specific subpopulation, it may be desirable to exclude them to ensure that they are not inadvertently used during the batch correction.
To perform restriction, users should set restrict to specify the subset of cells in each batch to be used for correction.
This should be set to a list of length equal to the number of objects passed to the ... argument of the batch correction function.
Each element of this list should be a subsetting vector to be applied to the columns of the corresponding batch.
A NULL element indicates that all the cells from a batch should be used.
In situations where one input object contains multiple batches, restrict should simply be a list containing a single subsetting vector for that object.
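As a rough illustration of how such a list lines up with the input objects, the base R sketch below resolves each element to column indices. Note that resolve_restrict is a hypothetical helper written only for this example and is not part of batchelor; it assumes that each element follows ordinary R column subsetting (integer, logical or character), with NULL meaning all cells.

```r
# Hypothetical helper for illustration only; NOT a batchelor function.
# Converts each element of 'restrict' into integer column indices for
# the corresponding batch, treating NULL as "use every cell".
resolve_restrict <- function(batches, restrict) {
    stopifnot(length(restrict) == length(batches))
    Map(function(x, r) {
        idx <- seq_len(ncol(x))
        names(idx) <- colnames(x)
        if (is.null(r)) unname(idx) else unname(idx[r])
    }, batches, restrict)
}

# Two toy batches with 6 and 4 cells (columns), respectively.
B1 <- matrix(0, nrow = 3, ncol = 6, dimnames = list(NULL, paste0("a", 1:6)))
B2 <- matrix(0, nrow = 3, ncol = 4, dimnames = list(NULL, paste0("b", 1:4)))

# One element per input object; NULL keeps every cell of the second batch.
res.null <- resolve_restrict(list(B1, B2), list(1:3, NULL))

# Character and logical subsetting vectors, under the assumption above.
res.mixed <- resolve_restrict(list(B1, B2),
    list(c("a2", "a5"), c(TRUE, FALSE, TRUE, FALSE)))

# One object containing both batches: a list with a single vector
# that indexes the combined columns.
res.single <- resolve_restrict(list(cbind(B1, B2)), list(c(1:3, 6 + 1:2)))
```

In the last call, the cells of the second batch are addressed by their positions in the combined object (columns 7 and 8), not by their positions within the original batch.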
Correction functions that support restrict will only use the restricted subset of cells in each batch to perform the correction.
fastMNN will only use the restricted cells to identify MNN pairs and the center of the orthogonalization.
However, it will apply the correction to all cells in each batch - hence the extrapolation.
This means that the output is always of the same dimensionality, regardless of whether restrict is specified.
As a general rule, users can expect the corrected values in the restricted cells to be the same as if the inputs were directly subsetted to only contain those cells (see Examples).
This is appealing as it demonstrates that correction only uses information from the restricted subset of cells.
If batch correction functions do not follow this rule, they will explicitly state so, e.g., in
library(batchelor) # provides regressBatches()

# Simulate counts for two batches of 50 cells each.
means <- 2^rgamma(1000, 2, 1)
A1 <- matrix(rpois(10000, lambda=means), ncol=50) # Batch 1
A2 <- matrix(rpois(10000, lambda=means*runif(1000, 0, 2)), ncol=50) # Batch 2

B1 <- log2(A1 + 1)
B2 <- log2(A2 + 1)

# Correct using only the first 10 cells of each batch.
out <- regressBatches(B1, B2, restrict=list(1:10, 1:10))
assay(out)[,c(1:10, 50+1:10)]

# Compare to actual subsetting:
out.sub <- regressBatches(B1[,1:10], B2[,1:10])
assay(out.sub)
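The general rule that restricted cells receive the same corrected values as under direct subsetting can also be illustrated without batchelor. The base R sketch below uses a deliberately simple stand-in correction (centering each gene on the mean of the restricted cells); this is an assumption made for illustration and is not the method used by regressBatches or fastMNN. The name center_restricted is hypothetical.

```r
# Hypothetical stand-in correction; NOT a batchelor function.
# The per-gene shift is estimated from the restricted cells only,
# and then applied (extrapolated) to every cell in the batch.
center_restricted <- function(batches, restrict = NULL) {
    corrected <- vector("list", length(batches))
    for (i in seq_along(batches)) {
        x <- batches[[i]]
        if (is.null(restrict) || is.null(restrict[[i]])) {
            keep <- seq_len(ncol(x))  # no restriction: use all cells
        } else {
            keep <- restrict[[i]]
        }
        shift <- rowMeans(x[, keep, drop = FALSE])
        corrected[[i]] <- x - shift  # vector recycles down each column
    }
    do.call(cbind, corrected)
}

set.seed(1)
B1 <- matrix(rnorm(200), ncol = 20)            # 10 genes x 20 cells
B2 <- matrix(rnorm(200, mean = 1), ncol = 20)  # shifted second batch

# Restrict to the first 5 cells of each batch.
full <- center_restricted(list(B1, B2), restrict = list(1:5, 1:5))

# Directly subset the inputs instead.
sub <- center_restricted(list(B1[, 1:5], B2[, 1:5]))

dim(full)  # all 40 cells are returned, not just the restricted ones
all.equal(full[, c(1:5, 20 + 1:5)], sub)
```

The final comparison holds because the shift for each batch depends only on the restricted columns, so adding or removing the other cells cannot change it.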