View source: R/batchCorrectedAverages.R
Compute an average statistic for each group in a manner that corrects for batch effects, by fitting a linear model and extracting the coefficients. This handles statistics such as the average log-expression or the proportion of cells with detected expression.
x: A numeric matrix containing statistics for each gene (row) and combination of group and block (column), computed by functions such as summarizeAssayByGroup.
group: A factor or vector specifying the group identity for each column of x.
block: A factor or vector specifying the blocking level for each column of x.
transform: String indicating how the differences between groups should be computed for the batch adjustment; one of "raw", "log" or "logit".
offset: Numeric scalar specifying the offset to use when transform is "log" or "logit".
This function considers group-level statistics such as the average expression of all cells or the proportion with detectable expression.
These are helpful for any visualizations that operate on individual groups, e.g., plotGroupedHeatmap.
However, if groups are distributed across multiple batches, some manner of batch correction is required.
The problem with directly averaging group-level statistics across batches is that some groups may not exist in particular batches,
e.g., due to the presence of unique cell types in different samples.
A direct average would be biased by variable contributions of the batch effect for each group.
To overcome this, we use groups that are present in multiple batches to correct for the batch effect.
(That is, any level of group that occurs for multiple levels of block.)
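As a rough illustration of which groups can inform the correction, the base R sketch below cross-tabulates group against block and picks out the groups seen in more than one batch. The toy labels and the variable names (presence, n_batches, shared) are made up for this example and are not part of the function.

```r
# Toy labels: one entry per group/block combination (hypothetical data).
group <- c("A", "B", "A", "B", "C")
block <- c("b1", "b1", "b2", "b2", "b2")

# Logical matrix: is each group (row) present in each batch (column)?
presence <- table(group, block) > 0

# Number of batches in which each group occurs.
n_batches <- rowSums(presence)

# Groups occurring in multiple batches can be used to estimate the batch effect.
shared <- names(n_batches)[n_batches > 1]
shared
```

Here group C occurs only in batch b2, so it cannot contribute to the estimate of the batch effect on its own.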
For each gene, we fit a linear model to the (transformed) values containing both the group and block factors.
We then report the coefficient for each group as the batch-adjusted average for that group;
this is possible as the fitted model has no intercept.
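The idea can be sketched for a single gene with base R. This is a minimal illustration under stated assumptions, not the function's actual code: the toy values are invented, and the sketch uses R's default treatment contrasts for block, so the group coefficients are expressed relative to the first batch. The source does not say which parametrization of block the function uses.

```r
# Made-up truth for one gene: group means plus an additive batch shift.
group_means <- c(A = 1, B = 2, C = 3)
batch_shift <- c(b1 = 0, b2 = 5)

# Group C is absent from batch b1, so the layout is unbalanced.
combos <- data.frame(
    group = factor(c("A", "B", "A", "B", "C")),
    block = factor(c("b1", "b1", "b2", "b2", "b2"))
)
stat <- unname(group_means[as.character(combos$group)] +
    batch_shift[as.character(combos$block)])

# Direct averaging across batches: C absorbs the whole b2 shift.
naive <- tapply(stat, combos$group, mean)

# Intercept-free design: one column per group, then the block term
# (first block level dropped by the default contrasts).
design <- model.matrix(~ 0 + group + block, data = combos)
fit <- lm.fit(design, stat)

# The group coefficients serve as the batch-adjusted averages.
adjusted <- fit$coefficients[paste0("group", levels(combos$group))]
rbind(naive = as.vector(naive), adjusted = unname(adjusted))
```

In this toy case the direct averages overstate the difference between C and the other groups, whereas the group coefficients preserve the differences between the made-up group means.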
The default of transform="raw" will not transform the values, and is generally suitable for log-expression values.
Setting transform="log" will perform a log-transformation after adding offset, and is suitable for normalized counts.
Setting transform="logit" will perform a logit transformation after adding offset to the numerator and denominator (to shrink towards 0.5), and is suitable for proportional data such as the proportion of detected cells.
After the model is fitted to the transformed values, the reverse transformation is applied to the coefficients to obtain the batch-adjusted average.
For transform="log", any negative values are coerced to zero, while for transform="logit", any values outside of [0, 1] are coerced to the closest boundary.
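The forward and reverse transformations might look roughly like the sketch below. The helper names forward and reverse are hypothetical, and the exact form of the logit shrinkage, (x + offset) / (1 + 2 * offset), is an assumption chosen to be consistent with "shrinking towards 0.5". The function's real formula and default offsets are not given in this text.

```r
# Hypothetical helper: transform the statistics before model fitting.
forward <- function(x, transform = c("raw", "log", "logit"), offset) {
    transform <- match.arg(transform)
    switch(transform,
        raw = x,
        log = log(x + offset),
        logit = {
            # Assumed shrinkage form; maps 0.5 to 0.5 and avoids 0 and 1.
            p <- (x + offset) / (1 + 2 * offset)
            log(p / (1 - p))
        }
    )
}

# Hypothetical helper: map fitted coefficients back to the original scale.
reverse <- function(coefs, transform = c("raw", "log", "logit"), offset) {
    transform <- match.arg(transform)
    switch(transform,
        raw = coefs,
        # Negative values after removing the offset are coerced to zero.
        log = pmax(exp(coefs) - offset, 0),
        logit = {
            p <- 1 / (1 + exp(-coefs))
            # Values outside [0, 1] are coerced to the closest boundary.
            pmin(pmax(p * (1 + 2 * offset) - offset, 0), 1)
        }
    )
}

props <- c(0, 0.25, 0.5, 1)
round_trip <- reverse(forward(props, "logit", offset = 0.01), "logit", offset = 0.01)
```

The clamping only matters for coefficients that fall outside the attainable range after the model fit, which cannot happen in a pure round trip like the one above.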
A numeric matrix with number of rows equal to nrow(x) and number of columns equal to the number of unique levels in group.
Each column corresponds to a group and contains the averaged statistic across batches.
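To make the shape of the output concrete, the following base R sketch fits the same kind of intercept-free model to several genes at once and arranges the group coefficients as a gene-by-group matrix. It is only an illustration on random toy data using lm.fit with default contrasts; it is not a substitute for the function, and the object names are made up.

```r
set.seed(1)

# Toy layout: five group/block combinations, four genes (hypothetical data).
group <- factor(c("A", "B", "A", "B", "C"))
block <- factor(c("b1", "b1", "b2", "b2", "b2"))
x <- matrix(rnorm(4 * 5), nrow = 4,
    dimnames = list(paste0("gene", 1:4), NULL))

# lm.fit accepts a matrix response, fitting one model per column,
# so the genes are placed in columns by transposing x.
design <- model.matrix(~ 0 + group + block)
fit <- lm.fit(design, t(x))

# Keep only the group coefficients and put genes back in rows.
keep <- paste0("group", levels(group))
out <- t(fit$coefficients[keep, , drop = FALSE])
colnames(out) <- levels(group)
dim(out)
```

The result has one row per gene and one column per level of group, matching the description above.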
Aaron Lun
plotGroupedHeatmap and plotDots, where this function gets used.
regressBatches from the batchelor package, to remove the batch effect from per-cell expression values.
# Simulating expression values with random group and block assignments:
y <- matrix(rnorm(10000), ncol=1000)
group <- sample(10, ncol(y), replace=TRUE)
block <- sample(5, ncol(y), replace=TRUE)

# Summarizing per combination of group and block:
library(scuttle)
summaries <- summarizeAssayByGroup(y, DataFrame(group=group, block=block),
    statistics=c("mean", "prop.detected"))

# Computing batch-aware averages:
library(scater)
averaged <- batchCorrectedAverages(assay(summaries, "mean"),
    group=summaries$group, block=summaries$block)

# Computing batch-aware proportions of detected cells:
num <- batchCorrectedAverages(assay(summaries, "prop.detected"),
    group=summaries$group, block=summaries$block, transform="logit")