combineVar: Combine variance decompositions
In scran: Methods for Single-Cell RNA-Seq Data Analysis


Combine the results of multiple variance decompositions, usually generated for the same genes across separate batches of cells.

combineVar(
  ...,
  method = "fisher",
  pval.field = "p.value",
  other.fields = NULL,
  equiweight = TRUE,
  ncells = NULL
)

combineCV2(
  ...,
  method = "fisher",
  pval.field = "p.value",
  other.fields = NULL,
  equiweight = TRUE,
  ncells = NULL
)

`...`	Two or more DataFrames of variance modelling results. For `combineVar`, these should be produced by `modelGeneVar` or `modelGeneVarWithSpikes`. For `combineCV2`, these should be produced by `modelGeneCV2` or `modelGeneCV2WithSpikes`. Alternatively, one or more lists of DataFrames containing variance modelling results. Mixed inputs are also acceptable, e.g., lists of DataFrames alongside the DataFrames themselves.
`method`	String specifying how p-values are to be combined, see `combinePValues` for options.
`pval.field`	A string specifying the column name of each element of `...` that contains the p-value.
`other.fields`	A character vector specifying the fields containing other statistics to combine.
`equiweight`	Logical scalar indicating whether each result is to be given equal weight in the combined statistics.
`ncells`	Numeric vector containing the number of cells used to generate each element of `...`. Only used if `equiweight=FALSE`.

These functions are designed to merge results from separate calls to modelGeneVar, modelGeneCV2 or related functions, where each result is usually computed for a different batch of cells. Separate variance decompositions are necessary in cases where the mean-variance relationships vary across batches (e.g., different concentrations of spike-in have been added to the cells in each batch), which precludes the use of a common trend fit. By combining these results into a single set of statistics, we can apply standard strategies for feature selection in multi-batch integrated analyses.

By default, other.fields contains all common numeric fields other than pval.field and "FDR". This usually includes "mean", "total" and "bio" (for combineVar) or "ratio" (for combineCV2).

For combineVar, statistics are combined by averaging them across all input DataFrames.
For combineCV2, statistics are combined by taking the geometric mean across all inputs.

This difference between functions reflects the method by which the relevant measure of overdispersion is computed. For example, "bio" is computed by subtraction, so taking the average bio remains consistent with subtraction of the total and technical averages. Similarly, "ratio" is computed by division, so the combined ratio is consistent with division of the geometric means of the total and trend values.
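The consistency argument above can be checked numerically. The following is a minimal base R sketch using made-up per-batch values for a single gene; the vectors `total` and `tech` and the helper `geomean` are hypothetical and are not part of scran.

```r
# Hypothetical per-batch statistics for one gene across three batches.
total <- c(2.0, 3.5, 1.2)  # total variance (or total CV2) in each batch
tech  <- c(1.5, 2.0, 1.0)  # technical component (or trend value) in each batch

# Helper for the geometric mean; not a scran function.
geomean <- function(x) exp(mean(log(x)))

# "bio" is a difference, so the arithmetic mean of the differences equals
# the difference of the arithmetic means.
bio <- total - tech
mean.bio <- mean(bio)
mean.diff <- mean(total) - mean(tech)

# "ratio" is a quotient, so the geometric mean of the quotients equals
# the quotient of the geometric means.
ratio <- total / tech
gm.ratio <- geomean(ratio)
gm.quot <- geomean(total) / geomean(tech)

isTRUE(all.equal(mean.bio, mean.diff))
isTRUE(all.equal(gm.ratio, gm.quot))
```

Swapping the two (e.g., taking the arithmetic mean of the ratios) would generally break this agreement, which is why each function uses the mean that matches how its overdispersion measure is defined.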

If equiweight=FALSE, each per-batch statistic is weighted by the number of cells used to compute it. The number of cells can be explicitly set using ncells, and is otherwise assumed to be equal for all batches. No weighting is performed by default, which ensures that all batches contribute equally to the combined statistics and avoids situations where batches with many cells dominate the output.
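To illustrate the effect of weighting, here is a small base R sketch with hypothetical values for one gene. It assumes that the weighted statistic is an ordinary weighted arithmetic mean (or, for ratios, a weighted mean on the log scale); the exact internal computation in scran may differ.

```r
# Hypothetical "bio" values for one gene in three batches of unequal size.
bio <- c(0.8, 0.2, 0.5)
ncells <- c(1000, 100, 100)

# Equal weighting: every batch contributes the same amount.
unweighted <- mean(bio)

# Weighting by cell number: the large first batch dominates the result.
weighted <- weighted.mean(bio, w = ncells)

# Assumed geometric counterpart for ratio-like statistics:
# a weighted mean of the logs, transformed back to the original scale.
ratio <- c(1.6, 1.1, 1.3)
weighted.gm <- exp(weighted.mean(log(ratio), w = ncells))

c(unweighted = unweighted, weighted = weighted, weighted.gm = weighted.gm)
```

The weighted value is pulled towards the statistic from the 1000-cell batch, which is the behaviour that the default equal weighting is designed to avoid.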

The combinePValues function is used to combine p-values across batches. Only method="z" performs any weighting of batches, and only when weights are supplied to it, i.e., when equiweight=FALSE.
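For orientation, the textbook forms of several of these combining methods can be written in a few lines of base R. This is a sketch of the standard statistical definitions using hypothetical p-values for one gene, not a reproduction of the combinePValues implementation; in particular, treating the p-values as one-sided in the weighted Z-score calculation is an assumption.

```r
# Hypothetical p-values for one gene from three batches.
p <- c(0.01, 0.20, 0.03)
k <- length(p)

# Fisher's method: -2 * sum(log(p)) follows a chi-squared distribution
# with 2k degrees of freedom under the joint null.
fisher <- pchisq(-2 * sum(log(p)), df = 2 * k, lower.tail = FALSE)

# Simes' method: minimum of the sorted p-values scaled by k / rank.
simes <- min(k * sort(p) / seq_len(k))

# Berger's intersection-union test: the largest p-value.
berger <- max(p)

# Weighted Stouffer's Z-score method (assumed one-sided p-values),
# here weighting each batch by a hypothetical number of cells.
w <- c(1000, 100, 100)
z <- qnorm(p, lower.tail = FALSE)
stouffer <- pnorm(sum(w * z) / sqrt(sum(w^2)), lower.tail = FALSE)

c(fisher = fisher, simes = simes, berger = berger, stouffer = stouffer)
```

Of these, only the Z-score calculation has a natural place for per-batch weights, which is consistent with the statement above that the other methods ignore weighting.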

A DataFrame with the same numeric fields as those produced by modelGeneVar or modelGeneCV2. Each row corresponds to an input gene. Each field contains the (weighted) arithmetic mean (for combineVar) or geometric mean (for combineCV2) across all batches, with two exceptions: p.value contains the combined p-value based on method, and FDR contains the p-value adjusted using the BH method.

Aaron Lun

modelGeneVar and modelGeneCV2, for two possible sources of input to these functions.

combinePValues, for details on how the p-values are combined.

library(scuttle)
sce <- mockSCE()

y1 <- sce[,1:100]
y1 <- logNormCounts(y1) # normalize separately after subsetting.
results1 <- modelGeneVar(y1)

y2 <- sce[,1:100 + 100]
y2 <- logNormCounts(y2) # normalize separately after subsetting.
results2 <- modelGeneVar(y2)

head(combineVar(results1, results2))
head(combineVar(results1, results2, method="simes"))
head(combineVar(results1, results2, method="berger"))
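The examples above use the default equal weighting. The sketch below repeats the same mock setup and additionally shows cell-number weighting and list input, using only the arguments listed in the usage. It assumes that the scran and scuttle packages are installed; the names `unweighted`, `weighted` and `from.list` are purely illustrative.

```r
library(scran)
library(scuttle)

set.seed(100)
sce <- mockSCE()  # mock dataset; 200 cells by default

# Two batches of 100 cells each, normalized separately.
y1 <- logNormCounts(sce[,1:100])
y2 <- logNormCounts(sce[,1:100 + 100])
results1 <- modelGeneVar(y1)
results2 <- modelGeneVar(y2)

# Default: each batch contributes equally.
unweighted <- combineVar(results1, results2)

# Weight each batch by its number of cells, supplied explicitly via 'ncells'.
weighted <- combineVar(results1, results2,
    equiweight=FALSE, ncells=c(ncol(y1), ncol(y2)))

# The same results supplied as a single list of DataFrames.
from.list <- combineVar(list(results1, results2))

head(weighted)
head(from.list)
```

Because both mock batches contain the same number of cells, the weighted and unweighted statistics are expected to be very similar here; weighting matters more when batch sizes differ substantially.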