View source: R/multiBatchNorm.R

multiBatchNorm | R Documentation |

Perform scaling normalization within each batch to provide comparable results to the lowest-coverage batch.

```
multiBatchNorm(
...,
batch = NULL,
norm.args = list(),
min.mean = 1,
subset.row = NULL,
normalize.all = FALSE,
preserve.single = TRUE,
assay.type = "counts",
BPPARAM = SerialParam()
)
```

`...` |
One or more SingleCellExperiment objects containing counts and size factors. Each object should contain the same number of rows, corresponding to the same genes in the same order. If multiple objects are supplied, each object is assumed to contain all and only cells from a single batch.
If a single object is supplied, `batch` should also be specified. Alternatively, one or more lists of SingleCellExperiments can be provided;
these are flattened as if the objects inside were passed directly to `...`. |

`batch` |
A factor specifying the batch of origin for all cells when only a single object is supplied in `...`. |

`norm.args` |
A named list of further arguments to pass to `logNormCounts`. |

`min.mean` |
A numeric scalar specifying the minimum (library size-adjusted) average count of genes to be used for normalization. |

`subset.row` |
A vector specifying which features to use for normalization. |

`normalize.all` |
A logical scalar indicating whether normalized values should be returned for all genes. |

`preserve.single` |
A logical scalar indicating whether to combine the results into a single matrix if only one object was supplied in `...`. |

`assay.type` |
A string specifying which assay contains the counts. |

`BPPARAM` |
A BiocParallelParam object specifying whether calculations should be parallelized. |

When performing integrative analyses of multiple batches, different batches often have large differences in sequencing depth. This function removes systematic differences in coverage across batches to simplify downstream comparisons. It does so by rescaling the size factors using median-based normalization on the ratio of the average counts between batches. This is roughly equivalent to the between-cluster normalization described by Lun et al. (2016).
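The median-based rescaling described above can be sketched in base R. This is a simplified illustration on simulated data, not the package's internal code: the helpers `centredSF` and `adjustedAverage`, the simulation settings, and the exact form of the abundance filter are all assumptions made for this example.

```r
set.seed(100)

# Simulated counts for the same 500 genes in two batches (hypothetical data);
# batch 2 is sequenced roughly five times deeper than batch 1.
ngenes <- 500
gene.mu <- rgamma(ngenes, shape=2, rate=0.2)
b1 <- matrix(rnbinom(ngenes * 100, mu=gene.mu, size=10), nrow=ngenes)
b2 <- matrix(rnbinom(ngenes * 40, mu=5 * gene.mu, size=10), nrow=ngenes)

# Hypothetical helper: library size factors centred at unity within a batch.
centredSF <- function(counts) {
    lib <- colSums(counts)
    lib / mean(lib)
}

# Hypothetical helper: per-gene average count after library size adjustment.
adjustedAverage <- function(counts) {
    rowMeans(sweep(counts, 2, centredSF(counts), "/"))
}

ave1 <- adjustedAverage(b1)
ave2 <- adjustedAverage(b2)

# Keep only reasonably abundant genes (a simplified stand-in for 'min.mean').
keep <- pmin(ave1, ave2) > 1

# The median of the per-gene ratios estimates the coverage difference.
ratio <- median(ave2[keep] / ave1[keep])

# Express each batch's scaling relative to the lowest-coverage batch,
# then apply it to the centred size factors.
scaling <- c(batch1=1, batch2=ratio)
rescale <- scaling / min(scaling)
sf1 <- centredSF(b1) * rescale[["batch1"]]
sf2 <- centredSF(b2) * rescale[["batch2"]]

round(c(ratio=ratio, mean.sf1=mean(sf1), mean.sf2=mean(sf2)), 2)
```

In this sketch the size factors of the deeper batch end up larger on average by the estimated ratio, so dividing its counts by them brings it down to the coverage of the shallower batch.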

This function will adjust the size factors so that counts in high-coverage batches are scaled *downwards* to match the coverage of the most shallow batch.
The `logNormCounts` function will then add the same pseudo-count to all batches before log-transformation.
By scaling downwards, we favour stronger squeezing of log-fold changes from the pseudo-count, mitigating any technical differences in variance between batches.
Of course, genuine biological differences will also be shrunk, but this is less of an issue for upregulated genes with large counts.
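The effect of the pseudo-count can be checked with a little arithmetic. The sketch below uses made-up normalized averages for two groups of cells that differ two-fold, observed at two coverage levels; the numbers and the helper `lfc` are illustrative assumptions only.

```r
# Hypothetical normalized averages for two groups of cells with a two-fold
# difference, in a shallow batch and in a batch with ten-fold higher coverage.
shallow <- c(groupA=2, groupB=4)
deep <- c(groupA=20, groupB=40)

# Hypothetical helper: log2-fold change after adding a pseudo-count.
lfc <- function(x, pseudo=1) {
    log2(x[["groupB"]] + pseudo) - log2(x[["groupA"]] + pseudo)
}

lfc(shallow)    # about 0.74: strongly squeezed towards zero
lfc(deep)       # about 0.97: close to the true value of 1
lfc(deep / 10)  # scaling downwards gives the same shrinkage as the shallow batch
```

Scaling the deeper batch downwards therefore makes the shrinkage consistent across batches, at the cost of also shrinking the true log-fold change.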

Only genes with library size-adjusted average counts greater than `min.mean` will be used for computing the rescaling factors.
This improves precision and avoids problems with discreteness.
By default, we use `min.mean=1`, which is usually satisfactory but may need to be lowered for very sparse datasets.

Users can also set `subset.row` to restrict the set of genes used for computing the rescaling factors.
By default, normalized values will only be returned for genes specified in the subset.
Setting `normalize.all=TRUE` will return normalized values for all genes.
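As a rough illustration of these options, the sketch below lowers `min.mean` for sparse simulated data and restricts the rescaling to a subset of genes. The simulated counts, the choice of `min.mean=0.1` and the gene subset `chosen` are arbitrary assumptions for the example.

```r
library(SingleCellExperiment)
library(batchelor)

set.seed(42)

# Sparse simulated batches (hypothetical data): 500 genes, 100 and 40 cells.
d1 <- matrix(rnbinom(50000, mu=0.5, size=1), ncol=100)
d2 <- matrix(rnbinom(20000, mu=2, size=1), ncol=40)
sce1 <- SingleCellExperiment(list(counts=d1))
sce2 <- SingleCellExperiment(list(counts=d2))

# Library size-derived size factors, centred at unity within each batch.
sizeFactors(sce1) <- colSums(d1) / mean(colSums(d1))
sizeFactors(sce2) <- colSums(d2) / mean(colSums(d2))

# Use only the first 200 genes to compute the rescaling factors,
# with a lower abundance threshold to suit the sparse counts.
chosen <- 1:200
sub <- multiBatchNorm(sce1, sce2, min.mean=0.1, subset.row=chosen)
full <- multiBatchNorm(sce1, sce2, min.mean=0.1, subset.row=chosen,
    normalize.all=TRUE)

nrow(sub[[1]])   # only the genes in 'chosen'
nrow(full[[1]])  # all genes
```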

A list of SingleCellExperiment objects with normalized log-expression values in the `"logcounts"` assay (depending on values in `norm.args`).
Each object contains cells from a single batch.

If `preserve.single=TRUE` and `...` contains only one SingleCellExperiment, that object is returned with an additional `"logcounts"` assay containing normalized log-expression values.
The order of cells is not changed.
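A minimal sketch of the single-object case is shown below, assuming two simulated batches combined into one SingleCellExperiment. The data and the batch labels `"A"` and `"B"` are made up for illustration.

```r
library(SingleCellExperiment)
library(batchelor)

set.seed(1)

# Two simulated batches (hypothetical data) with different coverage.
d1 <- matrix(rnbinom(50000, mu=10, size=1), ncol=100)
d2 <- matrix(rnbinom(20000, mu=50, size=1), ncol=40)
sce1 <- SingleCellExperiment(list(counts=d1))
sce2 <- SingleCellExperiment(list(counts=d2))
sizeFactors(sce1) <- colSums(d1) / mean(colSums(d1))
sizeFactors(sce2) <- colSums(d2) / mean(colSums(d2))

# Combine into a single object and record the batch of origin for each cell.
combined <- cbind(sce1, sce2)
batch <- rep(c("A", "B"), c(ncol(sce1), ncol(sce2)))

# One object in, one object out, with cells in their original order.
normed <- multiBatchNorm(combined, batch=batch)
assayNames(normed)

# With preserve.single=FALSE, one object per batch is returned instead.
split.out <- multiBatchNorm(combined, batch=batch, preserve.single=FALSE)
length(split.out)
```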

For comparison, imagine if we ran `logNormCounts` separately in each batch prior to correction.
Size factors will be computed within each batch, and batch-specific application in `logNormCounts` will not account for scaling differences between batches.
In contrast, `multiBatchNorm` will rescale the size factors so that they are comparable across batches.
This removes at least one difference between batches to facilitate easier correction.

`cosineNorm` performs a similar role of equalizing the scale of expression values across batches.
However, the advantage of `multiBatchNorm` is that its output is more easily interpreted:
the normalized values remain on the log-scale and differences can still be interpreted (roughly) as log-fold changes.
The output can then be fed into downstream analysis procedures (e.g., HVG detection) in the same manner as typical log-normalized values from `logNormCounts`.

Aaron Lun

Lun ATL (2018). Further MNN algorithm development. https://MarioniLab.github.io/FurtherMNN2018/theory/description.html

`mnnCorrect` and `fastMNN`, for methods that can benefit from rescaling.

`logNormCounts` for the calculation of log-transformed normalized expression values.

`applyMultiSCE`, to apply this function over the `altExps` in `x`.

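```
library(SingleCellExperiment)
library(batchelor)

# Batch 1: 500 genes by 100 cells, lower coverage.
d1 <- matrix(rnbinom(50000, mu=10, size=1), ncol=100)
sce1 <- SingleCellExperiment(list(counts=d1))
sizeFactors(sce1) <- runif(ncol(d1))

# Batch 2: 500 genes by 40 cells, higher coverage.
d2 <- matrix(rnbinom(20000, mu=50, size=1), ncol=40)
sce2 <- SingleCellExperiment(list(counts=d2))
sizeFactors(sce2) <- runif(ncol(d2))

# Rescale size factors across batches and compare them.
out <- multiBatchNorm(sce1, sce2)
summary(sizeFactors(out[[1]]))
summary(sizeFactors(out[[2]]))
```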
