geometricSizeFactors: Compute geometric size factors

geometricSizeFactors    R Documentation

Compute geometric size factors

Description

Define per-cell size factors from the geometric mean of counts per cell.

Usage

geometricSizeFactors(x, ...)

## S4 method for signature 'ANY'
geometricSizeFactors(
  x,
  subset.row = NULL,
  pseudo.count = 1,
  BPPARAM = SerialParam()
)

## S4 method for signature 'SummarizedExperiment'
geometricSizeFactors(x, ..., assay.type = "counts")

computeGeometricFactors(x, ...)

Arguments

x

For geometricSizeFactors, a numeric matrix of counts with one row per feature and one column per cell. Alternatively, a SummarizedExperiment or SingleCellExperiment containing such counts.

For computeGeometricFactors, only a SingleCellExperiment containing a count matrix is accepted.

...

For the geometricSizeFactors generic, arguments to pass to specific methods. For the SummarizedExperiment method, further arguments to pass to the ANY method.

For computeGeometricFactors, further arguments to pass to geometricSizeFactors.

subset.row

A vector specifying the subset of rows of x to use for computing the size factors. If NULL, all rows are used.

pseudo.count

Numeric scalar specifying the pseudo-count to add during log-transformation.

BPPARAM

A BiocParallelParam object indicating how calculations are to be parallelized. Only relevant when x is a DelayedArray object.

assay.type

String or integer scalar indicating the assay of x containing the counts.

Details

The geometric mean provides an alternative measure of the average coverage per cell, in contrast to the library size factors (i.e., the arithmetic mean) computed by librarySizeFactors. The main advantage of the geometric mean is that it is more robust to large outliers, because the log-transform increases slowly at large values. In the normalization context, this translates to greater resistance to composition biases from a few strongly upregulated genes.
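The robustness to outliers can be illustrated with a small base R sketch. The helpers geo_sf_sketch and lib_sf_sketch are hypothetical names introduced here and are not part of scuttle. The sketch assumes that a geometric size factor is the exponentiated column mean of log(count + pseudo.count), scaled to unit mean across cells; the package's actual implementation may differ in its details.

```r
# Hypothetical helper, NOT the scuttle implementation: geometric-mean size
# factors from a features-by-cells count matrix.
geo_sf_sketch <- function(counts, pseudo.count = 1) {
    # Mean of the log-counts per cell (column), back-transformed to the count scale.
    geo <- exp(colMeans(log(counts + pseudo.count)))
    geo / mean(geo)  # assumed scaling: unit mean across cells
}

# Hypothetical arithmetic-mean counterpart (library size), scaled the same way.
lib_sf_sketch <- function(counts) {
    lib <- colSums(counts)
    lib / mean(lib)
}

# Toy data: 50 features and 2 cells with identical coverage.
counts <- matrix(100, nrow = 50, ncol = 2)
# One feature is strongly upregulated in the second cell only.
counts[1, 2] <- 10000

lib_sf <- lib_sf_sketch(counts)
geo_sf <- geo_sf_sketch(counts)

# Ratio of cell 2 to cell 1; a value of 1 would mean no composition bias.
lib_ratio <- lib_sf[2] / lib_sf[1]
geo_ratio <- geo_sf[2] / geo_sf[1]
print(c(library = lib_ratio, geometric = geo_ratio))
```

In this toy setting, the single upregulated feature inflates the library size ratio to nearly 3, whereas the geometric ratio stays close to 1.1.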

On the other hand, the geometric mean is a poor estimator of the relative bias at low or zero counts. This is because the scaling of the coverage applies to the expectation of the raw counts, so the geometric mean only becomes an accurate estimator if the mean of the logs approaches the log of the mean (usually at high counts). The arbitrary pseudo-count also has a bigger influence at low counts.
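The behaviour at low counts can be checked numerically. The following base R sketch assumes Poisson-distributed counts and two hypothetical cells where one has twice the coverage of the other, so an ideal size factor ratio would be 2. The helper expected_geo is a name introduced here for illustration and is not part of scuttle; it computes the expected geometric mean exactly from the Poisson probability mass function rather than by simulation.

```r
# Hypothetical helper: exponentiated expectation of log(X + pseudo.count)
# for X ~ Poisson(lambda), summed over (almost) the whole support.
expected_geo <- function(lambda, pseudo.count = 1) {
    k <- 0:qpois(1 - 1e-12, lambda)  # range covering nearly all probability mass
    exp(sum(dpois(k, lambda) * log(k + pseudo.count)))
}

# Ratio between a cell with twice the coverage and a reference cell.
# Low coverage: mean counts of 1 versus 0.5.
low_ratio <- expected_geo(1) / expected_geo(0.5)
# High coverage: mean counts of 2000 versus 1000.
high_ratio <- expected_geo(2000) / expected_geo(1000)

# Same comparisons with a different (equally arbitrary) pseudo-count.
low_ratio_pc <- expected_geo(1, pseudo.count = 0.1) /
    expected_geo(0.5, pseudo.count = 0.1)
high_ratio_pc <- expected_geo(2000, pseudo.count = 0.1) /
    expected_geo(1000, pseudo.count = 0.1)

print(c(low = low_ratio, high = high_ratio,
        low.pc = low_ratio_pc, high.pc = high_ratio_pc))
```

Under these assumptions, the high-coverage ratio stays close to the true value of 2 regardless of the pseudo-count, while the low-coverage ratio falls well short of 2 with a pseudo-count of 1 and shifts noticeably when the pseudo-count is changed.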

As such, the geometric mean is only well-suited for deeply sequenced features, e.g., antibody-derived tags.

Value

For geometricSizeFactors, a numeric vector of size factors is returned for all methods.

For computeGeometricFactors, x is returned containing the size factors in sizeFactors(x).

Author(s)

Aaron Lun

See Also

normalizeCounts and logNormCounts, where these size factors can be used.

librarySizeFactors and medianSizeFactors, for two other simple methods of computing size factors.

Examples

example_sce <- mockSCE()
summary(geometricSizeFactors(example_sce))

LTLA/scuttle documentation built on Oct. 28, 2024, 9:45 a.m.