librarySizeFactors: Compute library size factors


Description

Define per-cell size factors from the library sizes (i.e., total sum of counts per cell).

Usage

librarySizeFactors(x, ...)

## S4 method for signature 'ANY'
librarySizeFactors(
  x,
  subset_row = NULL,
  geometric = FALSE,
  pseudo_count = 1,
  BPPARAM = SerialParam()
)

## S4 method for signature 'SummarizedExperiment'
librarySizeFactors(x, exprs_values = "counts", ...)

computeLibraryFactors(x, ...)

Arguments

x

For librarySizeFactors, a numeric matrix of counts with one row per feature and one column per cell. Alternatively, a SummarizedExperiment or SingleCellExperiment containing such counts.

For computeLibraryFactors, only a SingleCellExperiment is accepted.

...

For the librarySizeFactors generic, arguments to pass to specific methods. For the SummarizedExperiment method, further arguments to pass to the ANY method.

For computeLibraryFactors, further arguments to pass to librarySizeFactors.

subset_row

A vector specifying the subset of rows of x from which the size factors should be computed.

geometric

Logical scalar indicating whether the size factor should be defined using the geometric mean.

pseudo_count

Numeric scalar specifying the pseudo-count to add during log-transformation when geometric=TRUE.

BPPARAM

A BiocParallelParam object indicating how calculations are to be parallelized. Only relevant when x is a DelayedArray object.

exprs_values

String or integer scalar indicating the assay of x containing the counts.

Details

Library sizes are converted into size factors by scaling them so that their mean across cells is unity. This ensures that the normalized values are still on the same scale as the raw counts. Preserving the scale is useful for interpretation of operations on the normalized values, e.g., the pseudo-count used in logNormCounts can actually be considered an additional read/UMI. This is important for ensuring that the effect of the pseudo-count decreases with increasing sequencing depth.
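The scaling described above can be sketched in base R. This is an illustrative sketch rather than the package's implementation; the toy matrix `counts` and the variable names are assumptions made for the example.

```r
# Toy count matrix (hypothetical data): 3 features (rows) x 3 cells (columns).
counts <- matrix(c(10, 0, 5,
                   20, 4, 16,
                   2, 1, 0),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3)))

# Library size = total count per cell.
lib_sizes <- colSums(counts)

# Scale the library sizes so that their mean across cells is 1.
size_factors <- lib_sizes / mean(lib_sizes)

# Divide each cell (column) by its size factor.
normalized <- sweep(counts, 2, size_factors, "/")

mean(size_factors)    # equals 1 by construction
colSums(normalized)   # every cell now sums to mean(lib_sizes)
```

Because the size factors average to one, the normalized totals sit at the mean raw library size, which is what keeps a pseudo-count of 1 interpretable as roughly one extra read/UMI.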

When using the library size-derived size factor, we implicitly assume that sequencing coverage is the only difference between cells. This is reasonable for homogeneous cell populations but is compromised by composition biases introduced by differentially expressed (DE) genes between cell types. In such cases, normalization by library size factors will not be entirely correct, though the effect on downstream conclusions will vary, e.g., clustering is usually unaffected by composition biases but log-fold change estimates will be less accurate.
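A small worked example may help to show how composition bias distorts log-fold changes. The two cells below are invented for illustration: they have identical counts for three genes, and one cell additionally upregulates a fourth gene.

```r
# Hypothetical cells: geneA-geneC are not DE; geneD is strongly upregulated in cell_b.
cell_a <- c(geneA = 100, geneB = 100, geneC = 100, geneD = 100)
cell_b <- c(geneA = 100, geneB = 100, geneC = 100, geneD = 700)
counts <- cbind(cell_a, cell_b)

# Library size factors, scaled to a mean of 1.
lib_sizes <- colSums(counts)
size_factors <- lib_sizes / mean(lib_sizes)

# Normalize each cell by its size factor.
normalized <- sweep(counts, 2, size_factors, "/")

# Log2-fold change of a non-DE gene between the two cells.
# The raw counts are identical, so the true value is 0.
lfc_geneA <- log2(normalized["geneA", "cell_b"] / normalized["geneA", "cell_a"])
lfc_geneA   # log2(0.4), i.e. about -1.32
```

The upregulated gene inflates the library size of `cell_b`, so the unchanged genes appear downregulated after normalization even though their counts are the same in both cells.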

A closely related alternative defines the size factor from the geometric mean of counts within each cell instead of the library size (which is proportional to the arithmetic mean). This is enabled with geometric=TRUE, in which case pseudo_count is added to the counts before log-transformation to avoid undefined values at zero counts. The geometric mean is more robust to composition biases from upregulated features but is a poor estimator of the relative bias when there are many zero counts, so it is best suited to deeply sequenced features, e.g., antibody-derived tags.
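One plausible reading of the geometric-mean variant is sketched below in base R. The helper `geometric_size_factors` and the toy data are hypothetical, and the exact computation inside librarySizeFactors may differ in detail (for example, in how the pseudo-count is handled after the log-transformation).

```r
# Hypothetical helper: geometric-mean size factors, scaled to a mean of 1.
geometric_size_factors <- function(counts, pseudo_count = 1) {
    # exp(mean(log(.))) is the geometric mean; the pseudo-count avoids log(0).
    geo_means <- exp(colMeans(log(counts + pseudo_count)))
    geo_means / mean(geo_means)
}

# Toy data (hypothetical): geneD is strongly upregulated in cell_b only.
counts <- cbind(cell_a = c(geneA = 100, geneB = 100, geneC = 100, geneD = 100),
                cell_b = c(geneA = 100, geneB = 100, geneC = 100, geneD = 700))

geo_sf <- geometric_size_factors(counts)
lib_sf <- colSums(counts) / mean(colSums(counts))

# Ratio of cell_b to cell_a size factors under each definition;
# a ratio of 1 would be ideal here, since the non-DE genes are identical.
geo_ratio <- geo_sf[["cell_b"]] / geo_sf[["cell_a"]]
lib_ratio <- lib_sf[["cell_b"]] / lib_sf[["cell_a"]]
c(geometric = geo_ratio, library = lib_ratio)
```

In this toy case the geometric ratio is closer to 1 than the library size ratio, which matches the point above that the geometric mean is less sensitive to a single upregulated feature.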

Value

For librarySizeFactors, a numeric vector of size factors is returned for all methods.

For computeLibraryFactors, x is returned containing the size factors in sizeFactors(x).

Author(s)

Aaron Lun

See Also

logNormCounts, where these size factors are used by default.

Examples

library(scater)
example_sce <- mockSCE()  # simulated SingleCellExperiment
summary(librarySizeFactors(example_sce))

scater documentation built on Dec. 18, 2019, 2:05 a.m.