
Define per-cell size factors from the library sizes (i.e., total sum of counts per cell).

```
librarySizeFactors(x, ...)

## S4 method for signature 'ANY'
librarySizeFactors(
  x,
  subset_row = NULL,
  geometric = FALSE,
  pseudo_count = 1,
  BPPARAM = SerialParam()
)

## S4 method for signature 'SummarizedExperiment'
librarySizeFactors(x, exprs_values = "counts", ...)

computeLibraryFactors(x, ...)
```

`x`
For … For …

`...`
For the … For …

`subset_row`
A vector specifying whether the size factors should be computed from a subset of rows of `x`.

`geometric`
Logical scalar indicating whether the size factor should be defined using the geometric mean.

`pseudo_count`
Numeric scalar specifying the pseudo-count to add during log-transformation when `geometric=TRUE`.

`BPPARAM`
A BiocParallelParam object indicating how calculations are to be parallelized.
Only relevant when …

`exprs_values`
String or integer scalar indicating the assay of `x` containing the counts.

Library sizes are converted into size factors by scaling them so that their mean across cells is unity.
This ensures that the normalized values are still on the same scale as the raw counts.
Preserving the scale is useful for interpretation of operations on the normalized values,
e.g., the pseudo-count used in `logNormCounts` can actually be considered an additional read/UMI.
This is important for ensuring that the effect of the pseudo-count decreases with increasing sequencing depth.
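The scaling described above can be sketched in base R. This is a minimal illustration on a made-up count matrix, not the package's implementation; the object names (`counts`, `lib_sizes`, `size_factors`, `normalized`) are chosen here for the example.

```r
# Toy count matrix: 4 genes (rows) by 3 cells (columns); values are made up.
counts <- matrix(
    c(10, 0,  5,  5,
      20, 0, 10, 10,
      30, 0, 15, 15),
    nrow = 4,
    dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3))
)

# Library size = total count per cell.
lib_sizes <- colSums(counts)                 # 20, 40, 60

# Scale so that the mean size factor across cells is 1.
size_factors <- lib_sizes / mean(lib_sizes)  # 0.5, 1.0, 1.5

# Divide each cell (column) by its size factor.
normalized <- t(t(counts) / size_factors)

# Every cell now has the same total, equal to the mean library size (40),
# so the normalized values stay on the scale of the raw counts.
colSums(normalized)
```

Because the size factors average to 1, a pseudo-count of 1 added to `normalized` before log-transformation is still comparable to a single read/UMI.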

When using the library size-derived size factor, we implicitly assume that sequencing coverage is the only difference between cells. This is reasonable for homogeneous cell populations but is compromised by composition biases introduced by differentially expressed (DE) genes between cell types. In such cases, normalization by library size factors will not be entirely correct, though the effect on downstream conclusions will vary, e.g., clustering is usually unaffected by composition biases but log-fold change estimates will be less accurate.
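A small base R sketch can make the composition bias concrete. The two cells below are hypothetical and assumed to be sequenced with identical coverage; the only real difference is one strongly upregulated gene in cell B.

```r
# Two hypothetical cells with identical coverage; gene 1 is upregulated in B.
cell_a <- rep(100, 10)
cell_b <- c(1100, rep(100, 9))
counts <- cbind(A = cell_a, B = cell_b)

# Library size factors, scaled to a mean of 1.
lib_sizes <- colSums(counts)                 # A = 1000, B = 2000
size_factors <- lib_sizes / mean(lib_sizes)  # A = 2/3,  B = 4/3

normalized <- t(t(counts) / size_factors)

# Gene 2 has the same raw count (100) in both cells, yet after normalization
# it appears two-fold lower in B: a spurious log2-fold change of -1.
lfc_non_de <- log2(normalized[2, "B"] / normalized[2, "A"])
lfc_non_de
```

The upregulated gene inflates the library size of cell B, so every other gene in that cell is scaled down too far.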

A closely related alternative approach involves using the geometric mean of counts within each cell to define the size factor,
instead of the library size (which is proportional to the arithmetic mean).
This is enabled with `geometric=TRUE`, with addition of `pseudo_count` to avoid undefined values with zero counts.
The geometric mean is more robust to composition biases from upregulated features but is a poor estimator of the relative bias when there are many zero counts, and thus is best suited for deeply sequenced features, e.g., antibody-derived tags.
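The following base R sketch illustrates the idea. It assumes that the geometric size factor is the geometric mean of the counts plus the pseudo-count in each cell, rescaled to a mean of 1; the package's exact computation may differ, and the data and object names are invented for the example.

```r
# Two hypothetical cells with identical coverage; feature 1 is upregulated in B.
counts <- cbind(A = rep(100, 10), B = c(1100, rep(100, 9)))
pseudo_count <- 1  # avoids log(0) when a count is zero

# Arithmetic version: library sizes scaled to a mean of 1.
lib_factors <- colSums(counts) / mean(colSums(counts))

# Geometric version (assumed form): exp of the mean log count per cell,
# then the same scaling to a mean of 1.
geo_means <- exp(colMeans(log(counts + pseudo_count)))
geo_factors <- geo_means / mean(geo_means)

# Ratio of size factors between the two cells; the true coverage ratio is 1.
lib_ratio <- unname(lib_factors["B"] / lib_factors["A"])  # 2
geo_ratio <- unname(geo_factors["B"] / geo_factors["A"])  # roughly 1.27
c(library = lib_ratio, geometric = geo_ratio)
```

In this toy case the single upregulated feature pulls the geometric ratio much less far from 1 than the library size ratio, which is the robustness referred to above.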

For `librarySizeFactors`, a numeric vector of size factors is returned for all methods.

For `computeLibraryFactors`, a numeric vector is also returned for the ANY and SummarizedExperiment methods.
For the SingleCellExperiment method, `x` is returned containing the size factors in `sizeFactors(x)`.

Aaron Lun

`logNormCounts`, where these size factors are used by default.

```
example_sce <- mockSCE()
summary(librarySizeFactors(example_sce))
```
