normalizeCounts: Compute normalized expression values
In LTLA/scuttle: Single-Cell RNA-Seq Analysis Utilities

normalizeCounts

R Documentation

Compute normalized expression values

Description

Compute (log-)normalized expression values by dividing counts for each cell by the corresponding size factor.

Usage

normalizeCounts(x, ...)

## S4 method for signature 'ANY'
normalizeCounts(
  x,
  size.factors = NULL,
  log = NULL,
  transform = c("log", "none", "asinh"),
  pseudo.count = 1,
  center.size.factors = TRUE,
  subset.row = NULL,
  normalize.all = FALSE,
  downsample = FALSE,
  down.target = NULL,
  down.prop = 0.01,
  BPPARAM = SerialParam(),
  size_factors = NULL,
  pseudo_count = NULL,
  center_size_factors = NULL,
  subset_row = NULL,
  down_target = NULL,
  down_prop = NULL
)

## S4 method for signature 'SummarizedExperiment'
normalizeCounts(x, ..., assay.type = "counts", exprs_values = NULL)

## S4 method for signature 'SingleCellExperiment'
normalizeCounts(x, size.factors = sizeFactors(x), ...)

Arguments

`x`	A numeric matrix-like object containing counts for cells in the columns and features in the rows. Alternatively, a SingleCellExperiment or SummarizedExperiment object containing such a count matrix.
`...`	For the generic, arguments to pass to specific methods. For the SummarizedExperiment method, further arguments to pass to the ANY or DelayedMatrix methods. For the SingleCellExperiment method, further arguments to pass to the SummarizedExperiment method.
`size.factors`	A numeric vector of cell-specific size factors. Alternatively `NULL`, in which case the size factors are computed from `x`.
`log`	Logical scalar indicating whether normalized values should be log2-transformed. This is retained for backward compatibility and will override any setting of `transform`. Users should generally use `transform` instead to specify the transformation.
`transform`	String specifying the transformation (if any) to apply to the normalized expression values.
`pseudo.count`	Numeric scalar specifying the pseudo-count to add when `transform="log"`.
`center.size.factors`	Logical scalar indicating whether size factors should be centered at unity before being used.
`subset.row`	A vector specifying the subset of rows of `x` for which to return normalized values. If `size.factors=NULL`, the size factors are also computed from this subset.
`normalize.all`	Logical scalar indicating whether to return normalized values for all genes. If `TRUE`, any non-`NULL` value for `subset.row` is only used to compute the size factors. Ignored if `subset.row=NULL` or `size.factors` is supplied.
`downsample`	Logical scalar indicating whether counts should be randomly downsampled instead of scaled, prior to any transformation specified by `transform`.
`down.target`	Numeric scalar specifying the downsampling target when `downsample=TRUE`. If `NULL`, this is defined by `down.prop` and a warning is emitted.
`down.prop`	Numeric scalar between 0 and 1 indicating the quantile to use to define the downsampling target. Only used when `downsample=TRUE`.
`BPPARAM`	A BiocParallelParam object specifying how library size factor calculations should be parallelized. Only used if `size.factors` is not specified.
`assay.type`	A string or integer scalar specifying the assay of `x` containing the count matrix.
`exprs_values`, `size_factors`, `pseudo_count`, `center_size_factors`, `subset_row`, `down_target`, `down_prop`	Soft-deprecated equivalents to the arguments described previously.

Details

Normalized expression values are computed by dividing the counts for each cell by the size factor for that cell. This removes cell-specific scaling biases due to differences in sequencing coverage, capture efficiency or total RNA content. The assumption is that such biases affect all genes equally (in a scaling manner) and thus can be removed through division by a per-cell size factor.
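Dividing by per-cell size factors can be illustrated with a minimal base-R sketch. The counts and size factors below are made up, and this is not scuttle's implementation; `sweep()` with `MARGIN = 2` simply divides each column (cell) by its own size factor.

```r
# Toy count matrix: 3 genes (rows) x 4 cells (columns).
counts <- matrix(c(10, 0,  5,
                   20, 2, 10,
                    5, 1,  0,
                   40, 0, 20),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Hypothetical size factors, one per cell (column).
sf <- c(1, 2, 0.5, 4)

# Divide each column by the size factor of that cell.
normed <- sweep(counts, 2, sf, "/")
normed
```

Here gene1 ends up at 10 in every cell: its apparent differences in the raw counts were entirely explained by the per-cell scaling.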

If transform="log", log-normalized values are calculated by adding pseudo.count to the normalized count and performing a log2-transformation. Differences in values between cells can then be interpreted as log-fold changes, which are generally more relevant than differences on the untransformed scale. This provides a suitable input to downstream functions computing, e.g., Euclidean distances, which effectively aggregate the log-fold changes across genes.
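The log-transformation and its log-fold-change interpretation can be sketched in base R. This assumes two cells whose (centered) size factors are both 1, so normalized values equal the counts; the data are invented for illustration.

```r
# Two cells with equal size factors, so normalized values equal the counts.
counts <- matrix(c(0, 100,  8,
                   0, 300, 16),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("cellA", "cellB")))
sf <- c(1, 1)
pseudo.count <- 1

# Scale by size factors, add the pseudo-count, then take log2.
logged <- log2(sweep(counts, 2, sf, "/") + pseudo.count)

# Per-gene differences between cells approximate log2-fold changes.
lfc <- logged[, "cellB"] - logged[, "cellA"]
lfc

# The Euclidean distance between the cells combines these per-gene differences.
d <- sqrt(sum(lfc^2))
all.equal(d, as.numeric(dist(t(logged))))
```

Zero counts stay at zero after the transformation, and the low-count gene3 (true fold change of 2) shows how the pseudo-count shrinks log-fold changes at low abundance.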

Alternatively, if transform="asinh", an inverse hyperbolic sine transformation is performed. This is commonly used in cytometry and converges to the log2-transformation at high normalized values. (We adjust the scale so that the results are comparable to log2-values, though the actual definition uses the natural log.) For non-negative inputs, the main practical difference from a log2-transformation is that there is a larger gap between transformed values derived from zero and those derived from non-zero inputs.

If size.factors=NULL, the size factors are determined automatically from x: the sum of counts (i.e., the library size) for each cell is used to compute a size factor via the librarySizeFactors function. For the SingleCellExperiment method, size factors are extracted from sizeFactors(x) if available; otherwise, they are computed from the assay containing the count matrix.
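A base-R stand-in for library size factors is sketched below. It assumes that the size factors are the library sizes divided by their mean (i.e., centered at unity); it is an illustration of the idea, not a call to librarySizeFactors itself.

```r
counts <- matrix(c(1,  3,
                   2,  6,
                   5, 15),
                 nrow = 2,
                 dimnames = list(c("gene1", "gene2"), paste0("cell", 1:3)))

# Library size = total count for each cell.
lib.sizes <- colSums(counts)

# Scale library sizes so that their mean is 1.
sf <- lib.sizes / mean(lib.sizes)

normed <- sweep(counts, 2, sf, "/")
normed
```

The three cells differ only in sequencing depth, so after division by the library size factors all columns become identical.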

If subset.row is specified, the output of the function is equivalent to supplying x[subset.row,] in the first place. The exception is if normalize.all=TRUE, in which case subset.row is only used during the size factor calculation; once computed, the size factors are then applied to all genes and the full matrix is returned.
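The interaction between subset.row and normalize.all can be mimicked with a small hypothetical helper, `sketch_normalize()`, which is not part of scuttle. It assumes library size factors centered at unity and a pseudo-count of 1.

```r
# Hypothetical helper mimicking the subset.row / normalize.all logic.
sketch_normalize <- function(x, subset.row = NULL, normalize.all = FALSE) {
  # Rows used to compute the (library-size) size factors.
  x.sub <- if (is.null(subset.row)) x else x[subset.row, , drop = FALSE]
  sf <- colSums(x.sub)
  sf <- sf / mean(sf)

  # Rows that are actually returned.
  target <- if (normalize.all) x else x.sub
  log2(sweep(target, 2, sf, "/") + 1)
}

x <- matrix(c( 5, 1, 0, 9,
              10, 3, 2, 0,
               1, 0, 4, 7),
            nrow = 4,
            dimnames = list(paste0("gene", 1:4), paste0("cell", 1:3)))

dim(sketch_normalize(x, subset.row = 1:2))                        # 2 rows, 3 columns
dim(sketch_normalize(x, subset.row = 1:2, normalize.all = TRUE))  # 4 rows, 3 columns
```

With normalize.all=FALSE, the result is the same as running the helper on `x[1:2, ]` directly.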

Value

A numeric matrix-like object containing normalized expression values, possibly transformed according to transform. This has the same dimensions as x, unless subset.row is specified and normalize.all=FALSE.

Centering the size factors

If center.size.factors=TRUE, size factors are centered at unity prior to calculation of normalized expression values. This ensures that the computed expression values can be interpreted as being on the same scale as the original counts. We can then compare abundances between features normalized with different sets of size factors; the most common use of this fact is in the comparison between spike-in and endogenous abundances when modelling technical noise (see modelGeneVarWithSpikes from the scran package for an example).
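Why centering makes different sets of size factors comparable can be shown with a base-R sketch. It assumes centering means dividing by the mean size factor; `center()` is a hypothetical helper, not a scuttle function.

```r
counts <- matrix(c(10, 4,
                   30, 2),
                 nrow = 2,
                 dimnames = list(c("gene1", "gene2"), c("cell1", "cell2")))

# Two sets of size factors that differ only by a global constant.
sf.a <- c(1, 3)
sf.b <- 100 * sf.a

# Hypothetical helper: center size factors at unity.
center <- function(sf) sf / mean(sf)

normed.a <- sweep(counts, 2, center(sf.a), "/")
normed.b <- sweep(counts, 2, center(sf.b), "/")

# Centering removes the arbitrary global scale, so both give identical values.
all.equal(normed.a, normed.b)

# Without centering, the second set would shrink every value well below the count scale.
uncentered.b <- sweep(counts, 2, sf.b, "/")
```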

In the specific case of transform="log", centering of the size factors ensures the pseudo-count can actually be interpreted as a count. This is important as it implies that the pseudo-count's impact will diminish as sequencing coverage improves. Thus, if the size factors are centered, differences between log-normalized expression values will more closely approximate the true log-fold change with increasing coverage, whereas this would not be true of other metrics like log-CPMs with a fixed offset.
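The diminishing effect of the pseudo-count can be checked numerically. This sketch assumes both cells have centered size factors of 1 (so normalized values equal counts) and a true fold change of 2; `log_fc()` and `log_cpm_fc()` are hypothetical helpers, and the log-CPM version assumes both cells share the same library size.

```r
# Estimated log2-fold change from counts with a pseudo-count.
log_fc <- function(a, b, pseudo.count = 1) {
  log2(b + pseudo.count) - log2(a + pseudo.count)
}

# True fold change is 2 in both cases; only the coverage differs.
low.cov  <- log_fc(1, 2)      # log2(3 / 2)
high.cov <- log_fc(100, 200)  # log2(201 / 101), much closer to 1

# log-CPM with a fixed offset: CPM does not change with coverage,
# so the same bias remains however deeply the cells are sequenced.
log_cpm_fc <- function(a, b, lib.size, offset = 1) {
  cpm <- c(a, b) / lib.size * 1e6
  log2(cpm[2] + offset) - log2(cpm[1] + offset)
}
cpm.low  <- log_cpm_fc(1, 2, lib.size = 1e4)
cpm.high <- log_cpm_fc(100, 200, lib.size = 1e6)
```

The count-scale estimate moves toward the true value of 1 as coverage increases, whereas the two log-CPM estimates are the same.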

The disadvantage of using centered size factors is that the expression values are not directly comparable across different calls to normalizeCounts, e.g., when multiple batches are processed separately. In theory, this is not a problem for metrics like the CPM, but in practice, we have to apply batch correction methods anyway to perform any joint analysis; see multiBatchNorm from the batchelor package for more details.

Downsampling instead of scaling

If downsample=TRUE, counts for each cell are randomly downsampled instead of being scaled. This is occasionally useful for avoiding artifacts caused by scaling count data with a strong mean-variance relationship. Each cell is downsampled according to the ratio between down.target and that cell's size factor. (Cells with size factors below the target are not downsampled and are directly scaled by this ratio.) Any transformation specified by transform is then applied to the downsampled counts.
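The downsampling rule can be sketched with a hypothetical helper, `sketch_downsample()`. It uses independent binomial thinning of each count as a simple stand-in sampler; scuttle's own downsampling procedure may differ. Cells whose size factor is at or below the target are scaled by the ratio instead.

```r
set.seed(42)

# Hypothetical helper; binomial thinning stands in for scuttle's sampler.
sketch_downsample <- function(x, sf, down.target) {
  out <- matrix(0, nrow(x), ncol(x), dimnames = dimnames(x))
  for (j in seq_len(ncol(x))) {
    ratio <- down.target / sf[j]
    if (ratio < 1) {
      # Keep each count independently with probability 'ratio'.
      out[, j] <- rbinom(nrow(x), size = x[, j], prob = ratio)
    } else {
      # Size factor at or below the target: scale by the ratio instead.
      out[, j] <- x[, j] * ratio
    }
  }
  out
}

counts <- matrix(c(100, 50, 0,
                    10,  5, 0,
                    40, 20, 4),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3)))
sf <- c(2, 0.2, 1)

down <- sketch_downsample(counts, sf, down.target = 0.5)

# Any transformation is then applied to the downsampled counts.
logged <- log2(down + 1)
```

The downsampled cells (cell1, cell3) keep integer counts no larger than the originals, while cell2 is simply multiplied by 2.5.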

By default (down.target=NULL), the target is set to the 1st percentile of size factors across all cells involved in the analysis (i.e., the quantile defined by down.prop=0.01), but this is only appropriate if the resulting expression values are not compared across different normalizeCounts calls. To obtain expression values that are comparable across different normalizeCounts calls (e.g., in modelGeneVarWithSpikes or multiBatchNorm), down.target should be manually set to a constant target value that can be considered a low size factor in every call.
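The difference between a per-call default target and a shared constant target is sketched below. The size factors are hypothetical, and R's default quantile type is assumed; scuttle's internal percentile calculation may not match it exactly.

```r
# Hypothetical centered size factors from two separately processed batches.
sf.batch1 <- c(0.5, 0.8, 1.0, 1.2, 1.5)
sf.batch2 <- c(0.2, 0.6, 1.0, 1.4, 1.8)
down.prop <- 0.01

# Default-style target: a low quantile of each call's own size factors.
target1 <- quantile(sf.batch1, probs = down.prop, names = FALSE)
target2 <- quantile(sf.batch2, probs = down.prop, names = FALSE)
c(target1, target2)  # different, so each call downsamples to a different depth

# For comparable results, pick one constant target that is low in every call.
shared.target <- 0.15
all(shared.target <= c(min(sf.batch1), min(sf.batch2)))
```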

Author(s)

Aaron Lun

Examples

library(scuttle)
example_sce <- mockSCE()

# Standard scaling + log-transformation:
normed <- normalizeCounts(example_sce)
normed[1:5,1:5]

# Scaling without transformation:
normed <- normalizeCounts(example_sce, transform="none")
normed[1:5,1:5]

# Downsampling with log-transformation (random, so set a seed):
set.seed(100)
normed <- normalizeCounts(example_sce, downsample=TRUE)
normed[1:5,1:5]

# Using custom size factors:
with.meds <- computeMedianFactors(example_sce)
normed <- normalizeCounts(with.meds)
normed[1:5,1:5]

LTLA/scuttle documentation built on Oct. 28, 2024, 9:45 a.m.
