cosineNorm: Cosine normalization


cosineNorm    R Documentation

Cosine normalization

Description

Perform cosine normalization on the column vectors of an expression matrix.

Usage

cosineNorm(
  x,
  mode = c("matrix", "all", "l2norm"),
  subset.row = NULL,
  BPPARAM = SerialParam()
)

Arguments

x

A gene expression matrix with cells as columns and genes as rows.

mode

A string specifying the output to be returned: one of "matrix" (the default), "all" or "l2norm". See Value for what each option returns.

subset.row

A vector specifying which features to use to compute the L2 norm.

BPPARAM

A BiocParallelParam object specifying how parallelization is to be performed. Only used when x is a DelayedArray object.

Details

Cosine normalization removes scaling differences between expression vectors. In the context of batch correction, this is usually applied to remove differences between batches that are normalized separately. For example, fastMNN uses this function on the log-expression vectors by default.
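The underlying operation can be sketched in base R as dividing each column by its L2 (Euclidean) norm. This is an illustration only, not the package's implementation; the names A, l2 and A_cos are chosen here for the example.

```r
set.seed(1)
A <- matrix(rnorm(1000), nrow = 10)   # 10 genes (rows) x 100 cells (columns)

# L2 norm of each column (cell).
l2 <- sqrt(colSums(A^2))

# Divide every column by its own norm.
A_cos <- sweep(A, 2, l2, "/")

# Each column now has unit length, so any per-cell scaling factor is gone.
range(sqrt(colSums(A_cos^2)))
```

Once every column has unit length, the dot product of two columns equals the cosine of the angle between them, which is where the name comes from.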

Technically, separate normalization introduces scaling differences in the normalized expression, which should manifest as a shift in the log-transformed expression. However, in practice, single-cell data will contain many small counts (where the log function is near-linear) or many zeroes (which remain zero when the pseudo-count is 1). In these applications, scaling differences due to separate normalization are better represented as scaling differences in the log-transformed values.
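A small numeric sketch of this argument, using made-up values: y is a hypothetical vector of small normalized expression values for one cell, and s is an assumed scaling difference introduced by separate normalization.

```r
# Hypothetical normalized expression values for one cell: small or zero.
y <- c(0, 0.05, 0, 0.1, 0.2, 0)
s <- 2                      # assumed scaling difference between batches

log_y  <- log2(y + 1)       # log-transform with a pseudo-count of 1
log_sy <- log2(s * y + 1)

# Zeroes stay at zero; non-zero entries are scaled by roughly s.
ratio <- log_sy[y > 0] / log_y[y > 0]
ratio

# After cosine normalization the two log-expression vectors nearly coincide.
cos_y  <- log_y  / sqrt(sum(log_y^2))
cos_sy <- log_sy / sqrt(sum(log_sy^2))
max(abs(cos_y - cos_sy))
```

The approximation degrades as the values grow, because the log function is no longer close to linear; the sketch only covers the small-count regime described above.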

If applied to the raw count vectors, cosine normalization is similar to library size-related (i.e., L1) normalization. However, we recommend using dedicated methods for computing size factors to normalize raw count data.
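To illustrate the similarity, the sketch below compares L1 (library size) scaling with L2 (cosine) scaling on simulated counts. The helpers l1_norm and l2_norm are hypothetical names defined here in base R, not functions from the package.

```r
set.seed(2)
counts <- matrix(rpois(60, lambda = 5), nrow = 6)  # simulated raw counts
counts[1, ] <- counts[1, ] + 1                     # guard against all-zero columns

# Library size (L1) scaling and cosine (L2) scaling of the columns.
l1_norm <- function(m) sweep(m, 2, colSums(m), "/")
l2_norm <- function(m) sweep(m, 2, sqrt(colSums(m^2)), "/")

# Multiply each cell by an arbitrary depth factor.
depth <- runif(ncol(counts), 0.5, 2)
scaled <- sweep(counts, 2, depth, "*")

# Both normalizations undo the per-cell factor.
all.equal(l1_norm(counts), l1_norm(scaled))
all.equal(l2_norm(counts), l2_norm(scaled))
```

Both approaches remove a per-cell multiplicative factor; they differ in which norm defines "the same scale", and the L2 norm gives more weight to the largest counts.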

While the default is to directly return the cosine-normalized matrix, it may occasionally be desirable to obtain the L2 norm, e.g., to apply an equivalent normalization to other matrices. This can be achieved by setting mode accordingly.
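As a sketch of that use case, the L2 norms obtained from one matrix can be applied to a second matrix with the same cells. The matrix B is hypothetical, and the norms are computed here in base R as a stand-in for cosineNorm(A, mode="l2norm") so that the example runs without the package.

```r
set.seed(3)
A <- matrix(rnorm(1000), nrow = 10)   # matrix used to define the norms
B <- matrix(rnorm(500), nrow = 5)     # hypothetical second matrix, same 100 cells

# Stand-in for cosineNorm(A, mode = "l2norm").
l2 <- sqrt(colSums(A^2))

# Divide each column of B by the norm of the matching column of A.
B_scaled <- sweep(B, 2, l2, "/")
dim(B_scaled)
```

Note that the columns of B_scaled do not have unit length in general; they are only placed on the same scale as the cosine-normalized version of A.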

The function will return a DelayedMatrix if x is a DelayedMatrix. This aims to delay the calculation of cosine-normalized values for very large matrices.

Value

If mode="matrix", a double-precision matrix of the same dimensions as x is returned, containing cosine-normalized values.

If mode="l2norm", a double-precision vector is returned containing the L2 norm for each cell.

If mode="all", a named list is returned containing the fields "matrix" and "l2norm", which are as described above.
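The three return shapes can be mimicked with a hypothetical base R helper, cosine_norm_sketch, shown below. It is not the package function, and it assumes that subset.row only selects the rows used to compute the norm while the full matrix is still returned, as the argument description and the "same dimensions as x" statement suggest.

```r
# Hypothetical base R stand-in that mimics the documented return shapes.
cosine_norm_sketch <- function(x, mode = c("matrix", "all", "l2norm"),
                               subset.row = NULL) {
    mode <- match.arg(mode)
    # Assumption: subset.row only selects the rows used for the norm.
    rows <- if (is.null(subset.row)) seq_len(nrow(x)) else subset.row
    l2 <- sqrt(colSums(x[rows, , drop = FALSE]^2))
    mat <- sweep(x, 2, l2, "/")
    switch(mode,
        matrix = mat,
        l2norm = l2,
        all = list(matrix = mat, l2norm = l2))
}

set.seed(4)
A <- matrix(rnorm(1000), nrow = 10)
out <- cosine_norm_sketch(A, mode = "all")
str(out)

# Multiplying the columns back by the norms recovers the input.
all.equal(sweep(out$matrix, 2, out$l2norm, "*"), A)
```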

Author(s)

Aaron Lun

See Also

mnnCorrect and fastMNN, which use this function.

Examples

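library(batchelor)
# Simulated expression matrix: 10 genes (rows) by 100 cells (columns).
A <- matrix(rnorm(1000), nrow=10)
str(cosineNorm(A))                 # cosine-normalized matrix
str(cosineNorm(A, mode="l2norm"))  # L2 norm for each cell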


LTLA/batchelor documentation built on Jan. 19, 2024, 6:33 p.m.