normOffsets: Normalize trended biases across libraries
In LTLA/csaw: ChIP-Seq Analysis with Windows

normOffsets

R Documentation

Normalize trended biases across libraries

Description

Calculate normalization offsets by performing a loess fit to count data from multiple libraries.

Usage

normOffsets(object, ..., assay.id = "counts", se.out = TRUE)

Arguments

`object`	A SummarizedExperiment object containing a count matrix and library sizes in the `totals` field of the `colData`. Alternatively, a DGEList object containing a count matrix in `object$counts` and library sizes in `object$samples$lib.size`. Alternatively, an ordinary matrix containing counts.
`...`	Other arguments to be passed to `loessFit`.
`assay.id`	An integer scalar or string specifying the assay values to use for normalization.
`se.out`	A logical scalar indicating whether or not a SummarizedExperiment object should be returned. Alternatively, a SummarizedExperiment or DGEList object in which normalization factors are to be stored.

Details

This function performs non-linear normalization similar to the fast loess algorithm in normalizeCyclicLoess. The aims is to account for mean dependencies in the efficiency biases between libraries. For each sample, a lowess curve is fitted to the log-counts against the log-average count. The fitted value for each genomic window is used as an offset in a generalized linear model for that feature and sample. The use of the average count provides more stability than the average log-count when low counts are present for differentially bound regions.

The trend fits are always computed from object. However, if se.out is a (different) SummarizedExperiment or DGEList object, the trend fits will be used to compute offsets for each entry in se.out using spline interpolation. This is useful when se.out contains counts for windows in an endogenous genome, but the trend fits are computed using spike-in chromatin regions.

An error is raised if the library sizes in se.out$totals are not identical to object$totals. This is because the average abundances are only comparable when the library sizes are the same. Consistent library sizes can be achieved by using the same readParam object in windowCounts and related functions.

Value

If se.out=FALSE, a numeric matrix of dimensions equal to object, containing the offset for each observation. These offsets have already been scaled to be comparable in magnitude to the log-library sizes.

If se.out=TRUE, the same matrix is stored in the offset assay of object (if it is a SummarizedExperiment) or object$offset (if a DGEList) and the modified object is returned.

If se.out is a separate SummarizedExperiment or DGEList object, the offset matrix instead has dimensions equal to se.out. This matrix is stored inside se.out and the modified object is returned.

Author(s)

Aaron Lun

References

Ballman KV, Grill DE, Oberg AL, Therneau TM (2004). Faster cyclic loess: normalizing RNA arrays via linear models. Bioinformatics 20, 2778-86.

Examples

counts <- matrix(rnbinom(400, mu=10, size=20), ncol=4)
data <- SummarizedExperiment(list(counts=counts))
data$totals <- colSums(counts)

# TMM normalization.
normFactors(data)

# Using loess-based normalization, instead.
offsets <- normOffsets(data)
head(offsets)
offsets <- normOffsets(data, span=0.4)
offsets <- normOffsets(data, iterations=1)

LTLA/csaw documentation built on Jan. 30, 2025, 8:21 p.m.