normalize: Normalization of microarray and RNA-seq expression data

View source: R/normalize.R

normalize R Documentation

Normalization of microarray and RNA-seq expression data

Description

This function wraps commonly used functionality from limma for microarray normalization and from EDASeq for RNA-seq normalization.

Usage

normalize(
  se,
  norm.method = "quantile",
  data.type = c(NA, "ma", "rseq"),
  filter.by.expr = TRUE
)

Arguments

se

An object of class SummarizedExperiment.

norm.method

Determines how the expression data should be normalized. For available microarray normalization methods, see the man page of the limma function normalizeBetweenArrays. For available RNA-seq normalization methods, see the man page of the EDASeq function betweenLaneNormalization. For microarray data, defaults to 'quantile', i.e. normalization is carried out so that quantiles between arrays/samples are equal. For RNA-seq data, defaults to 'upper', i.e. normalization is carried out so that quantiles between lanes/samples are equal up to the upper quartile. For RNA-seq data, this can also be 'vst', 'voom', or 'deseq2' to invoke a variance-stabilizing transformation that allows statistical modeling as for microarray data. See Details.
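To illustrate what "quantiles between arrays/samples are equal" means, here is a minimal base-R sketch of quantile normalization. It is not the limma implementation: the helper quantile_normalize and the toy matrix are hypothetical, and the sketch assumes a numeric matrix with genes in rows, samples in columns, and no missing values.

```r
# Base-R sketch of quantile normalization (illustrative only, not limma's code).
# Rows are genes, columns are samples; assumes no missing values.
quantile_normalize <- function(x) {
  # Rank of each value within its column (ties broken by order of appearance)
  ranks <- apply(x, 2, rank, ties.method = "first")
  # Reference distribution: mean of the k-th smallest value across samples
  ref <- rowMeans(apply(x, 2, sort))
  # Replace each value by the reference value at its rank
  normed <- apply(ranks, 2, function(r) ref[r])
  dimnames(normed) <- dimnames(x)
  normed
}

set.seed(1)
# Hypothetical toy intensities: 6 genes, 3 samples on different scales
x <- cbind(s1 = rnorm(6, 8), s2 = rnorm(6, 9), s3 = rnorm(6, 10))
xn <- quantile_normalize(x)

# After normalization every sample has the same sorted values
apply(xn, 2, sort)
```

Each sample keeps its own gene ordering; only the values are mapped onto a shared distribution.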

data.type

Expression data type. Use 'ma' for microarray and 'rseq' for RNA-seq data. If NA, the data type is automatically guessed: if the expression values in se are decimal (float) numbers, they are assumed to be microarray intensities; whole (integer) numbers are assumed to be RNA-seq read counts. Defaults to NA.
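The guessing rule for data.type = NA can be sketched in base R as follows. The helper guess_data_type is hypothetical and only mirrors the rule as described here (whole numbers are taken as read counts, anything else as intensities); the package's internal check may differ in detail.

```r
# Hypothetical helper, not the package's internal code: guess the data type
# from whether all non-missing expression values are whole numbers.
guess_data_type <- function(expr) {
  vals <- expr[!is.na(expr)]
  is_whole <- all(vals == round(vals))
  if (is_whole) "rseq" else "ma"
}

# Toy matrices (3 genes x 2 samples), values chosen for illustration
counts <- matrix(c(0, 12, 340, 7, 51, 3), nrow = 3)             # read counts
intens <- matrix(c(7.21, 8.05, 6.93, 9.4, 8.8, 7.7), nrow = 3)  # intensities

guess_data_type(counts)  # "rseq"
guess_data_type(intens)  # "ma"
```

Under such a rule, count data that has already been transformed (e.g. to log scale) would be classified as microarray data, so setting data.type explicitly is the safer choice in that case.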

filter.by.expr

Logical. For RNA-seq data: include only genes with sufficiently large counts in the DE analysis? If TRUE, excludes genes not satisfying a minimum number of read counts across samples using the filterByExpr function from the edgeR package. Defaults to TRUE.

Details

Normalization of high-throughput expression data is essential to make results within and between experiments comparable. Microarray data (intensity measurements) and RNA-seq data (read counts) typically exhibit distinct features that need to be normalized for. For specific needs that deviate from standard normalizations, the user should refer to more specialized functions/packages. See also the limma user guide (http://www.bioconductor.org/packages/limma) for definition and normalization of the different expression data types.

Microarray data is expected to be single-channel. For two-color arrays, it is expected that normalization within arrays has already been carried out, e.g. using normalizeWithinArrays from limma.

RNA-seq data is expected to be raw read counts. Please note that normalization for downstream DE analysis, e.g. with edgeR and DESeq2, is not strictly necessary (and in some cases even discouraged), as many of these tools implement their own normalization approaches. See the vignettes of EDASeq, edgeR, and DESeq2 for details.

Using norm.method = "vst" invokes a variance-stabilizing transformation (VST) for RNA-seq read count data. This accounts for differences in sequencing depth between samples and over-dispersion of read count data. The VST uses the cpm function implemented in the edgeR package to compute moderated log2 read counts. Using edgeR's estimate of the common dispersion phi, the prior.count parameter of the cpm function is chosen as 0.5 / phi as previously suggested (Harrison, 2015).
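The computation described above can be sketched in base R without edgeR. This is an illustration under stated assumptions, not the package's code: the helper moderated_log_cpm is hypothetical, the dispersion phi is a made-up value rather than an edgeR estimate, and the way the prior count is scaled by library size is intended to mirror the behaviour of cpm with log = TRUE and should be checked against the edgeR documentation.

```r
# Base-R sketch of moderated log2-CPM with prior count 0.5 / phi.
# Assumption: the prior count is scaled by relative library size, as in
# edgeR's cpm(..., log = TRUE). phi is a hypothetical value, not an estimate.
moderated_log_cpm <- function(counts, phi) {
  prior_count <- 0.5 / phi
  lib_size <- colSums(counts)
  # Samples with deeper sequencing receive a proportionally larger offset
  prior_scaled <- prior_count * lib_size / mean(lib_size)
  adj_lib_size <- lib_size + 2 * prior_scaled
  # sweep() applies the per-sample (column-wise) offset and library size
  num <- sweep(counts, 2, prior_scaled, "+")
  log2(sweep(num, 2, adj_lib_size, "/") * 1e6)
}

# Toy counts: 3 genes x 2 samples with different sequencing depths
counts <- matrix(c(0, 10, 200, 5, 40, 900), nrow = 3,
                 dimnames = list(paste0("g", 1:3), c("s1", "s2")))
phi <- 0.1  # assumed common dispersion, giving prior.count = 5
vst <- moderated_log_cpm(counts, phi)
vst
```

Because the prior count is added before taking logs, zero counts map to finite values, and a larger dispersion estimate yields a smaller prior count and hence less shrinkage of low counts.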

Value

An object of class SummarizedExperiment.

Author(s)

Ludwig Geistlinger

References

Harrison (2015) Anscombe's 1948 variance stabilizing transformation for the negative binomial distribution is well suited to RNA-seq expression data. doi:10.7490/f1000research.1110757.1

Anscombe (1948) The transformation of Poisson, binomial and negative-binomial data. Biometrika 35(3-4):246-54.

Law et al. (2014) voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biol 15:R29.

See Also

readSE for reading expression data from file;

normalizeWithinArrays and normalizeBetweenArrays for normalization of microarray data;

withinLaneNormalization and betweenLaneNormalization from the EDASeq package for normalization of RNA-seq data;

cpm and estimateDisp from the edgeR package, voom from the limma package, and varianceStabilizingTransformation from the DESeq2 package.

Examples


    #
    # (1) simulating expression data: 100 genes, 12 samples
    #
    
    # (a) microarray data: intensity measurements
    maSE <- makeExampleData(what="SE", type="ma")
    
    # (b) RNA-seq data: read counts
    rseqSE <- makeExampleData(what="SE", type="rseq")

    #
    # (2) Normalization
    #
    
    # (a) microarray ... 
    maSE <- normalize(maSE) 
    assay(maSE, "raw")[1:5,1:5] 
    assay(maSE, "norm")[1:5,1:5] 

    # (b) RNA-seq ... 
    normSE <- normalize(rseqSE, norm.method = "vst")
    assay(normSE, "raw")[1:5,1:5]
    assay(normSE, "norm")[1:5,1:5]


lgeistlinger/EnrichmentBrowser documentation built on May 9, 2024, 7:22 p.m.