Normalization of microarray and RNA-seq expression data

Description

This function wraps commonly used functionality from limma for microarray normalization and from EDASeq for RNA-seq normalization.

Usage

    normalize(eset, norm.method = "quantile", within = FALSE,
        data.type = c(NA, "ma", "rseq"))

Arguments

eset

Expression set. An object of ExpressionSet-class. See the man page of read.eset for prerequisites for the expression data.

norm.method

Determines how the expression data should be normalized. For available microarray normalization methods see the man page of the limma function normalizeBetweenArrays. For available RNA-seq normalization methods see the man page of the EDASeq function betweenLaneNormalization. Defaults to 'quantile', i.e. normalization is carried out so that quantiles between arrays/lanes/samples are equal. See details.

within

Logical. Only taken into account if data.type='rseq'. Determines whether GC content normalization should be carried out (as implemented in the EDASeq function withinLaneNormalization). Defaults to FALSE. See details.

data.type

Expression data type. Use 'ma' for microarray and 'rseq' for RNA-seq data. If NA, data.type is guessed automatically: if the expression values in 'eset' are decimal numbers, they are assumed to be microarray intensities; whole numbers are assumed to be RNA-seq read counts. Defaults to NA.
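
The guessing rule for data.type can be illustrated with a minimal base R sketch. The helper guess.data.type and the two toy matrices are hypothetical names introduced here for illustration only; this is not the package's internal code, just one way the described rule (decimal values vs. whole numbers) could be checked.

```r
# Hypothetical helper, NOT part of the package: mirrors the rule
# described for data.type = NA.
guess.data.type <- function(expr)
{
    # ignore missing values when inspecting the expression matrix
    vals <- expr[!is.na(expr)]
    # whole numbers only -> read counts; any decimal value -> intensities
    if(all(vals == round(vals))) "rseq" else "ma"
}

# toy data (assumed values): intensities vs. read counts
ma.expr <- matrix(c(7.31, 8.02, 6.55, 9.10), nrow=2)
rseq.expr <- matrix(c(12, 0, 345, 27), nrow=2)

ma.type <- guess.data.type(ma.expr)
rseq.type <- guess.data.type(rseq.expr)
```

Note that under such a rule, microarray intensities that happen to be rounded to whole numbers would be taken for read counts, so setting data.type explicitly is the safer choice in ambiguous cases.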

Details

Normalization of high-throughput expression data is essential to make results within and between experiments comparable. Microarray (intensity measurements) and RNA-seq (read counts) data typically exhibit distinct features that need to be normalized for. For specific needs that deviate from these standard normalizations, the user should refer to more specific functions/packages.
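
To make the default 'quantile' method more concrete, the following base R sketch forces all samples (columns) to share the same distribution of values. The function quantile.normalize and the simulated matrix are hypothetical and for illustration only; this is a simplified version of the general technique (ties are broken by order of appearance and missing values are not handled), not the implementation used by limma or EDASeq.

```r
# Hypothetical, simplified quantile normalization; NOT the limma/EDASeq code.
quantile.normalize <- function(expr)
{
    # rank of each value within its sample (column)
    ranks <- apply(expr, 2, rank, ties.method="first")
    # reference distribution: mean of the sorted values across samples
    ref <- rowMeans(apply(expr, 2, sort))
    # replace each value by the reference value of the same rank
    norm <- apply(ranks, 2, function(r) ref[r])
    dimnames(norm) <- dimnames(expr)
    norm
}

# simulated intensities (assumed values): 100 genes, 12 samples
set.seed(42)
expr <- matrix(rnorm(100 * 12, mean=8, sd=2), nrow=100, ncol=12)
norm.expr <- quantile.normalize(expr)

# after normalization, the quantiles are the same in every sample
apply(norm.expr, 2, quantile)
```

The rank of each gene within its sample is preserved; only the values attached to the ranks are replaced by the shared reference distribution.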

Microarray data is expected to be single-channel. For two-color arrays, it is expected that normalization within arrays has already been carried out, e.g. using normalizeWithinArrays from limma.

RNA-seq data is expected to be raw read counts. Please note that normalization for downstream DE analysis, e.g. with edgeR and DESeq, is not strictly necessary (and in some cases even discouraged), as many of these tools implement their own normalization approaches. See the vignettes of EDASeq, edgeR, and DESeq for details.

Value

An object of ExpressionSet-class. For RNA-seq data, an object of SeqExpressionSet-class to conform with downstream DE analysis.

Author(s)

Ludwig Geistlinger <Ludwig.Geistlinger@bio.ifi.lmu.de>

See Also

read.eset describes prerequisites for the expression data;

normalizeWithinArrays and normalizeBetweenArrays for normalization of microarray data;

withinLaneNormalization and betweenLaneNormalization for normalization of RNA-seq data.

Examples

    #
    # (1) simulating expression data: 100 genes, 12 samples
    #
    
    # (a) microarray data: intensity measurements
    ma.eset <- make.example.data(what="eset", type="ma")
    
    # (b) RNA-seq data: read counts
    rseq.eset <- make.example.data(what="eset", type="rseq")

    #
    # (2) Normalization
    #
    
    # (a) microarray ... 
    norm.eset <- normalize(ma.eset) 

    # (b) RNA-seq ... 
    norm.eset <- normalize(rseq.eset) 

    # ... normalize also for GC content:
    # simulate GC content for the 100 genes and
    # store it in the feature annotation of the eset
    gc.content <- rnorm(100, 0.5, sd=0.1)
    fData(rseq.eset)$gc <- gc.content

    norm.eset <- normalize(rseq.eset, within=TRUE)