Methods for Function withinLaneNormalization in Package EDASeq

Description

Within-lane normalization for GC-content (or other lane-specific) bias.

Usage

withinLaneNormalization(x, y, which=c("loess","median","upper","full"), offset=FALSE, num.bins=10, round=TRUE)

Arguments

x

A numeric matrix representing the counts or a SeqExpressionSet object.

y

A numeric vector representing the covariate to normalize for (if x is a matrix) or a character vector with the name of the covariate (if x is a SeqExpressionSet object). Usually it is the GC-content.

which

Method used to normalize. See the details section and the reference below for details.

offset

Should the normalized value be returned as an offset leaving the original counts unchanged?

num.bins

The number of bins used to stratify the covariate for the median, upper and full methods. Ignored if which="loess". See the reference for a discussion on the number of bins.

round

If TRUE the normalization returns rounded values (pseudo-counts). Ignored if offset=TRUE.

Details

This method implements four normalizations described in Risso et al. (2011).

The loess normalization transforms the data by regressing the counts on y and subtracting the loess fit from the counts to remove the dependence.
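The idea behind the loess method can be sketched in base R as follows. This is only an illustration on simulated data, not the code used by EDASeq: the simulated covariate and counts, the log scale with a 0.1 pseudo-count, and the step that adds back the overall median are all assumptions made for the example.

```r
set.seed(1)
n <- 500
gc <- runif(n, 0.3, 0.7)                               # simulated covariate (hypothetical)
counts <- rpois(n, lambda = exp(4 + 3 * (gc - 0.5)))   # counts whose mean depends on gc

# Assumption: work on the log scale with a small pseudo-count.
log_counts <- log(counts + 0.1)

# Regress the (log) counts on the covariate with loess.
fit <- loess(log_counts ~ gc)

# Subtract the fitted trend; add back the overall median to keep the scale.
adjusted <- log_counts - predict(fit) + median(log_counts)

# Dependence on the covariate before and after the adjustment.
cor_before <- cor(log_counts, gc)
cor_after  <- cor(adjusted, gc)
```

Comparing cor_before with cor_after gives a quick check that the dependence on the covariate has been reduced.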

The median, upper and full normalizations stratify the genes by y. Once the genes are divided into num.bins strata, the methods work as follows.

median:

scales the data to have the same median in each bin.

upper:

the same but with the upper quartile.

full:

forces the distribution of each stratum to be the same using a non linear full quantile normalization, in the spirit of the one used in microarrays.
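The median and upper methods can be illustrated with a short base R sketch. It is not the EDASeq implementation: the helper normalize_by_bin is hypothetical, the data are simulated, and both the use of equal-size (quantile-based) bins and the choice of the median of the per-bin statistics as the common target are assumptions made for the example.

```r
# Hypothetical helper: scale values so that a summary statistic
# (median by default) is the same in every bin of the covariate y.
normalize_by_bin <- function(counts, y, num.bins = 10, stat = median) {
  counts <- as.numeric(counts)
  # Assumption: bins of roughly equal size, defined by quantiles of y.
  breaks <- quantile(y, probs = seq(0, 1, length.out = num.bins + 1))
  bins <- cut(y, breaks = breaks, include.lowest = TRUE)
  bin_stat <- ave(counts, bins, FUN = stat)        # per-gene value of its bin's statistic
  target <- median(tapply(counts, bins, stat))     # assumption: common target value
  list(values = counts * target / bin_stat, bins = bins, target = target)
}

# Upper quartile as an alternative summary statistic.
upper_quartile <- function(v) quantile(v, 0.75, names = FALSE)

set.seed(2)
n <- 1000
gc <- runif(n, 0.3, 0.7)                               # simulated covariate (hypothetical)
counts <- rpois(n, lambda = exp(4 + 3 * (gc - 0.5)))   # counts whose mean depends on gc

med <- normalize_by_bin(counts, gc, num.bins = 10, stat = median)
upp <- normalize_by_bin(counts, gc, num.bins = 10, stat = upper_quartile)

# After scaling, every bin shares the same median (or upper quartile).
med_by_bin <- tapply(med$values, med$bins, median)
upp_by_bin <- tapply(upp$values, upp$bins, upper_quartile)
```

The full method goes further than matching a single statistic: it makes the whole distribution of each stratum agree, which this sketch does not attempt.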

Methods

signature(x = "matrix", y = "numeric")

It returns a matrix with the normalized counts if offset=FALSE or with the offset if offset=TRUE.

signature(x = "SeqExpressionSet", y = "character")

It returns a SeqExpressionSet with the normalized counts in the normalizedCounts slot and with the offset in the offset slot (if offset=TRUE).

Author(s)

Davide Risso.

References

D. Risso, K. Schwartz, G. Sherlock and S. Dudoit (2011). GC-Content Normalization for RNA-Seq Data. Manuscript in Preparation.

Examples

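library(EDASeq)       # provides newSeqExpressionSet and withinLaneNormalization
library(yeastRNASeq)  # provides the geneLevelData and yeastGC data sets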
data(geneLevelData)
data(yeastGC)

sub <- intersect(rownames(geneLevelData), names(yeastGC))

mat <- as.matrix(geneLevelData[sub, ])

data <- newSeqExpressionSet(mat,
                            phenoData=AnnotatedDataFrame(
                                      data.frame(conditions=factor(c("mut", "mut", "wt", "wt")),
                                                 row.names=colnames(geneLevelData))),
                            featureData=AnnotatedDataFrame(data.frame(gc=yeastGC[sub])))

norm <- withinLaneNormalization(data, "gc", which="full", offset=FALSE)