normalization: Normalization methods


Description

These functions implement the RPKM (Mortazavi et al., 2008), Upper Quartile (Bullard et al., 2010) and TMM (Trimmed Mean of M values; Robinson and Oshlack, 2010) normalization procedures. They are used within the noiseq and noiseqbio functions, but they can also be used on their own to normalize a dataset.
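As a rough illustration of what RPKM computes, the following base R sketch divides each sample by its library size in millions of reads and each feature by its length in kilobases. The helper name rpkm_sketch is hypothetical: it is not the NOISeq rpkm function, and it assumes lc = 1 and no zero replacement (k = 0).

```r
# Hypothetical helper illustrating the textbook RPKM formula:
#   RPKM[g, j] = counts[g, j] / (libsize[j] / 1e6) / (len[g] / 1e3)
# Not the NOISeq implementation; assumes lc = 1 and no zero replacement.
rpkm_sketch <- function(counts, len) {
  counts <- as.matrix(counts)
  libsize <- colSums(counts)                            # total reads per sample
  per_million <- sweep(counts, 2, libsize / 1e6, "/")   # reads per million
  sweep(per_million, 1, len / 1e3, "/")                 # ... per kilobase of feature
}

counts <- matrix(c(10, 20, 30, 40), nrow = 2)  # 2 features x 2 samples
len <- c(500, 2000)                            # feature lengths in bp
rpkm_sketch(counts, len)
```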

Usage

uqua(datos, long = 1000, lc = 0, k = 0)
rpkm(datos, long = 1000, lc = 1, k = 0)
tmm(datos, long = 1000, lc = 0, k = 0, refColumn = 1, logratioTrim = 0.3, sumTrim = 0.05, doWeighting = TRUE, Acutoff = -1e+10)

Arguments

datos

Matrix containing the read counts, with features in rows and samples in columns.

long

Numeric vector containing the lengths of the features. If long = 1000 (the default), no length correction is applied, regardless of the value of lc.

lc

Correction factor (exponent) for length normalization. The counts of each feature are divided by (length/1000)^lc, so lc = 1 yields per-kilobase values and lc = 0 applies no length correction. By default, lc = 1 for rpkm and lc = 0 for the other methods.
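The effect of lc can be seen in a minimal sketch that divides each feature's counts by (length/1000)^lc. The helper name length_correct is hypothetical and is only meant to illustrate the formula above, not to reproduce NOISeq's internal code.

```r
# Hypothetical helper: divide each feature's (row's) counts by (len / 1000)^lc.
length_correct <- function(counts, len, lc) {
  counts <- as.matrix(counts)
  sweep(counts, 1, (len / 1000)^lc, "/")
}

counts <- matrix(c(100, 100, 50, 50), nrow = 2)  # 2 features x 2 samples
len <- c(250, 4000)                              # feature lengths in bp
length_correct(counts, len, lc = 1)    # rows divided by 0.25 and 4
length_correct(counts, len, lc = 0.5)  # rows divided by 0.5 and 2
length_correct(counts, len, lc = 0)    # unchanged
length_correct(counts, 1000, lc = 1)   # unchanged: (1000/1000)^lc = 1
```

The last call shows why long = 1000 disables the correction whatever the value of lc.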

k

Counts equal to 0 are replaced by k, for instance to avoid undefined values when taking logarithms. By default, k = 0.
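Why k matters is easy to see with logarithms: log2(0) is -Inf in R. The sketch below uses a hypothetical helper, replace_zeros, to show the replacement described above; with k = 0 the data are left unchanged.

```r
# Hypothetical helper: replace zero counts by k (e.g. before taking logs).
replace_zeros <- function(counts, k) {
  counts[counts == 0] <- k
  counts
}

counts <- matrix(c(0, 5, 12, 0), nrow = 2)
log2(counts)                    # contains -Inf where counts are 0
log2(replace_zeros(counts, 1))  # all finite; the replaced entries give log2(1) = 0
```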

refColumn

Column to use as reference (only used by the tmm function).

logratioTrim

Amount of trim to use on log-ratios ("M" values) (only used by the tmm function).

sumTrim

Amount of trim to use on the combined absolute levels ("A" values) (only used by the tmm function).

doWeighting

Logical; whether to compute asymptotic binomial precision weights (only used by the tmm function).

Acutoff

Cutoff on "A" values to apply before trimming (only used by the tmm function).
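The roles of logratioTrim and sumTrim can be illustrated with a simplified sketch of how a TMM scaling factor is computed for one sample against a reference, following the idea in Robinson and Oshlack (2010). The helper tmm_factor_sketch is hypothetical, not the tmm code in NOISeq or edgeR: it assumes unweighted averaging (as with doWeighting = FALSE), ignores Acutoff, and trims the given fraction from each tail of the ranked M and A values.

```r
# Simplified, unweighted TMM factor for one sample vs. a reference.
# Hypothetical helper; not NOISeq's tmm. Acutoff and weighting are ignored.
tmm_factor_sketch <- function(obs, ref, logratioTrim = 0.3, sumTrim = 0.05) {
  nO <- sum(obs)
  nR <- sum(ref)
  keep <- obs > 0 & ref > 0            # M and A are undefined for zero counts
  pO <- obs[keep] / nO
  pR <- ref[keep] / nR
  M <- log2(pO / pR)                   # log-ratios ("M" values)
  A <- 0.5 * log2(pO * pR)             # absolute expression levels ("A" values)
  n <- length(M)
  loL <- floor(n * logratioTrim) + 1   # trim this share from each tail of M
  hiL <- n + 1 - loL
  loS <- floor(n * sumTrim) + 1        # ... and from each tail of A
  hiS <- n + 1 - loS
  rM <- rank(M)
  rA <- rank(A)
  trimmed <- rM >= loL & rM <= hiL & rA >= loS & rA <= hiS
  2^mean(M[trimmed])                   # trimmed mean of M, back on ratio scale
}

ref <- rep(10, 20)
obs <- c(rep(10, 19), 1000)  # one feature dominates the second library
f <- tmm_factor_sketch(obs, ref)
f * sum(obs)                 # effective library size, about 200 = sum(ref)
```

Because the dominant feature is trimmed away, the effective library size of obs matches the reference, so the 19 unchanged features are not wrongly reported as down-regulated.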

Details

The tmm normalization method was taken from the edgeR package (Robinson et al., 2010).

Although the Upper Quartile and TMM methods do not themselves correct for feature length, the NOISeq implementations allow users to combine these procedures with an additional length correction (through the long and lc arguments) whenever feature lengths are available.
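A sketch of this combination in base R: each sample is first scaled by its upper quartile, and the length correction described for lc is applied on top. The function name uq_length_sketch is hypothetical, and details such as using the 75th percentile of the non-zero counts and rescaling by the mean upper quartile are assumptions for illustration rather than the exact uqua implementation.

```r
# Hypothetical sketch: upper-quartile scaling plus optional length correction.
# Not NOISeq's uqua; percentile choice and rescaling are assumptions.
uq_length_sketch <- function(counts, len = 1000, lc = 0) {
  counts <- as.matrix(counts)
  # Upper quartile (75th percentile) of the non-zero counts of each sample
  uq <- apply(counts, 2, function(x) quantile(x[x > 0], probs = 0.75, names = FALSE))
  # Divide each sample by its upper quartile, rescaled by the mean upper quartile
  scaled <- sweep(counts, 2, uq / mean(uq), "/")
  # Optional length correction: divide each feature by (len / 1000)^lc
  sweep(scaled, 1, (len / 1000)^lc, "/")
}

counts <- cbind(1:8, 2 * (1:8))  # sample 2 is sample 1 at twice the depth
uq_length_sketch(counts)         # columns become equal; no length correction
uq_length_sketch(counts, len = rep(c(500, 2000), 4), lc = 1)
```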

Author(s)

Sonia Tarazona

References

Bullard J.H., Purdom E., Hansen K.D. and Dudoit S. (2010) Evaluation of statistical methods for normalization and differential expression in mRNA-seq experiments. BMC Bioinformatics 11(1):94.

Mortazavi A., Williams B.A., McCue K., Schaeffer L. and Wold B. (2008) Mapping and quantifying mammalian transcriptomes by RNA-seq. Nature Methods 5(7):621-628.

Robinson M.D. and Oshlack A. (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology 11(3):R25.

Robinson M.D., McCarthy D.J. and Smyth G.K. (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26(1):139-140.

Examples

library(NOISeq)

## Simulate some count data and the feature lengths
datasim = matrix(sample(0:100, 2000, replace = TRUE), ncol = 4)
lengthsim = sample(100:1000, 500)

## RPKM normalization
myrpkm = rpkm(datasim, long = lengthsim, lc = 1, k = 0)

## Upper Quartile normalization, dividing the normalized data by the square root of the feature lengths in kb (lc = 0.5) and replacing zero counts with k = 1
myuqua = uqua(datasim, long = lengthsim, lc = 0.5, k = 1)

## TMM normalization with no length correction
mytmm = tmm(datasim, long = 1000, lc = 0, k = 0)

NOISeq documentation built on Nov. 8, 2020, 5:10 p.m.