normalize_data: Normalize data


Description

Normalizes the data using the specified normalization function.

Usage

normalize_data(omicsData, norm_fn, normalize = FALSE, ...)

Arguments

omicsData

an object of the class 'seqData' created by as.seqData.

norm_fn

character string naming the normalization method to use. See Details for valid options.

normalize

For count data, the function by default returns only the location and scale parameters, not the normalized data; set this argument to TRUE to return the normalized data. This is because later statistical analyses of count data require the raw counts. Default is FALSE.

...

additional arguments passed to the chosen normalization function.

Details

For count data (16S data), the default normalization is Cumulative Sum Scaling (norm_fn = "css"). The normalization methods currently available for norm_fn are:

"percentile" Standardize the data by dividing each feature in e_data by the sample-wide qth percentile and multiply by the gloabal qth percentile
"tss" Standardize the data by dividing each feature in e_data by sum of the sample-wide counts
"rarefy" Normalize the data by subsampling down to specified library size
"poisson" Normalize the data by sampling from a Poisson distribution with appropriate mean value, see Section 2.2 of Li et al. (2013)
"deseq" Normalization method used in DESeq and DESeq2, which uses size factors to standardize sequencing depths across samples
"css" Normalize the data by dividing each feature in e_data by the sum of the counts up to a specified quantile and multiplying by a global scaling factor
"tmm" Normalize the data using the trimmed mean of M values
"log" Normalize data using a log2 transformation
"clr" Normalize data using centered-log ratio
"none" No normalization is performed

Value

If normalize = FALSE, a list containing the location and scale parameters to use when normalizing the data. If normalize = TRUE, the omicsData object, with e_data normalized using those parameters and the location and scale parameters attached as an attribute.
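To make the two return modes concrete, here is a hypothetical `toy_normalize` helper that follows the contract described above, but on a plain matrix instead of an omicsData object. It assumes the parameters are applied as (x - location) / scale and uses TSS-like parameters (location 0, scale equal to each sample's library size); neither assumption is taken from the package.

```r
# Hypothetical toy data: 4 features (rows) x 3 samples (columns)
counts <- matrix(c(10, 0, 5, 85,
                   20, 4, 6, 170,
                   3, 1, 0, 46),
                 nrow = 4,
                 dimnames = list(paste0("otu", 1:4), paste0("s", 1:3)))

# Hypothetical helper mirroring the documented return behaviour
toy_normalize <- function(counts, normalize = FALSE) {
  # Assumed TSS-like parameters: location 0, scale = library size
  params <- list(location = rep(0, ncol(counts)), scale = colSums(counts))
  if (!normalize) return(params)                    # parameters only; data untouched
  normed <- sweep(sweep(counts, 2, params$location, "-"), 2, params$scale, "/")
  attr(normed, "parameters") <- params              # keep parameters with the data
  normed
}

params <- toy_normalize(counts)                     # list(location, scale)
normed <- toy_normalize(counts, normalize = TRUE)   # normalized matrix + attribute
```

Returning only the parameters by default lets downstream count-based statistics work on the raw counts while still using the estimated scaling.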

Author(s)

Kelly Stratton, Lisa Bramer, Bryan Stanfill, Allison Thompson

References

Li, Jun and Robert Tibshirani. Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data. Statistical Methods in Medical Research 22(5):519-536 (2013).
Anders, Simon and Wolfgang Huber. Differential expression analysis for sequence count data. Genome Biology 11:R106 (2010).
Paulson, Joseph N, O Colin Stine, Hector Corrada Bravo, and Mihai Pop. Differential abundance analysis for microbial marker-gene surveys. Nature Methods 10(12) (2013).
Robinson, Mark D and Alicia Oshlack. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology 11:R25 (2010).

Examples

## Not run: 
library(mintJansson)

#Count data are normalized with the default method when norm_fn is not given, or can be passed to percentile (quantile) normalization explicitly
normalized_rRNAdata <- normalize_data(rRNA_data)
normalized_rRNAdata <- normalize_data(rRNA_data, norm_fn = 'percentile')

#TSS normalization can also be used for count data, as can the size-factor normalization from DESeq/DESeq2,
#the Poisson normalization from SAMseq, cumulative sum scaling (CSS) from metagenomeSeq, or TMM normalization from edgeR
normalized_rRNAdata <- normalize_data(rRNA_data, norm_fn = "tss")
normalized_rRNAdata <- normalize_data(rRNA_data, norm_fn = "deseq")
normalized_rRNAdata <- normalize_data(rRNA_data, norm_fn = "poisson")
normalized_rRNAdata <- normalize_data(rRNA_data, norm_fn = "css")
normalized_rRNAdata <- normalize_data(rRNA_data, norm_fn = "tmm")

#One could also rarefy the data, though this is strongly discouraged
normalized_rRNAdata <- normalize_data(rRNA_data, norm_fn = "rarefy")

## End(Not run)

pmartR/pmartRseq documentation built on May 25, 2019, 9:20 a.m.