Normalize: Normalized RNAseq raw read counts

View source: R/preprocess.R

Normalize    R Documentation

Normalized RNAseq raw read counts

Description

RNAseq raw read counts may be normalized based on various parameters, including reads per sample, reads mapped per genome, and gene length. Here we implement the normalization of edgeR (citation), which accounts for differences in both sequencing depth and RNA composition (see sections 2.7.2 and 2.7.3 of the edgeR documentation). However, in metatranscriptomic studies it may also be beneficial to adjust for an additional source of compositional bias, in which a single organism contributes a high relative abundance of transcripts, resulting in an undersampling of the other organisms. Therefore, we provide an additional normalization step to normalize on a per genome/bin basis.
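The idea of normalizing on a per genome/bin basis can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the toy matrix `counts`, the `bin` labels, and the choice to divide each gene by the total reads of its own bin in the same sample are all hypothetical.

```r
# Toy count matrix: 4 genes x 2 samples (made-up values, filled column-wise).
counts <- matrix(
  c(10, 30, 100, 300,   # sample S1
    20, 20, 500, 500),  # sample S2
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), c("S1", "S2"))
)

# Hypothetical genome/bin assignment for each gene (one label per row).
bin <- c("binA", "binA", "binB", "binB")

# Per-bin, per-sample read totals: rowsum() adds up rows sharing a bin label.
bin_totals <- rowsum(counts, group = bin)

# Divide each gene by the total of its own bin in the same sample, so a
# highly abundant bin (binB here) no longer dominates the scale of the others.
# Note: a bin with zero reads in a sample would yield NaN and needs handling.
per_bin <- counts / bin_totals[bin, ]

print(per_bin)
```

After this step the values within each bin sum to 1 in every sample, so genes are compared relative to the activity of their own genome rather than to the whole community.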

Usage

Normalize(RNAseq.table, RNAseq.features, normalization.features, simple)

Arguments

RNAseq_Annotated_Matrix

The original count matrix (See X for format details).

no_feature, ambiguous, not_aligned

A set of vectors, each with a length equal to the number of samples, containing the number of reads that had no feature, were ambiguously mapped, or were not aligned (obtained from the mapping output).

gene_lengths

A matrix with the length of each gene (genes must be in the same order as in the input RNAseq_Annotated_Matrix).

method

A string containing the method to use, one of: ["default", "TMM", "RLE"]. In addition to the default method described above, the TMM and RLE methods from Bioconductor's edgeR package are available.
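For orientation, the following sketch shows how TMM and RLE normalization factors can be obtained from edgeR directly, using `DGEList()`, `calcNormFactors()`, and `cpm()`. It is not necessarily how this function wraps them: the simulated `counts` matrix is hypothetical, and the code runs only if edgeR is installed.

```r
# Guard so the sketch is skipped cleanly when edgeR is not available.
if (requireNamespace("edgeR", quietly = TRUE)) {
  set.seed(1)

  # Simulated raw counts: 200 genes x 4 samples (made-up data).
  counts <- matrix(
    rpois(200 * 4, lambda = 50),
    nrow = 200,
    dimnames = list(paste0("gene", 1:200), paste0("S", 1:4))
  )

  # Wrap the counts in edgeR's container; library sizes are the column sums.
  y <- edgeR::DGEList(counts = counts)

  # Compute per-sample scaling factors with each method.
  y_tmm <- edgeR::calcNormFactors(y, method = "TMM")
  y_rle <- edgeR::calcNormFactors(y, method = "RLE")

  # Counts per million using the TMM-adjusted (effective) library sizes.
  cpm_tmm <- edgeR::cpm(y_tmm)

  print(y_tmm$samples$norm.factors)
  print(y_rle$samples$norm.factors)
}
```

In both cases edgeR rescales the factors so that they multiply to 1 across samples; they adjust the library sizes rather than the counts themselves.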

Value

The normalized read counts of Sample 1 ... Sample N.

Note

To remove rows that have a 0 in any of their read counts:

RNAseq_Annotated_Matrix[apply(RNAseq_Annotated_Matrix[, SS:SE], 1, function(x) !any(x == 0)), ]

Where SS and SE are the start and end columns of the samples (raw counts).
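As a small worked example of this filter, the sketch below applies it to a toy table. The column names (`Locus_Tag`, `Sample1`, `Sample2`) and the values are hypothetical and serve only to show which rows are kept.

```r
# Toy annotated matrix: one identifier column followed by two sample columns.
RNAseq_Annotated_Matrix <- data.frame(
  Locus_Tag = c("g1", "g2", "g3"),
  Sample1   = c(5, 0, 7),
  Sample2   = c(3, 4, 0)
)

# Start and end columns of the raw counts in this toy table.
SS <- 2
SE <- 3

# Keep only the rows in which no sample has a zero count.
filtered <- RNAseq_Annotated_Matrix[
  apply(RNAseq_Annotated_Matrix[, SS:SE], 1, function(x) !any(x == 0)),
]

print(filtered)
```

Only `g1` remains, because `g2` and `g3` each have a zero in one sample. An equivalent row selector is `rowSums(RNAseq_Annotated_Matrix[, SS:SE] == 0) == 0`.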

Author(s)

BO Oyserman

Examples

RNAseq_Normalize(RNAseq_Annotated_Matrix, no_feature, ambiguous, not_aligned)

Jorisvansteenbrugge/TcT documentation built on Sept. 26, 2022, 6:50 a.m.