Normalize | R Documentation |
RNAseq raw read counts may be normalized based on various parameters, including reads per sample, reads mapped per genome, and gene length. Here we implement the normalization of edgeR (citation), which accounts for differences in both sequencing depth and RNA composition (see edgeR documentation, sections 2.7.2 and 2.7.3). However, in metatranscriptomic studies it may also be beneficial to adjust for an additional source of compositional bias, in which a single organism contributes a high relative abundance of transcripts, resulting in an undersampling of other organisms. Therefore, we provide an additional normalization step to normalize on a per genome/bin basis.
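To make the per genome/bin idea concrete, the following is a minimal base R sketch. It is an illustration under stated assumptions and not the code used by this function: the helper `normalize_per_bin`, the `bin` label vector, and the choice to rescale each sample so that its bin total matches the mean bin total are all hypothetical.

```r
# Toy data: 6 genes from two hypothetical bins, 3 samples.
counts <- matrix(
  c( 10,  20,  30,
     40,  50,  60,
      5,   5,   5,
    100,  10,   1,
    200,  20,   2,
    300,  30,   3),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:6), paste0("S", 1:3))
)
bin <- c("binA", "binA", "binA", "binB", "binB", "binB")  # hypothetical labels

# Illustrative per-bin scaling (an assumption, not the package's method):
# within each bin, rescale every sample so that its bin total equals the
# mean bin total across samples. Bins with a total of 0 in any sample
# would produce Inf/NaN and need separate handling.
normalize_per_bin <- function(counts, bin) {
  out <- counts
  for (b in unique(bin)) {
    rows <- bin == b
    totals <- colSums(counts[rows, , drop = FALSE])
    factors <- mean(totals) / totals
    out[rows, ] <- sweep(counts[rows, , drop = FALSE], 2, factors, `*`)
  }
  out
}

per_bin <- normalize_per_bin(counts, bin)
colSums(per_bin[bin == "binB", ])  # identical totals across samples
```

In this toy example binB dominates sample S1 and is nearly absent from S3; after scaling, each bin contributes the same total in every sample, so within-bin expression differences are no longer confounded with the abundance of the organism.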
Normalize(RNAseq.table, RNAseq.features, normalization.features, simple)
RNAseq_Annotated_Matrix |
The original count matrix (See X for format details). |
no_feature, ambiguous, not_aligned |
A set of vectors, each with length equal to the number of samples, containing the number of reads that had no feature, were ambiguously mapped, or were not aligned (obtained from the mapping output). |
gene_lengths |
A matrix with the length of each gene (genes must be in the same order as in the input RNAseq_Annotated_Matrix). |
method |
A string naming the method to use; one of "default", "TMM", or "RLE". In addition to the default method described above, TMM and RLE from Bioconductor's edgeR package are implemented as well. |
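For orientation, the TMM and RLE options correspond to methods available in edgeR's `calcNormFactors`. The sketch below calls edgeR directly on simulated counts; it shows the underlying edgeR calls only and makes no claim about how this function wraps them. The simulated matrix and the variable names are assumptions, and the edgeR code is skipped if the package is not installed.

```r
# Simulated counts: 100 genes x 4 samples (illustrative data only).
set.seed(1)
counts <- matrix(
  rpois(400, lambda = 50), ncol = 4,
  dimnames = list(paste0("gene", 1:100), paste0("S", 1:4))
)

norm_factors <- NULL
if (requireNamespace("edgeR", quietly = TRUE)) {
  dge <- edgeR::DGEList(counts = counts)

  # Scaling factors under each method; they adjust the library sizes.
  tmm <- edgeR::calcNormFactors(dge, method = "TMM")
  rle <- edgeR::calcNormFactors(dge, method = "RLE")
  norm_factors <- cbind(
    TMM = tmm$samples$norm.factors,
    RLE = rle$samples$norm.factors
  )

  # Counts per million using the TMM-adjusted effective library sizes.
  cpm_tmm <- edgeR::cpm(tmm)
}
norm_factors
```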
The normalized read counts of Sample 1 ... Sample N.
To remove rows that contain a 0 in any sample's read counts:
RNAseq_Annotated_Matrix[apply(RNAseq_Annotated_Matrix[, SS:SE], 1, function(x) !any(x == 0)), ]
Where SS and SE are the start and end columns of the samples (raw counts).
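A small worked example of this filter, using a toy table. The column names (`Locus_Tag`, `Sample1` to `Sample3`) and the values are hypothetical and only serve to show which rows are dropped.

```r
# Toy annotated matrix: one annotation column followed by three samples.
RNAseq_Annotated_Matrix <- data.frame(
  Locus_Tag = c("g1", "g2", "g3"),  # hypothetical annotation column
  Sample1   = c(5, 0, 7),
  Sample2   = c(3, 2, 0),
  Sample3   = c(8, 4, 1)
)

SS <- 2  # first sample column
SE <- 4  # last sample column

# Keep only rows where no sample has a count of 0.
keep <- apply(RNAseq_Annotated_Matrix[, SS:SE], 1, function(x) !any(x == 0))
filtered <- RNAseq_Annotated_Matrix[keep, ]
filtered
```

Here g2 and g3 are removed because each has a zero in at least one sample, leaving only g1. An equivalent vectorized condition is `rowSums(RNAseq_Annotated_Matrix[, SS:SE] == 0) == 0`.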
BO Oyserman
RNAseq_Normalize(RNAseq_Annotated_Matrix, no_feature, ambiguous, not_aligned)