The PREP module includes functions to import, merge and normalize RNA and RPF data. The resulting data set will be ready to be passed on to module 3 for quality control or module 4 for analysis of differential translational efficiency.
library(knitr) opts_chunk$set(tidy.opts=list(width.cutoff=60),tidy=TRUE)
opts_chunk$set(root.dir = "C:/Science/Projects/Ribosome profiling/Ribolog") setwd("C:/Science/Projects/Ribosome profiling/Ribolog") getwd()
The steps for creating a merged RNA-RPF data set differs based on whether or not bias correction using the CELP method (module 1) is being performed.
Create the annotation data table:
annotation_human_cDNA <- Ribolog::read_annotation("./data/Human.GRC38.96_annotation.txt") head(annotation_human_cDNA)
All the bam files (RNA and RPF) can be placed in the same folder and imported together, or imported separately and then merged:
rna_count_LMCN <- Ribolog::bam2count(bamfolder = "./data/Bam/RNA", annotation = annotation_human_cDNA) rpf_count_LMCN <- Ribolog::bam2count(bamfolder = "./data/Bam/RPF", annotation = annotation_human_cDNA) rna_rpf_count_LMCN <- merge(rna_count_LMCN, rpf_count_LMCN, by = "transcript") head(rna_rpf_count_LMCN)
The bam2count
function is designed to work on bam files generated by mapping to a transcriptome (not a genome). For each sample, it counts the number of reads mapping to different chromosomes ('seqname's in bam fields). Annotation must be provided to ensure a complete transcript list, including those with zero counts. Low count transcripts can be filtered out later using the min_count_filter
function (covered in module 3). If reads were mapped to a genome, counting should be performed outside Ribolog to produce a similar data frame. It can be then passed on to STEP 2.
Create the RNA counts data frame using the bam2count
function. Then, merge with the corrected RPF counts data frame produced by the codon2transcript
function (covered in module 1).
rpf_corrected_sum_LMCN <- readRDS("./data/rpf_corrected_sum_LMCN")
rna_count_LMCN <- Ribolog::bam2count(bamfolder = "./data/Bam/RNA", annotation = annotation_human_cDNA) rna_CELP_rpf_count_LMCN <- merge(rna_count_LMCN, rpf_corrected_sum_LMCN, by = "transcript") head(rna_CELP_rpf_count_LMCN)
Note: The corrected RPF data frame may contain slightly fewer transcripts than the output of bam2count
. This is because transcripts with CDS length non-divisible by 3 are removed during CELP correction. Only transcripts with data in both data frames are retained after merging.
Note: The RPF counts from bam2count
are expected to be higher than even the uncorrected RPF counts produced by the codon2transcript
function (module 1). The reason is that reads are filtered based on length and mapping to CDS (eliminating those mapping to 5' UTR or 3' UTR) in the CELP procedure regardless of application of the bias correction itself.
By default, we ue the median-of-ratios method for library size normalization. Users can normalize their data in any other way they prefer outside Ribolog and pass on the result to modules 3 and 4. RNA data and RPF data must be normalized separately. The normalize_median_of_ratios
function allows specification of data columns, and thus, indepedent normalization of RPF and RNA columns in a mixed RNA-RPF data set.
# Normalize RNA counts rna_CELP_rpf_count_norm1_LMCN <- Ribolog::normalize_median_of_ratios(rna_CELP_rpf_count_LMCN, c(2:9)) #Normalize RPF counts rna_CELP_rpf_count_norm2_LMCN <- Ribolog::normalize_median_of_ratios(rna_CELP_rpf_count_norm1_LMCN, c(10:17))
To be concise, we rename the bias-corrected merged and normalized RNA-RPF data frame rna_CELP_rpf_count_norm2_LMCN
to rr_LMCN
.
rr_LMCN <- rna_CELP_rpf_count_norm2_LMCN head(rr_LMCN)
saveRDS(rr_LMCN, "./data/rr_LMCN")
The RNA-RPF merged and normalized data set is now ready for quality control.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.