The PREP module includes functions to import, merge and normalize RNA and RPF data. The resulting data set will be ready to be passed on to module 3 for quality control or module 4 for analysis of differential translational efficiency.

library(knitr)
opts_chunk$set(tidy.opts=list(width.cutoff=60),tidy=TRUE)
opts_chunk$set(root.dir = "C:/Science/Projects/Ribosome profiling/Ribolog")
setwd("C:/Science/Projects/Ribosome profiling/Ribolog")
getwd()

STEP 1: Create a merged RNA-RPF data set

The steps for creating a merged RNA-RPF data set differs based on whether or not bias correction using the CELP method (module 1) is being performed.

1.1 Without CELP correction

Create the annotation data table:

annotation_human_cDNA <- Ribolog::read_annotation("./data/Human.GRC38.96_annotation.txt")
head(annotation_human_cDNA)

All the bam files (RNA and RPF) can be placed in the same folder and imported together, or imported separately and then merged:

rna_count_LMCN <- Ribolog::bam2count(bamfolder = "./data/Bam/RNA", annotation = annotation_human_cDNA)
rpf_count_LMCN <- Ribolog::bam2count(bamfolder = "./data/Bam/RPF", annotation = annotation_human_cDNA)

rna_rpf_count_LMCN <- merge(rna_count_LMCN, rpf_count_LMCN, by = "transcript")
head(rna_rpf_count_LMCN)

The bam2count function is designed to work on bam files generated by mapping to a transcriptome (not a genome). For each sample, it counts the number of reads mapping to different chromosomes ('seqname's in bam fields). Annotation must be provided to ensure a complete transcript list, including those with zero counts. Low count transcripts can be filtered out later using the min_count_filter function (covered in module 3). If reads were mapped to a genome, counting should be performed outside Ribolog to produce a similar data frame. It can be then passed on to STEP 2.

1.2. With CELP correction

Create the RNA counts data frame using the bam2count function. Then, merge with the corrected RPF counts data frame produced by the codon2transcript function (covered in module 1).

rpf_corrected_sum_LMCN <- readRDS("./data/rpf_corrected_sum_LMCN")
rna_count_LMCN <- Ribolog::bam2count(bamfolder = "./data/Bam/RNA", annotation = annotation_human_cDNA)

rna_CELP_rpf_count_LMCN <- merge(rna_count_LMCN, rpf_corrected_sum_LMCN, by = "transcript")
head(rna_CELP_rpf_count_LMCN)

Note: The corrected RPF data frame may contain slightly fewer transcripts than the output of bam2count. This is because transcripts with CDS length non-divisible by 3 are removed during CELP correction. Only transcripts with data in both data frames are retained after merging.

Note: The RPF counts from bam2count are expected to be higher than even the uncorrected RPF counts produced by the codon2transcript function (module 1). The reason is that reads are filtered based on length and mapping to CDS (eliminating those mapping to 5' UTR or 3' UTR) in the CELP procedure regardless of application of the bias correction itself.

STEP 2: Normalize counts for library size variation

By default, we ue the median-of-ratios method for library size normalization. Users can normalize their data in any other way they prefer outside Ribolog and pass on the result to modules 3 and 4. RNA data and RPF data must be normalized separately. The normalize_median_of_ratios function allows specification of data columns, and thus, indepedent normalization of RPF and RNA columns in a mixed RNA-RPF data set.

# Normalize RNA counts
rna_CELP_rpf_count_norm1_LMCN <- Ribolog::normalize_median_of_ratios(rna_CELP_rpf_count_LMCN, c(2:9))
#Normalize RPF counts
rna_CELP_rpf_count_norm2_LMCN <- Ribolog::normalize_median_of_ratios(rna_CELP_rpf_count_norm1_LMCN, c(10:17))

To be concise, we rename the bias-corrected merged and normalized RNA-RPF data frame rna_CELP_rpf_count_norm2_LMCN to rr_LMCN.

rr_LMCN <- rna_CELP_rpf_count_norm2_LMCN
head(rr_LMCN)
saveRDS(rr_LMCN, "./data/rr_LMCN")

The RNA-RPF merged and normalized data set is now ready for quality control.



goodarzilab/Ribolog documentation built on Oct. 7, 2022, 10:14 p.m.