The exomePeak2 user's guide
In exomePeak2: Bias Awared Peak Calling and Quantification for MeRIP-Seq

knitr::opts_chunk$set(dev="png",fig.show="hold",
               fig.width=8,fig.height=4.5,fig.align="center",
               message=FALSE,collapse=TRUE)

Introduction

exomePeak2 provides bias awared quantification and peak detection on Methylated RNA immunoprecipitation sequencing data (MeRIP-Seq). MeRIP-Seq is a commonly applied sequencing technology to measure the transcriptome-wide location and abundance of RNA modification sites under a given cellular condition. However, the quantification and peak calling in MeRIP-Seq are sensitive to PCR amplification bias which is prevalent in next generation sequencing (NGS) techniques. In addition, the RNA-Seq based count data exhibits biological variation in small reads count. exomePeak2 collectively address these challanges by introducing a rich set of robust data science models tailored for MeRIP-Seq. With exomePeak2, users can perform peak calling, modification site quantification, and differential analysis with a straightforward one-step function. Alternatively, users could customize methods for their own analysis through multi-step functions and diagnostic plots.

Installation

To install exomePeak2 from bioconductor, start R (version >"3.6") and enter:

if(!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("exomePeak2")

For order versions of R, please refer to the appropriate [Bioconductor release](https://www.bioconductor.org/about/release-announcements/}.

Peak Calling

For peak calling of MeRIP-Seq experiment, exomePeak2 demands the reads alignment results in BAM files. Users can specify the biological replicates of the IP and input samples by a character vector of the corresponding BAM directories at the arguments bam_ip and bam_input separately.

In the following example, the transcript annotation is provided using GFF files. Transcript annotation can also be provided by the TxDb object. exomePeak2 will automatically download the TxDb if the genome argument is filled with the corresponding UCSC genome name.

The genome sequence is required to conduct GC content bias correction. If the genome argument is missing ( = NULL ), exomPeak2 will perform peak calling without correcting the GC content bias.

library(exomePeak2)

set.seed(1)

GENE_ANNO_GTF = system.file("extdata", "example.gtf", package="exomePeak2")

f1 = system.file("extdata", "IP1.bam", package="exomePeak2")
f2 = system.file("extdata", "IP2.bam", package="exomePeak2")
f3 = system.file("extdata", "IP3.bam", package="exomePeak2")
f4 = system.file("extdata", "IP4.bam", package="exomePeak2")
IP_BAM = c(f1,f2,f3,f4)

f1 = system.file("extdata", "Input1.bam", package="exomePeak2")
f2 = system.file("extdata", "Input2.bam", package="exomePeak2")
f3 = system.file("extdata", "Input3.bam", package="exomePeak2")
INPUT_BAM = c(f1,f2,f3)

exomePeak2(bam_ip = IP_BAM,
           bam_input = INPUT_BAM,
           gff_dir = GENE_ANNO_GTF,
           genome = "hg19",
           paired_end = FALSE)

exomePeak2 will export the modification peaks in formats of BED file and CSV table, the data will be saved automatically under a folder named by exomePeak2_output.

The modification peak statistics are derived from the ${\beta}_{i,1}$ terms in the following linear regression design.

$$log2(Q_{i,j}) = {\beta}{i,0} + {\beta}{i,1}I(\rho(j)=IP) + t_{i,j}$$

$Q_{i,j}$ is the expected value of reads abundence of modification $i$ under sample $j$. ${\beta}{i,0}$ is the intercept coefficient, ${\beta}{i,1}$ is the coefficient for IP/input log2 fold change, $I(\rho(j)=IP)$ is the regression covariate that is the indicator variable for the sample $j$ being IP sample. $t_{i,j}$ is the regression offset that account for the sequencing depth variation and the GC content biases.

Under the default settings, the linear models fitted are the regularized GLM (Generalized Linear Model) of NB developed by DESeq2. If one of the IP and input group has no biological replicates, Poisson GLMs will be fitted to the modification peaks.

Explaination over the columns of the exported table:

chr: the chromosomal name of the peak.
chromStart: the start of the peak on the chromosome.
chromEnd: the end of the peak on the chromosome.
name: the unique ID of the modification peak.
score: the -log2 p value of the peak.
strand: the strand of the peak on genome.
thickStart: the start position of the peak.
thickEnd: the end position of the peak.
itemRgb: the column for the RGB encoded color in BED file visualization.
blockCount: the block (exon) number within the peak.
blockSizes: the widths of blocks.
blockStarts: the start positions of blocks.
geneID: the gene ID of the peak.
ReadsCount.input: the reads count of the input sample.
ReadsCount.IP: the reads count of the IP sample.
log2FoldChange: the estimates of IP over input log2 fold enrichment (coefficient estimates of ${\beta}_{i,1}$).
pvalue: the Wald test p value on the modification coefficient.
padj: the adjusted Wald test p value using BH approach.

Differential Modification Analysis

The code below could conduct differential modification analysis (Comparison of Two Conditions) on exon regions defined by the transcript annotation.

In differential modification mode, exomePeak2 will first perform Peak calling on exon regions using both the control and treated samples. Then, it will conduct the differential modification analysis on peaks reported from peak calling using an interactive GLM.

f1 = system.file("extdata", "treated_IP1.bam", package="exomePeak2")
TREATED_IP_BAM = c(f1)
f1 = system.file("extdata", "treated_Input1.bam", package="exomePeak2")
TREATED_INPUT_BAM = c(f1)

exomePeak2(bam_ip = IP_BAM,
           bam_input = INPUT_BAM,
           bam_treated_input = TREATED_INPUT_BAM,
           bam_treated_ip = TREATED_IP_BAM,
           gff_dir = GENE_ANNO_GTF,
           genome = "hg19",
           paired_end = FALSE)

In differential modification mode, exomePeak2 will export the differential modification peaks in formats of BED file and CSV table, the data will also be saved automatically under a folder named by exomePeak2_output.

The peak statistics in differential modification setting are derived from the interactive coefficient ${\beta}_{i,3}$ in the following regression design of the NB GLM:

$$log2(Q_{i,j}) = {\beta}{i,0} + {\beta}{i,1}I(\rho(j)=IP) + {\beta}{i,2}I(\rho(j)=Treatment) + {\beta}{i,3}I(\rho(j)=IP\&Treatment) + t_{i,j}$$

Explaination for the additional table columns:

ModLog2FC_control: the modification log2 fold enrichment in the control condition.
ModLog2FC_treated: the modification log2 fold enrichment in the treatment condition.
DiffModLog2FC: the log2 Fold Change estimates of differential modification (coefficient estimates of ${\beta}_{i,3}$).
pvalue: the Wald test p value on the differential modification coefficient.
padj: the adjusted Wald test p value using BH approach.

Quantification and Statistical Analysis with Single Based Modification Annotation

exomePeak2 supports the modification quantification and differential modification analysis on single based modification annotation. The modification sites with single based resolution can provide a more accurate mapping of modification locations compared with the peaks called directly from the MeRIP-seq datasets.

Some of the datasets in epitranscriptomics have a single based resolution, e.x. Data generated by the m6A-CLIP-Seq or m6A-miCLIP-Seq techniques. Reads count on the single based modification sites could provide a more accurate and consistent quantification on MeRIP-Seq experiments due to the elimination of the technical variation introduced by the feature lengths.

exomePeak2 will automatically initiate the mode of single based modification quantification by providing a sigle based annotation file under the argument mod_annot.

The single based annotation information should be provided to the exomePeak2 function in the format of a GRanges object.

f2 = system.file("extdata", "mod_annot.rds", package="exomePeak2")

MOD_ANNO_GRANGE <- readRDS(f2)

exomePeak2(bam_ip = IP_BAM,
           bam_input = INPUT_BAM,
           gff_dir = GENE_ANNO_GTF,
           genome = "hg19",
           paired_end = FALSE,
           mod_annot = MOD_ANNO_GRANGE)

In this mode, exomePeak2 will export the analysis result also in formats of BED file and CSV table, while each row of the table corresponds to the sites of the annotation GRanges.

Peak Calling and Visualization in Multiple Steps

The exomePeak2 package can achieve peak calling and peak statistics calulation with multiple functions.

1. Check the bam files of MeRIP-seq data before peak calling.

MeRIP_Seq_Alignment <- scanMeripBAM(
                         bam_ip = IP_BAM,
                         bam_input = INPUT_BAM,
                         paired_end = FALSE
                        )

For MeRIP-seq experiment with interactive design (contain control and treatment groups), use the following code.

MeRIP_Seq_Alignment <- scanMeripBAM(
    bam_ip = IP_BAM,
    bam_input = INPUT_BAM,
    bam_treated_input = TREATED_INPUT_BAM,
    bam_treated_ip = TREATED_IP_BAM,
    paired_end = FALSE
  )

2. Conduct peak calling analysis on exons using the provided bam files.

SummarizedExomePeaks <- exomePeakCalling(merip_bams = MeRIP_Seq_Alignment,
                                         gff_dir = GENE_ANNO_GTF,
                                         genome = "hg19")

Alternatively, use the following code to quantify MeRIP-seq data on single based modification annotation.

SummarizedExomePeaks <- exomePeakCalling(merip_bams = MeRIP_Seq_Alignment,
                                         gff_dir = GENE_ANNO_GTF,
                                         genome = "hg19",
                                         mod_annot = MOD_ANNO_GRANGE)

3. Estimate size factors that are required for GC content bias correction.

SummarizedExomePeaks <- normalizeGC(SummarizedExomePeaks)

4. Report the statistics of modification peaks using Generalized Linear Model (GLM).

SummarizedExomePeaks <- glmM(SummarizedExomePeaks)

Alternatively, If the treated IP and input bam files are provided, glmDM function could be used to conduct differential modification analysis on modification Peaks with interactive GLM.

SummarizedExomePeaks <- glmDM(SummarizedExomePeaks)

5. Generate the scatter plot between GC content and log2 Fold Change (LFC).

plotLfcGC(SummarizedExomePeaks)

6. Generate the bar plot for the sequencing depth size factors.

plotSizeFactors(SummarizedExomePeaks)

7. Export the modification peaks and the peak statistics with user decided format.

exportResults(SummarizedExomePeaks, format = "BED")

Contact

Please contact the maintainer of exomePeak2 if you have encountered any problems:

ZhenWei : zhen.wei01@xjtlu.edu.cn

Please visit the github page of exomePeak2:

https://github.com/ZhenWei10/exomePeak2

Session Info

sessionInfo()

Any scripts or data that you put into this service are public.

exomePeak2 documentation built on Nov. 8, 2020, 5:27 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

exomePeak2
Bias Awared Peak Calling and Quantification for MeRIP-Seq

The exomePeak2 user's guide
In exomePeak2: Bias Awared Peak Calling and Quantification for MeRIP-Seq

Introduction

Installation

Peak Calling

Differential Modification Analysis

Quantification and Statistical Analysis with Single Based Modification Annotation

Peak Calling and Visualization in Multiple Steps

Contact

Session Info

Try the exomePeak2 package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

exomePeak2 Bias Awared Peak Calling and Quantification for MeRIP-Seq

The *exomePeak2* user's guide In exomePeak2: Bias Awared Peak Calling and Quantification for MeRIP-Seq

Introduction

Installation

Peak Calling

Differential Modification Analysis

Quantification and Statistical Analysis with Single Based Modification Annotation

Peak Calling and Visualization in Multiple Steps

Contact

Session Info

Try the exomePeak2 package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

exomePeak2
Bias Awared Peak Calling and Quantification for MeRIP-Seq

The exomePeak2 user's guide
In exomePeak2: Bias Awared Peak Calling and Quantification for MeRIP-Seq