BiocStyle::markdown()

Introduction

Cross-linking immunoprecipitation (CLIP) is a technique that combines UV cross-linking and immunoprecipitation to analyse protein-RNA interactions or to pinpoint RNA modifications (e.g. m6A). CLIP-based methods, such as iCLIP and eCLIP, allow precise mapping of RNA modification sites or RNA-binding protein (RBP) binding sites on a genome-wide scale. These techniques help us to unravel post-transcriptional regulatory networks. In order to make the visualization of CLIP data easier, we develop r Biocpkg("cliProfiler") package. The r Biocpkg("cliProfiler") includes seven functions which allow users easily make different profile plots.

The r Biocpkg("cliProfiler") package is available at https://bioconductor.org and can be installed via BiocManager::install:

if (!require("BiocManager"))
    install.packages("BiocManager")
BiocManager::install("cliProfiler")

A package only needs to be installed once. Load the package into an R session with

library(cliProfiler)

The Requirement of Data and Annotation file

The input data for using all the functions in r Biocpkg("cliProfiler") should be the peak calling result or other similar object that represents the RBP binding sites or RNA modification position. Moreover, these peaks/signals be stored in the GRanges object. The GRanges is an S4 class which defined by r Biocpkg("GenomicRanges"). The GRanges class is a container for the genomic locations and their associated annotations. For more information about GRanges objects please check r Biocpkg("GenomicRanges") package. An example of GRanges object is shown below:

testpath <- system.file("extdata", package = "cliProfiler")
## loading the test GRanges object
test <- readRDS(file.path(testpath, "test.rds"))
## Show an example of GRanges object
test

The annotation file that required by functions exonProfile, geneTypeProfile, intronProfile, spliceSiteProfile and metaGeneProfile should be in the gff3 format and download from https://www.gencodegenes.org/. In the r Biocpkg("GenomicRanges") package, we include a test gff3 file.

## the path for the test gff3 file
test_gff3 <- file.path(testpath, "annotation_test.gff3")
## the gff3 file can be loaded by import.gff3 function in rtracklayer package
shown_gff3 <- rtracklayer::import.gff3(test_gff3)
## show the test gff3 file
shown_gff3

The function windowProfile allows users to find out the enrichment of peaks against the customized annotation file. This customized annotation file should be stored in the GRanges object.

metaGeneProfile

metaGeneProfile() outputs a meta profile, which shows the location of binding sites or modification sites ( peaks/signals) along transcript regions (5’UTR, CDS and 3’UTR). The input of this function should be a GRanges object.

Besides the GRanges object, a path to the gff3 annotation file which download from Gencode is required by metaGeneProfile.

The output of metaGeneProfile is a List objects. The List one contains the GRanges objects with the calculation result which can be used in different ways later.

meta <- metaGeneProfile(object = test, annotation = test_gff3)
meta[[1]]

Here is an explanation of the metaData columns of the output GRanges objects:

The List two is the meta plot which in the ggplot class. The user can use all the functions from ggplot2 to change the detail of this plot.

library(ggplot2)
## For example if user want to have a new name for the plot
meta[[2]] + ggtitle("Meta Profile 2")

For the advance usage, the metaGeneProfile provides two methods to calculate the relative position. The first method return a relative position of the peaks/signals in the genomic feature without the introns. The second method return a relative position value of the peak in the genomic feature with the introns. With the parameter include_intron we can easily shift between these two methods. If the data is a polyA plus data, we will recommend you to set include_intron = FALSE.

meta <- metaGeneProfile(object = test, annotation = test_gff3, 
                        include_intron = TRUE)
meta[[2]]

The group option allows user to make a meta plot with multiple conditions. Here is an example:

test$Treat <- c(rep("Treatment 1",50), rep("Treatment 2", 50))
meta <- metaGeneProfile(object = test, annotation = test_gff3, 
                        group = "Treat")
meta[[2]]

Besides, we provide an annotation filtering option for making the meta plot. The exlevel and extranscript_support_level could be used for specifying which level or transcript support level should be excluded. For excluding the transcript support level NA, user can use 6 instead of NA. About more information of level and transcript support level you can check the Gencode data format.

metaGeneProfile(object = test, annotation = test_gff3, exlevel = 3, 
                extranscript_support_level = c(4,5,6))

The split option could help to make the meta profile for the peaks/signals in 5'UTR, CDS and 3'UTR separately. The grey dotted line represents the peaks's distribution across all region.

meta <- metaGeneProfile(object = test, annotation = test_gff3, split = TRUE)
meta[[2]]

intronProfile

The function intronProfile generates the profile of peaks/signals in the intronic region. Here is an example for a quick use of intronProfile.

intron <-  intronProfile(test, test_gff3)

Similar to metaGeneProfile, the output of intronProfile is a List object which contains two Lists. List one is the input GRanges objects with the calculation result.

intron[[1]]

The explanation of meta data in the output of intronProfile list one is pasted down below:

The List two includes a ggplot object.

intron[[2]]

Similar to metaGeneProfile, in intronProfile, we provide options , such as group, exlevel and extranscript_support_level. The group function could be used to generate the comparison plot.

intron <-  intronProfile(test, test_gff3, group = "Treat")
intron[[2]]

The parameter exlevel and extranscript_support_level could be used for specifying which level or transcript support level should be excluded. For excluding the transcript support level NA, you can use 6. About more information of level and transcript support level you can check the Gencode data format.

intronProfile(test, test_gff3, group = "Treat", exlevel = 3, 
    extranscript_support_level = c(4,5,6))

Moreover, in the intronProfile we provide parameters maxLength and minLength to filter the maximum and minimum length of the intron.

intronProfile(test, test_gff3, group = "Treat", maxLength = 10000,
    minLength = 50)

exonProfile

The exonProfile could help to generate a profile of peaks/signals in the exons which surrounded by introns. The output of exonProfile is a List object. The List one is the GRanges objects of input data with the calculation result.

## Quick use
exon <- exonProfile(test, test_gff3)
exon[[1]]

Here is the explanation of the meta data column that output from exonProfile:

The List two is a ggplot object which contains the exon profile.

exon[[2]]

The usage of all parameters of exonProfile is similar to intronProfile. For more detail of parameter usage please check the intronProfile section.

windowProfile

Since the metaGeneProfile, intronProfile and exonProfile require a annotation file in gff3 format and downloaded from https://www.gencodegenes.org/. These functions could only be used for human and mouse. For the user who works on other species or has a customized annotation file (not in gff3 format), we develop the function windowProfile.

windowProfile requires GRanges object as input and annotation. And windowProfile output the relative position of each peak within the given annotation GRanges. For example, if user would like to make a profile against all the exons with windowProfile, you just need to first extract all the exonic region as a GRanges object then run the windowProfile. Here is an example about using windowProfile to generate the profile.

library(rtracklayer)
## Extract all the exon annotation
test_anno <- rtracklayer::import.gff3(test_gff3)
test_anno <- test_anno[test_anno$type == "exon"]
## Run the windowProfile
window_profile <- windowProfile(test, test_anno)

The output of windowProfile is a List objects. In the List one, you will find the GRanges object of input peaks and calculation result. And the List two contains the ggplot of windowProfile.

window_profile[[1]]

Here is an explanation of the output of windowProfile:

window_profile[[2]]

motifProfile

motifProfile generates the motif enrichment profile around the center of the input peaks. The IUPAC code could be used for the motif parameter. The IUPAC code includes: A, T, C, G, R, Y, S, W, K, M, B, D, H, V, N. The parameter flanking represents to the size of window that user would like to check around the center of peaks. The parameter fraction could be used to change the y-axis from frequency to fraction.

For using the motifProtile, the BSgenome with the sequence information of the species that you are working with is required.

## Example for running the motifProfile
## The working species is mouse with mm10 annotation.
## Therefore the package 'BSgenome.Mmusculus.UCSC.mm10' need to be installed in 
## advance.
motif <- motifProfile(test, motif = "DRACH",
    genome = "BSgenome.Mmusculus.UCSC.mm10",
    fraction = TRUE, title = "Motif Profile",
    flanking = 10)

The output of motifProfile is a List object. List 1 contains the data.frame of the motif enrichment information for each position around the center of the input peaks/signals. List 2 is the ggplot object of the plot of motif enrichment.

motif[[1]]

Each bar in the plot of motifProfile represents for the start site of the motif that is defined by the user.

motif[[2]]

geneTypeProfile

The geneTypeProfile could give users the information of the gene type of the input peaks. The input peaks for geneTypeProfile should be stored in the GRanges objects. The annotation file should be a gff3 file and downloaded from https://www.gencodegenes.org/.

## Quick use of geneTypeProfile
geneTP <- geneTypeProfile(test, test_gff3)

The output of geneTypeProfile is an List object. List one includes a GRanges object with the input peaks and the assignment information.

geneTP[[1]]

Here is an explanation of the output GRanges object from the geneTypeProfile.

geneTP[[2]]

spliceSiteProfile

The spliceSiteProfile gives users the information of the enrichment of peaks
around the 5' and 3' splice site (SS) in the absolute distance.

SSprofile <- spliceSiteProfile(test, test_gff3, flanking=200, bin=40)

The output of spliceSiteProfile is a List object. The List one includes the GRanges object of input peaks and the position information of this peak around the SS.

SSprofile[[1]]

Here is an explanation of output of spliceSiteProfile.

SSprofile[[2]]

Similar to metaProfile, The parameter exlevel and extranscript_support_level could be used for specifying which level or transcript support level should be excluded. For excluding the transcript support level NA, you can use 6. About more information of level and transcript support level you can check the Gencode data format.

spliceSiteProfile(test, test_gff3, flanking=200, bin=40, exlevel=3,
                        extranscript_support_level = 6,
                        title = "Splice Site Profile")

SessionInfo

The following is the session info that generated this vignette:

sessionInfo()


Codezy99/cliProfiler documentation built on Dec. 17, 2021, 2:59 p.m.