BiocStyle::markdown()
Cross-linking immunoprecipitation (CLIP) is a technique that combines UV
cross-linking and immunoprecipitation to analyse protein-RNA interactions or to
pinpoint RNA modifications (e.g. m6A). CLIP-based methods, such as iCLIP and
eCLIP, allow precise mapping of RNA modification sites or RNA-binding protein
(RBP) binding sites on a genome-wide scale. These techniques help us to unravel
post-transcriptional regulatory networks. In order to make the visualization of
CLIP data easier, we develop r Biocpkg("cliProfiler")
package. The
r Biocpkg("cliProfiler")
includes seven functions which allow users easily
make different profile plots.
The r Biocpkg("cliProfiler")
package is available at
https://bioconductor.org and can be
installed via BiocManager::install
:
if (!require("BiocManager")) install.packages("BiocManager") BiocManager::install("cliProfiler")
A package only needs to be installed once. Load the package into an R session with
library(cliProfiler)
The input data for using all the functions in r Biocpkg("cliProfiler")
should
be the peak calling result or other similar object that represents the RBP
binding sites or RNA modification position. Moreover, these peaks/signals
be
stored in the GRanges object. The GRanges is an S4 class which defined
by r Biocpkg("GenomicRanges")
. The GRanges class is a container for the
genomic locations and their associated annotations. For more information about
GRanges objects please check r Biocpkg("GenomicRanges")
package. An example
of GRanges object is shown below:
testpath <- system.file("extdata", package = "cliProfiler") ## loading the test GRanges object test <- readRDS(file.path(testpath, "test.rds")) ## Show an example of GRanges object test
The annotation file that required by functions exonProfile
,
geneTypeProfile
, intronProfile
, spliceSiteProfile
and metaGeneProfile
should be in the gff3
format and download from
https://www.gencodegenes.org/. In the
r Biocpkg("GenomicRanges")
package, we include a test gff3
file.
## the path for the test gff3 file test_gff3 <- file.path(testpath, "annotation_test.gff3") ## the gff3 file can be loaded by import.gff3 function in rtracklayer package shown_gff3 <- rtracklayer::import.gff3(test_gff3) ## show the test gff3 file shown_gff3
The function windowProfile
allows users to find out the enrichment of peaks
against the customized annotation file. This customized annotation file should
be stored in the GRanges object.
metaGeneProfile()
outputs a meta profile, which shows the location of binding
sites or modification sites ( peaks/signals)
along transcript regions
(5’UTR, CDS and 3’UTR). The input of this function should be a GRanges
object.
Besides the GRanges
object, a path to the gff3
annotation file which
download from Gencode is required by
metaGeneProfile
.
The output of metaGeneProfile
is a List
objects. The List
one contains
the GRanges objects with the calculation result which can be used in different
ways later.
meta <- metaGeneProfile(object = test, annotation = test_gff3) meta[[1]]
Here is an explanation of the metaData columns of the output GRanges objects:
peak/signal
belongs to. peak/signal
belongs. peak/signal
within the genomic
region. This value close to 0 means this peak located close to the 5' end of
the genomic feature. The position value close to 1 means the peak close to the
3' end of the genomic feature. Value 5 means this peaks can not be mapped to
any annotation.The List
two is the meta plot which in the ggplot
class. The user can use
all the functions from ggplot2
to change the detail of this plot.
library(ggplot2) ## For example if user want to have a new name for the plot meta[[2]] + ggtitle("Meta Profile 2")
For the advance usage, the metaGeneProfile
provides two methods to calculate
the relative position. The first method return a relative position of the
peaks/signals
in the genomic feature without the introns. The second method
return a relative position value of the peak in the genomic feature with the
introns. With the parameter include_intron
we can easily shift between these
two methods. If the data is a polyA plus data, we will recommend you to set
include_intron = FALSE
.
meta <- metaGeneProfile(object = test, annotation = test_gff3, include_intron = TRUE) meta[[2]]
The group
option allows user to make a meta plot with multiple conditions.
Here is an example:
test$Treat <- c(rep("Treatment 1",50), rep("Treatment 2", 50)) meta <- metaGeneProfile(object = test, annotation = test_gff3, group = "Treat") meta[[2]]
Besides, we provide an annotation filtering option for making the meta plot.
The exlevel
and extranscript_support_level
could be used for specifying
which level or transcript support level should be excluded. For excluding
the transcript support level NA, user can use 6 instead of NA. About more
information of level and transcript support level you can check the
Gencode data format.
metaGeneProfile(object = test, annotation = test_gff3, exlevel = 3, extranscript_support_level = c(4,5,6))
The split
option could help to make the meta profile for the peaks/signals
in 5'UTR, CDS and 3'UTR separately. The grey dotted line represents the peaks's
distribution across all region.
meta <- metaGeneProfile(object = test, annotation = test_gff3, split = TRUE) meta[[2]]
The function intronProfile
generates the profile of peaks/signals
in the
intronic region. Here is an example for a quick use of intronProfile
.
intron <- intronProfile(test, test_gff3)
Similar to metaGeneProfile, the output of intronProfile
is a List
object
which contains two Lists
. List
one is the input GRanges objects with the
calculation result.
intron[[1]]
The explanation of meta data in the output of intronProfile
list one is
pasted down below:
The List
two includes a ggplot object.
intron[[2]]
Similar to metaGeneProfile, in intronProfile, we provide options , such as
group
, exlevel
and extranscript_support_level
. The group
function could
be used to generate the comparison plot.
intron <- intronProfile(test, test_gff3, group = "Treat") intron[[2]]
The parameter exlevel
and extranscript_support_level
could be used for
specifying which level or transcript support level should be excluded.
For excluding the transcript support level NA, you can use 6. About more
information of level and transcript support level you can check the
Gencode data format.
intronProfile(test, test_gff3, group = "Treat", exlevel = 3, extranscript_support_level = c(4,5,6))
Moreover, in the intronProfile we provide parameters maxLength
and
minLength
to filter the maximum and minimum length of the intron.
intronProfile(test, test_gff3, group = "Treat", maxLength = 10000, minLength = 50)
The exonProfile
could help to generate a profile of peaks/signals
in the
exons which surrounded by introns. The output of exonProfile is a List
object. The List
one is the GRanges
objects of input data with the
calculation result.
## Quick use exon <- exonProfile(test, test_gff3) exon[[1]]
Here is the explanation of the meta data column that output from
exonProfile
:
The List
two is a ggplot object which contains the exon profile.
exon[[2]]
The usage of all parameters of exonProfile
is similar to intronProfile
. For
more detail of parameter usage please check the intronProfile
section.
Since the metaGeneProfile
, intronProfile
and exonProfile
require a
annotation file in gff3
format and downloaded from
https://www.gencodegenes.org/. These functions
could only be used for human and mouse. For the user who works on other
species or has a customized annotation file (not in gff3 format), we develop
the function windowProfile
.
windowProfile
requires GRanges object as input and annotation. And
windowProfile
output the relative position of each peak within the given
annotation GRanges. For example, if user would like to make a profile against
all the exons with windowProfile
, you just need to first extract all the
exonic region as a GRanges object then run the windowProfile
. Here is an
example about using windowProfile
to generate the profile.
library(rtracklayer) ## Extract all the exon annotation test_anno <- rtracklayer::import.gff3(test_gff3) test_anno <- test_anno[test_anno$type == "exon"] ## Run the windowProfile window_profile <- windowProfile(test, test_anno)
The output of windowProfile
is a List
objects. In the List
one, you will
find the GRanges object of input peaks and calculation result. And the List
two contains the ggplot of windowProfile
.
window_profile[[1]]
Here is an explanation of the output of windowProfile
:
window_profile[[2]]
motifProfile
generates the motif enrichment profile around the center of the
input peaks. The IUPAC code
could be used for the motif
parameter. The IUPAC
code includes: A, T, C, G,
R, Y, S, W, K, M, B, D, H, V, N. The parameter flanking
represents to the
size of window that user would like to check around the center of peaks. The
parameter fraction
could be used to change the y-axis from frequency to
fraction.
For using the motifProtile
, the BSgenome
with the sequence information of
the species that you are working with is required.
## Example for running the motifProfile ## The working species is mouse with mm10 annotation. ## Therefore the package 'BSgenome.Mmusculus.UCSC.mm10' need to be installed in ## advance. motif <- motifProfile(test, motif = "DRACH", genome = "BSgenome.Mmusculus.UCSC.mm10", fraction = TRUE, title = "Motif Profile", flanking = 10)
The output of motifProfile
is a List
object. List
1 contains the
data.frame
of the motif enrichment information for each position around the
center of the input peaks/signals
. List
2 is the ggplot object of the
plot of motif enrichment.
motif[[1]]
Each bar in the plot of motifProfile
represents for the start site of the
motif that is defined by the user.
motif[[2]]
The geneTypeProfile
could give users the information of the gene type of the
input peaks. The input peaks for geneTypeProfile
should be stored in the
GRanges objects. The annotation file should be a gff3
file and downloaded
from https://www.gencodegenes.org/.
## Quick use of geneTypeProfile geneTP <- geneTypeProfile(test, test_gff3)
The output of geneTypeProfile
is an List
object. List
one includes a
GRanges object with the input peaks and the assignment information.
geneTP[[1]]
Here is an explanation of the output GRanges object from the
geneTypeProfile
.
geneTP[[2]]
The spliceSiteProfile
gives users the information of the enrichment of peaks
around the 5' and 3' splice site (SS) in the absolute distance.
SSprofile <- spliceSiteProfile(test, test_gff3, flanking=200, bin=40)
The output of spliceSiteProfile
is a List
object. The List
one includes
the GRanges object of input peaks and the position information of this peak
around the SS.
SSprofile[[1]]
Here is an explanation of output of spliceSiteProfile
.
SSprofile[[2]]
Similar to metaProfile
, The parameter exlevel
and
extranscript_support_level
could be used for
specifying which level or transcript support level should be excluded.
For excluding the transcript support level NA, you can use 6. About more
information of level and transcript support level you can check the
Gencode data format.
spliceSiteProfile(test, test_gff3, flanking=200, bin=40, exlevel=3, extranscript_support_level = 6, title = "Splice Site Profile")
The following is the session info that generated this vignette:
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.