Given the genomic coordinates of each predicted RIP region, query the Ensembl database to determine whether the region is near or overlaps any known (noncoding) gene.
annotateRIP(sigGRanges, biomaRt_dataset, featureType = "TSS",
    goAnno, strandSpecific = FALSE, exportFormat = "txt",
    hasGOdb = !missing(goAnno), goPval = 0.1, outDir, ...)
sigGRanges |
|
biomaRt_dataset |
Ensembl dataset available from biomaRt (See |
featureType |
TSS, miRNA, Exon, 5'UTR, 3'UTR, transcript or Exon plus UTR defined in |
goAnno |
Optional argument that specifies a GO dataset used for GO enrichment analysis performed by |
strandSpecific |
Indicate whether the annotations should be strand-specific (Default: FALSE) |
exportFormat |
Format to export using |
hasGOdb |
A logical flag that indicates whether GO enrichment is performed so that the results can be exported. |
goPval |
P-value cutoff to determine the significance of enriched GO terms by |
outDir |
Output directory. |
... |
Extra arguments passed to |
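The default hasGOdb = !missing(goAnno) in the usage above means that GO enrichment is switched on only when goAnno is supplied. A minimal base R sketch of that defaulting behaviour follows; has_go_sketch is a hypothetical stand-in and not part of RIPSeeker.

```r
# Hypothetical stand-in that mimics only the goAnno/hasGOdb defaults
# shown in the usage of annotateRIP; it performs no annotation.
has_go_sketch <- function(goAnno, hasGOdb = !missing(goAnno)) {
  # The default is evaluated lazily inside the function, so missing()
  # reports whether the caller actually supplied goAnno.
  hasGOdb
}

without_go <- has_go_sketch()                        # goAnno not supplied
with_go    <- has_go_sketch(goAnno = "org.Mm.eg.db") # goAnno supplied
forced_go  <- has_go_sketch(hasGOdb = TRUE)          # explicit override
```

Passing hasGOdb explicitly overrides the default, which is why the flag exists as a separate argument.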
To access the up-to-date Ensembl database, RIPSeeker employs useMart and getAnnotation from the biomaRt and ChIPpeakAnno Bioconductor packages, respectively, to establish an internet connection to the database and retrieve the current annotations. annotatePeakInBatch from ChIPpeakAnno is then used to annotate all of the predicted regions based on the Ensembl annotation. A predicted region may overlap multiple genes, each of which is reported as a separate record. In addition, getEnrichedGO from ChIPpeakAnno is applied to the annotated predictions to discover enriched Gene Ontology (GO) terms involving the protein-associated transcriptome.
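To illustrate why a single predicted region can yield several records, here is a minimal base R sketch of interval overlap on one chromosome. The coordinates and the names r1 to r3 and geneA to geneC are invented for illustration, and the overlap test is a simplification rather than the logic of annotatePeakInBatch.

```r
# Hypothetical predicted regions and gene annotations (same chromosome).
regions <- data.frame(region = c("r1", "r2", "r3"),
                      start  = c(100, 500, 900),
                      end    = c(300, 600, 950))
genes <- data.frame(gene  = c("geneA", "geneB", "geneC"),
                    start = c(250, 280, 550),
                    end   = c(400, 320, 700))

# Two closed intervals overlap when each starts before the other ends.
hit <- outer(regions$start, genes$end, "<=") &
       outer(regions$end, genes$start, ">=")

# One output row per overlapping (region, gene) pair.
idx <- which(hit, arr.ind = TRUE)
overlaps <- data.frame(region = regions$region[idx[, "row"]],
                       gene   = genes$gene[idx[, "col"]])
```

In this toy data r1 overlaps two genes and therefore appears in two rows, while r3 overlaps none and is absent from the result.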
To use an older annotation (e.g., mm9 rather than mm10), the user also needs to specify the host and biomart arguments accepted by useMart. To access the mouse annotation from Ensembl version 65, for instance, the user calls annotateRIP(..., biomaRt_dataset="mmusculus_gene_ensembl", biomart="ENSEMBL_MART_ENSEMBL", host="dec2011.archive.ensembl.org", ...), which runs useMart(dataset="mmusculus_gene_ensembl", biomart="ENSEMBL_MART_ENSEMBL", host="dec2011.archive.ensembl.org", ...) to get the mm9 annotation from Ensembl (v65).
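The forwarding of host and biomart through ... can be sketched in base R without an internet connection. Both functions below are hypothetical stand-ins: use_mart_stub only records the arguments it receives and does not contact Ensembl, and forward_sketch mirrors the forwarding described above rather than the actual body of annotateRIP.

```r
# Hypothetical stand-in for useMart; it only records its arguments.
use_mart_stub <- function(biomart, dataset, host = "www.ensembl.org") {
  list(biomart = biomart, dataset = dataset, host = host)
}

# Hypothetical wrapper: the named dataset argument is passed on as
# `dataset`, and everything else travels through `...` unchanged.
forward_sketch <- function(biomaRt_dataset, ...) {
  use_mart_stub(dataset = biomaRt_dataset, ...)
}

mart <- forward_sketch(
  biomaRt_dataset = "mmusculus_gene_ensembl",
  biomart         = "ENSEMBL_MART_ENSEMBL",
  host            = "dec2011.archive.ensembl.org"
)
```

Because the wrapper already passes dataset itself, supplying dataset again through ... would be matched twice and raise an error in R, so only biomart and host should travel through ....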
sigGRangesAnnotated |
|
enrichedGO |
Output from getEnrichedGO. All three main GO categories ("Biological Process", "Molecular Function", "Cellular Component") are combined together and returned. The argument is only returned when |
If outDir is specified, then sigGRangesAnnotated is saved as RIPregions_annotated.txt and RIPregions_annotated.RData, and enrichedGO as RIPregions_enrichedGO.txt, in the outDir directory.
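After a run, one way to check which of these files were written is sketched below in base R. The outDir location is a hypothetical placeholder, and the sketch only tests for file existence since the internal layout of the files is not described here.

```r
# Hypothetical output directory; replace with the outDir given to annotateRIP.
outDir <- file.path(tempdir(), "RIPSeeker_output")

# File names as listed in the Value section.
expected <- file.path(outDir, c("RIPregions_annotated.txt",
                                "RIPregions_annotated.RData",
                                "RIPregions_enrichedGO.txt"))

# Named logical vector: TRUE for each file that exists on disk.
found <- file.exists(expected)
names(found) <- basename(expected)
```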
Yue Li
Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Steffen Durinck, Paul T. Spellman, Ewan Birney and Wolfgang Huber, Nature Protocols 4, 1184-1191 (2009).
BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Steffen Durinck, Yves Moreau, Arek Kasprzyk, Sean Davis, Bart De Moor, Alvis Brazma and Wolfgang Huber, Bioinformatics 21, 3439-3440 (2005).
Lihua Julie Zhu, Herve Pages, Claude Gazin, Nathan Lawson, Jianhong Ou, Simon Lin, David Lapointe and Michael Green (2012). ChIPpeakAnno: Batch annotation of the peaks identified from either ChIP-seq, ChIP-chip experiments or any experiments resulted in large number of chromosome ranges. R package version 2.4.0.
useMart, getAnnotation, getEnrichedGO
if(interactive()) { # need internet connection

    # Retrieve system files
    extdata.dir <- system.file("extdata", package="RIPSeeker")

    bamFiles <- list.files(extdata.dir, ".bam$", recursive=TRUE, full.names=TRUE)

    bamFiles <- grep("PRC2", bamFiles, value=TRUE)

    # Parameters setting
    binSize <- NULL       # automatically determine bin size
    minBinSize <- 10000   # min bin size in automatic bin size selection
    maxBinSize <- 12000   # max bin size in automatic bin size selection
    multicore <- FALSE    # use multicore
    strandType <- "-"     # set strand type to minus strand

    biomart <- "ENSEMBL_MART_ENSEMBL"      # use archive to get ensembl 65
    dataset <- "mmusculus_gene_ensembl"    # mouse dataset id name
    host <- "dec2011.archive.ensembl.org"  # use ensembl 65 for annotation

    goAnno <- "org.Mm.eg.db"

    ################ run main function for HMM inference on all chromosomes ################
    mainSeekOutputRIP <- mainSeek(
        bamFiles=grep(pattern="SRR039214", bamFiles, value=TRUE, invert=TRUE),
        binSize=binSize, minBinSize = minBinSize,
        maxBinSize = maxBinSize, strandType=strandType,
        reverseComplement=TRUE, genomeBuild="mm9",
        uniqueHit = TRUE, assignMultihits = TRUE,
        rerunWithDisambiguatedMultihits = TRUE,
        multicore=multicore, silentMain=FALSE, verbose=TRUE)

    # use defined binSize from RIP
    RIPBinSize <- lapply(mainSeekOutputRIP$nbhGRList, function(x) median(width(x)))

    mainSeekOutputCTL <- mainSeek(
        bamFiles=grep(pattern="SRR039214", bamFiles, value=TRUE, invert=FALSE),
        binSize=RIPBinSize, strandType=strandType,
        reverseComplement=TRUE, genomeBuild="mm9",
        uniqueHit = TRUE, assignMultihits = TRUE,
        rerunWithDisambiguatedMultihits = TRUE,
        multicore=multicore, silentMain=FALSE, verbose=TRUE)

    ################ significance test on Viterbi predicted peaks ################
    ripGR <- seekRIP(mainSeekOutputRIP$nbhGRList$chrX, mainSeekOutputCTL$nbhGRList)

    ################ Annotate peaks ################
    annotatedRIPGR <- annotateRIP(sigGRanges = ripGR,
        biomaRt_dataset = dataset, goAnno = goAnno,
        strandSpecific = !is.null(strandType),
        host=host, biomart=biomart)

    head(annotatedRIPGR$sigGRangesAnnotated)
}