annotateRIP: Annotate RIP peaks with genomic information and perform GO...
In yueli-compbio/RIPSeeker: RIPSeeker: a statistical package for identifying protein-associated transcripts from RIP-seq experiments

Given the genomic coordinates of each predicted RIP region, query the Ensembl database to determine whether the region is near or overlaps any known (noncoding) genes.

annotateRIP(sigGRanges, biomaRt_dataset, featureType = "TSS", 
	goAnno, strandSpecific = FALSE, exportFormat = "txt", 
	hasGOdb = !missing(goAnno), goPval = 0.1, outDir, ...)

`sigGRanges`	`GRanges` object giving the chromosomal coordinates of each RIP peak.
`biomaRt_dataset`	Ensembl dataset available from biomaRt (See `listDatasets`). For instance, the human and mouse annotations are `hsapiens_gene_ensembl` and `mmusculus_gene_ensembl`, respectively.
`featureType`	Feature type used for annotation: TSS, miRNA, Exon, 5'UTR, 3'UTR, transcript, or Exon plus UTR, as defined in `getAnnotation` (Default: "TSS").
`goAnno`	Optional argument that specifies a GO dataset used for the GO enrichment analysis performed by `getEnrichedGO`. For instance, the human and mouse GO datasets are `org.Hs.eg.db` and `org.Mm.eg.db`.
`strandSpecific`	Indicate whether the annotations should be strand-specific (Default: FALSE)
`exportFormat`	Format to export using `exportGRanges` (Default: "txt", i.e. tab-delim file).
`hasGOdb`	A binary flag that indicates whether GO enrichment is performed so that its results can be exported. `hasGOdb` can be FALSE either because `goAnno` is not specified or because the GO database does not exist.
`goPval`	P-value cutoff to determine the significance of enriched GO terms by `getEnrichedGO`.
`outDir`	Output directory.
`...`	Extra arguments passed to `useMart` to specify the database and to `getEnrichedGO` to specify the GO enrichment procedure.
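The default `hasGOdb = !missing(goAnno)` relies on R evaluating default arguments lazily inside the function. The following is a minimal base R sketch of that behaviour only; `toyAnnotate` is a hypothetical stand-in and not part of RIPSeeker.

```r
# Hypothetical stand-in that mirrors only the two relevant arguments.
toyAnnotate <- function(goAnno, hasGOdb = !missing(goAnno)) {
  # The default is evaluated in this function's frame, so missing(goAnno)
  # reports whether the caller supplied goAnno in this particular call.
  hasGOdb
}

toyAnnotate()                                          # FALSE
toyAnnotate(goAnno = "org.Mm.eg.db")                   # TRUE
toyAnnotate(goAnno = "org.Mm.eg.db", hasGOdb = FALSE)  # FALSE
```

The last call shows that the flag can still be set explicitly, for example to skip GO enrichment even though a GO dataset name is given.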

To access the up-to-date Ensembl database, RIPSeeker uses useMart from biomaRt and getAnnotation from ChIPpeakAnno to connect to the database over the internet and retrieve the current annotations. Then, annotatePeakInBatch from ChIPpeakAnno is used to efficiently annotate all of the predicted regions based on the Ensembl annotation. A predicted region may overlap multiple genes, all of which will be reported as separate records. Moreover, getEnrichedGO from ChIPpeakAnno is applied to the annotated predictions to discover enriched Gene Ontology (GO) terms involving the protein-associated transcriptome.
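To illustrate why one predicted region can produce several records, here is a toy base R sketch that pairs each region with every gene it overlaps. The coordinates and names are made up, and the cross-join approach is only an illustration; it is not how annotatePeakInBatch is implemented and it ignores nearest-feature assignment.

```r
# Made-up regions and genes on a single chromosome (illustrative only).
peaks <- data.frame(peak = c("p1", "p2"),
                    pStart = c(100, 500), pEnd = c(300, 600))
genes <- data.frame(gene = c("gA", "gB", "gC"),
                    gStart = c(50, 250, 900), gEnd = c(150, 400, 950))

# Cross join every peak with every gene, then keep overlapping pairs.
pairs <- merge(peaks, genes, by = NULL)
hits <- pairs[pairs$pStart <= pairs$gEnd & pairs$pEnd >= pairs$gStart,
              c("peak", "gene")]

# p1 overlaps both gA and gB, so it appears in two separate rows.
print(hits)
```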

To use an older annotation (e.g., mm9 vs. mm10), the user also needs to specify the host and biomart arguments accepted by useMart. To access the mouse annotation from Ensembl version 65, for instance, the user needs to call annotateRIP(..., biomaRt_dataset="mmusculus_gene_ensembl", biomart="ENSEMBL_MART_ENSEMBL", host="dec2011.archive.ensembl.org", ...), which will run useMart(dataset="mmusculus_gene_ensembl", biomart="ENSEMBL_MART_ENSEMBL", host="dec2011.archive.ensembl.org", ...) to get the mm9 annotation from Ensembl (v65).
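The forwarding described above can be sketched without an internet connection. In this rough illustration, `stubUseMart` and `toyAnnotateRIP` are hypothetical stand-ins (the stub only echoes its arguments), and the dataset is supplied through the `biomaRt_dataset` argument named in the usage signature. How the real function separates arguments meant for `useMart` from those meant for `getEnrichedGO` is not shown here.

```r
# Hypothetical stub standing in for biomaRt::useMart; it only echoes its inputs.
stubUseMart <- function(biomart, dataset, host = NULL) {
  list(biomart = biomart, dataset = dataset, host = host)
}

# Hypothetical wrapper: the dataset name is passed on as `dataset`,
# and everything in `...` is forwarded unchanged.
toyAnnotateRIP <- function(biomaRt_dataset, ...) {
  stubUseMart(dataset = biomaRt_dataset, ...)
}

mart <- toyAnnotateRIP(biomaRt_dataset = "mmusculus_gene_ensembl",
                       biomart = "ENSEMBL_MART_ENSEMBL",
                       host = "dec2011.archive.ensembl.org")
str(mart)
```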

`sigGRangesAnnotated`	`sigGRanges` augmented with genomic information including "ensembl_gene_id", "external_gene_id", and "description"
`enrichedGO`	Output from getEnrichedGO. All three main GO categories ("Biological Process", "Molecular Function", "Cellular Component") are combined and returned together. This element is returned only when `hasGOdb` is TRUE.

If outDir is specified, then the above sigGRangesAnnotated is saved as RIPregions_annotated.txt and RIPregions_annotated.RData, and enrichedGO as RIPregions_enrichedGO.txt in the outDir directory.
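As a small sketch of how the saved files might be picked up afterwards, the paths can be built from the file names listed above. The `outDir` value here is a placeholder, and reading the text file with `read.delim` assumes a tab-delimited table with a header row, which this page does not state explicitly.

```r
outDir <- tempdir()  # placeholder; use the directory passed to annotateRIP

annotatedTxt   <- file.path(outDir, "RIPregions_annotated.txt")
annotatedRData <- file.path(outDir, "RIPregions_annotated.RData")
enrichedGOTxt  <- file.path(outDir, "RIPregions_enrichedGO.txt")

# Read the tab-delimited export only if it is present (header row assumed).
if (file.exists(annotatedTxt)) {
  annotated <- read.delim(annotatedTxt, stringsAsFactors = FALSE)
  print(head(annotated))
}

# load() returns the names of the objects it restores.
if (file.exists(annotatedRData)) {
  restored <- load(annotatedRData)
  print(restored)
}
```

The enrichedGO file is written only when GO enrichment was performed, so its existence should be checked in the same way before reading it.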

Yue Li

Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Steffen Durinck, Paul T. Spellman, Ewan Birney and Wolfgang Huber, Nature Protocols 4, 1184-1191 (2009).

BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Steffen Durinck, Yves Moreau, Arek Kasprzyk, Sean Davis, Bart De Moor, Alvis Brazma and Wolfgang Huber, Bioinformatics 21, 3439-3440 (2005).

Lihua Julie Zhu, Herve Pages, Claude Gazin, Nathan Lawson, Jianhong Ou, Simon Lin, David Lapointe and Michael Green (2012). ChIPpeakAnno: Batch annotation of the peaks identified from either ChIP-seq, ChIP-chip experiments or any experiments resulted in large number of chromosome ranges. R package version 2.4.0.

useMart, getAnnotation, getEnrichedGO

if(interactive()) { # need internet connection
# Retrieve system files
extdata.dir <- system.file("extdata", package="RIPSeeker") 

bamFiles <- list.files(extdata.dir, "\\.bam$", recursive=TRUE, full.names=TRUE)

bamFiles <- grep("PRC2", bamFiles, value=TRUE)

# Parameters setting
binSize <- NULL							# automatically determine bin size
minBinSize <- 10000						# min bin size in automatic bin size selection
maxBinSize <- 12000						# max bin size in automatic bin size selection
multicore <- FALSE						# use multicore
strandType <- "-"							# set strand type to minus strand

biomart <- "ENSEMBL_MART_ENSEMBL"		# use archive to get ensembl 65
dataset <- "mmusculus_gene_ensembl"		# mouse dataset id name	
host <- "dec2011.archive.ensembl.org" 	# use ensembl 65 for annotation

goAnno <- "org.Mm.eg.db"


################ run main function for HMM inference on all chromosomes ################
mainSeekOutputRIP <- mainSeek(
    bamFiles=grep(pattern="SRR039214", bamFiles, value=TRUE, invert=TRUE),
		binSize=binSize, minBinSize = minBinSize, 
		maxBinSize = maxBinSize, strandType=strandType, 		
		reverseComplement=TRUE, genomeBuild="mm9",
		uniqueHit = TRUE, assignMultihits = TRUE, 
		rerunWithDisambiguatedMultihits = TRUE,				
		multicore=multicore, silentMain=FALSE, verbose=TRUE)
		
# use defined binSize from RIP
RIPBinSize <- lapply(mainSeekOutputRIP$nbhGRList, function(x) median(width(x)))


mainSeekOutputCTL <- mainSeek(
    bamFiles=grep(pattern="SRR039214", bamFiles, value=TRUE, invert=FALSE),
		binSize=RIPBinSize, strandType=strandType, 		
		reverseComplement=TRUE, genomeBuild="mm9",
		uniqueHit = TRUE, assignMultihits = TRUE, 
		rerunWithDisambiguatedMultihits = TRUE,				
		multicore=multicore, silentMain=FALSE, verbose=TRUE)

################ significance test on Viterbi predicted peaks ################
ripGR <- seekRIP(mainSeekOutputRIP$nbhGRList$chrX, mainSeekOutputCTL$nbhGRList)


################ Annotate peaks ################

annotatedRIPGR <- annotateRIP(sigGRanges = ripGR,
				biomaRt_dataset = dataset, goAnno = goAnno, 
				strandSpecific = !is.null(strandType),
				host=host, biomart=biomart)

head(annotatedRIPGR$sigGRangesAnnotated)	
}