annotateRIP: Annotate RIP peaks with genomic information and perform GO...

Description

Given the genomic coordinates of each predicted RIP region, query the Ensembl database to determine whether the region is near or overlaps any known (noncoding) gene.

Usage

annotateRIP(sigGRanges, biomaRt_dataset, featureType = "TSS", 
	goAnno, strandSpecific = FALSE, exportFormat = "txt", 
	hasGOdb = !missing(goAnno), goPval = 0.1, outDir, ...)

Arguments

sigGRanges

GRanges object giving the chromosomal coordinates of the RIP peaks.

biomaRt_dataset

Ensembl dataset available from biomaRt (See listDatasets). For instance, the human and mouse annotations are hsapiens_gene_ensembl and mmusculus_gene_ensembl, respectively.

featureType

Feature type used for annotation: one of TSS, miRNA, Exon, 5'UTR, 3'UTR, transcript, or Exon plus UTR, as defined in getAnnotation.

goAnno

Optional argument that specifies the GO dataset used for the GO enrichment analysis performed by getEnrichedGO. For instance, the human and mouse GO datasets are org.Hs.eg.db and org.Mm.eg.db, respectively.

strandSpecific

Indicates whether the annotations should be strand-specific (Default: FALSE).

exportFormat

Format to export using exportGRanges (Default: "txt", i.e. a tab-delimited file).

hasGOdb

A logical flag that indicates whether GO enrichment analysis is performed so that its results can be exported. hasGOdb can be FALSE either because goAnno is not specified or because the GO database does not exist.

goPval

P-value cutoff to determine the significance of enriched GO terms by getEnrichedGO.

outDir

Output directory.

...

Extra arguments passed to useMart to specify the database and to getEnrichedGO to specify the GO enrichment procedure.
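
The default hasGOdb = !missing(goAnno) shown under Usage relies on R evaluating default arguments lazily inside the function. The base-R sketch below illustrates only that idiom; go_flag_demo is a hypothetical stand-in and is not part of RIPSeeker.

```r
# Hypothetical stand-in that mirrors only the two arguments of interest.
go_flag_demo <- function(goAnno, hasGOdb = !missing(goAnno)) {
  # The default is evaluated inside the function body, so missing()
  # reports whether the caller supplied goAnno.
  hasGOdb
}

without_go <- go_flag_demo()                              # goAnno omitted
with_go    <- go_flag_demo(goAnno = "org.Mm.eg.db")       # goAnno supplied
forced_off <- go_flag_demo(goAnno = "org.Mm.eg.db",
                           hasGOdb = FALSE)               # explicit override
```

An explicit hasGOdb value always takes precedence over the default, which is one way to skip the enrichment step even when goAnno is given.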

Details

To access the up-to-date Ensembl database, RIPSeeker uses useMart from the biomaRt package and getAnnotation from the ChIPpeakAnno package (both from Bioconductor) to connect to the database over the internet and retrieve the current annotations. Then, annotatePeakInBatch from ChIPpeakAnno is used to efficiently annotate all of the predicted regions based on the Ensembl annotation. A predicted region may overlap multiple genes, all of which are reported as separate records. Finally, getEnrichedGO from ChIPpeakAnno is applied to the annotated predictions to discover enriched Gene Ontology (GO) terms involving the protein-associated transcriptome.
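
The sequence of calls described above can be sketched roughly as follows. This is not RIPSeeker's implementation: annotate_sketch and its parameter names are hypothetical, and mapping the p-value cutoff onto the maxP argument of getEnrichedGO is an assumption. Calling the function needs an internet connection plus the biomaRt, ChIPpeakAnno, and relevant org.*.eg.db packages; defining it does not.

```r
# Hypothetical helper illustrating the order of the calls; it is only
# defined here, not executed.
annotate_sketch <- function(peaks, dataset, go_db,
                            biomart = "ENSEMBL_MART_ENSEMBL",
                            go_pval = 0.1) {
  # 1. Connect to the Ensembl BioMart database.
  mart <- biomaRt::useMart(biomart = biomart, dataset = dataset)
  # 2. Download the annotation for the chosen feature type.
  anno <- ChIPpeakAnno::getAnnotation(mart, featureType = "TSS")
  # 3. Annotate every peak against that annotation.
  annotated <- ChIPpeakAnno::annotatePeakInBatch(peaks, AnnotationData = anno)
  # 4. Test the annotated genes for enriched GO terms
  #    (assumption: the cutoff corresponds to maxP).
  enriched <- ChIPpeakAnno::getEnrichedGO(annotated, orgAnn = go_db,
                                          maxP = go_pval)
  list(annotated = annotated, enriched = enriched)
}
```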

To use an older annotation (e.g., mm9 vs. mm10), the user also needs to specify the host and biomart arguments accepted by useMart. To access the mouse annotation from Ensembl version 65, for instance, the user needs to call annotateRIP(..., biomaRt_dataset="mmusculus_gene_ensembl", biomart="ENSEMBL_MART_ENSEMBL", host="dec2011.archive.ensembl.org", ...), which will run useMart(dataset="mmusculus_gene_ensembl", biomart="ENSEMBL_MART_ENSEMBL", host="dec2011.archive.ensembl.org", ...) to get the mm9 annotation from Ensembl (v65).
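
As an illustration, the archive settings quoted above can be collected once and handed to useMart directly. This is a minimal sketch; the variable names mart_args and mart are illustrative, and whether the archive host is still reachable (or needs an explicit URL scheme in newer biomaRt versions) is not confirmed by this page.

```r
# Connection settings quoted from the text above (Ensembl v65, mm9).
mart_args <- list(
  biomart = "ENSEMBL_MART_ENSEMBL",
  dataset = "mmusculus_gene_ensembl",
  host    = "dec2011.archive.ensembl.org"
)

# Attempt the connection only in an interactive session with biomaRt
# installed, since it requires internet access.
if (interactive() && requireNamespace("biomaRt", quietly = TRUE)) {
  mart <- do.call(biomaRt::useMart, mart_args)
}
```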

Value

sigGRangesAnnotated

sigGRanges augmented with genomic information including "ensembl_gene_id", "external_gene_id", and "description".

enrichedGO

Output from getEnrichedGO. All three main GO categories ("Biological Process", "Molecular Function", "Cellular Component") are combined and returned together. This element is only returned when hasGOdb is TRUE.

If outDir is specified, then the above sigGRangesAnnotated is saved as RIPregions_annotated.txt and RIPregions_annotated.RData, and enrichedGO as RIPregions_enrichedGO.txt in the outDir directory.
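
The exported text file can be read back into R with base functions. The sketch below assumes the file is tab-delimited with a header row (the header is an assumption; this page only states the tab-delimited format). read_rip_table is a hypothetical helper, and the demonstration table contains invented values written to a temporary directory.

```r
# Hypothetical helper, not part of RIPSeeker.
read_rip_table <- function(outDir, fileName = "RIPregions_annotated.txt") {
  path <- file.path(outDir, fileName)
  if (!file.exists(path)) {
    return(NULL)  # nothing was exported to this directory
  }
  # Assumption: tab-delimited text with a header row.
  read.delim(path, header = TRUE, stringsAsFactors = FALSE)
}

# Self-contained demonstration on a made-up table.
demo_dir <- tempfile("rip_out_")
dir.create(demo_dir)
toy <- data.frame(seqnames = "chrX", start = 100L, end = 200L)  # invented
write.table(toy, file.path(demo_dir, "RIPregions_annotated.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)
demo <- read_rip_table(demo_dir)
```

Passing fileName = "RIPregions_enrichedGO.txt" would read the GO table the same way, under the same assumption about its layout.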

Author(s)

Yue Li

References

Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Steffen Durinck, Paul T. Spellman, Ewan Birney and Wolfgang Huber, Nature Protocols 4, 1184-1191 (2009).

BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Steffen Durinck, Yves Moreau, Arek Kasprzyk, Sean Davis, Bart De Moor, Alvis Brazma and Wolfgang Huber, Bioinformatics 21, 3439-3440 (2005).

Lihua Julie Zhu, Herve Pages, Claude Gazin, Nathan Lawson, Jianhong Ou, Simon Lin, David Lapointe and Michael Green (2012). ChIPpeakAnno: Batch annotation of the peaks identified from either ChIP-seq, ChIP-chip experiments or any experiments resulted in large number of chromosome ranges. R package version 2.4.0.

See Also

useMart, getAnnotation, getEnrichedGO

Examples

if(interactive()) { # need internet connection
# Retrieve system files
extdata.dir <- system.file("extdata", package="RIPSeeker") 

bamFiles <- list.files(extdata.dir, ".bam$", recursive=TRUE, full.names=TRUE)

bamFiles <- grep("PRC2", bamFiles, value=TRUE)

# Parameters setting
binSize <- NULL							# automatically determine bin size
minBinSize <- 10000						# min bin size in automatic bin size selection
maxBinSize <- 12000						# max bin size in automatic bin size selection
multicore <- FALSE						# use multicore
strandType <- "-"							# set strand type to minus strand

biomart <- "ENSEMBL_MART_ENSEMBL"		# use archive to get ensembl 65
dataset <- "mmusculus_gene_ensembl"		# mouse dataset id name	
host <- "dec2011.archive.ensembl.org" 	# use ensembl 65 for annotation

goAnno <- "org.Mm.eg.db"


################ run main function for HMM inference on all chromosomes ################
mainSeekOutputRIP <- mainSeek(
    bamFiles = grep(pattern = "SRR039214", bamFiles, value = TRUE, invert = TRUE),
    binSize = binSize, minBinSize = minBinSize,
    maxBinSize = maxBinSize, strandType = strandType,
    reverseComplement = TRUE, genomeBuild = "mm9",
    uniqueHit = TRUE, assignMultihits = TRUE,
    rerunWithDisambiguatedMultihits = TRUE,
    multicore = multicore, silentMain = FALSE, verbose = TRUE)
		
# use defined binSize from RIP
RIPBinSize <- lapply(mainSeekOutputRIP$nbhGRList, function(x) median(width(x)))


mainSeekOutputCTL <- mainSeek(
    bamFiles = grep(pattern = "SRR039214", bamFiles, value = TRUE, invert = FALSE),
    binSize = RIPBinSize, strandType = strandType,
    reverseComplement = TRUE, genomeBuild = "mm9",
    uniqueHit = TRUE, assignMultihits = TRUE,
    rerunWithDisambiguatedMultihits = TRUE,
    multicore = multicore, silentMain = FALSE, verbose = TRUE)

################ significance test on Viterbi predicted peaks ################
ripGR <- seekRIP(mainSeekOutputRIP$nbhGRList$chrX, mainSeekOutputCTL$nbhGRList)


################ Annotate peaks ################

annotatedRIPGR <- annotateRIP(sigGRanges = ripGR,
    biomaRt_dataset = dataset, goAnno = goAnno,
    strandSpecific = !is.null(strandType),
    host = host, biomart = biomart)

head(annotatedRIPGR$sigGRangesAnnotated)	
}

gorillayue/RIPSeeker documentation built on May 17, 2019, 7:59 a.m.