reg2gene: Hierarchically annotates input GRanges object with...

Description Usage Arguments Details Value Examples

Description

This function either annotates input regions to their nearby genes or it hierarchical runs association procedure (when GIneraction object is used as an input) as follows: promoters,enhancers, nearby genes.

reg2gene: An R package for predicting the target genes for regulatory elements

Usage

1
2
3
reg2gene(windows, geneAnnotations, interactions = NULL,
  annotateInteractions = FALSE, upstream = 1000, downstream = 1000,
  distance = 1e+06, identified = TRUE, ...)

Arguments

windows

GRanges object that contains the windows (genomic regions) of interest:enhancers, promoters, CpG islands, ChIP-Seq or DNase-Seq peaks...

geneAnnotations

GRanges which contains info about TSS for genes of interest, or ideally TSS for all genes in the genome, since nearest gene annotation methods (step 1 TSS+/-1000bp & step 3 TSS +/- 1Mb) is based on the gene TSS coordinates from this object. Necessary meta-data is a column with gene names - "name"

interactions

(default NULL) GInteractions object which stores interactions of interest, either as regulatory regions - gene associations or as interacting locations obtained by chromatin conformation capture and related methods. Info about gene names should be stored as a meta-data. Anchor1 needs to be regulatory region, whereas anchor2 is location of gene/TSS.

annotateInteractions

(default FALSE) This argument is useful in the case when interaction dataset does not store info about genes, whereas it provides only an info about interacting locations in the genome (not necessarilly regulatory region - promoter interactions). If FALSE, no additional action done. If TRUE, interactions are firstly annotated to the genes (unannotated interactions are removed from the dataset), windows are subsequently annotated to genes using annotated interactions object

upstream

number of basepairs upstream from TSS to look for input windows. default 1000

downstream

number of basepairs downstream from TSS to look for input windows. default 1000

distance

(default 1Mb). Maximal allowed distance between genes TSS & input peak. Used in the 3rd step of the association procedure (genomic regions is associated to the closest gene if the distance between these two locations is smaller than prediefined distance threshold.)

identified

(default TRUE). If TRUE, report genomic regions AND corresponding genes; If FALSE, report regions for which info about gene is missing. NOTE! When this argument set to TRUE, regions are sometimes annotated based on the nearest gene procedure instead of enhancer. However, gene annotation is correct.

...

Any parameter in the future

Details

This function annotates input windows (genomicRegions) to the promoter regions of genes (and correspondingly to that gene) if only location of genes (as GRanges object) and of windows (genomicRegions) is provided. Meaning that, if the input windows (genomicRegions) is located within +/- upstream/downstream distance from TSS of a gene, then this gene is annotated to the queried region. When GIneraction object is used as an input, then hierarchical association procedure is runned as follows: promoters,enhancers, nearby genes, eg: 1) genomic regions of interest are first considered to be promoters and annotated with nearby genes if they are located within a certain distance from TSS of nerbay gene (default +/-1000bp); otherwise 2) remaning genes are overlapped with enhancer regions, and genes annotated to that enhancer regions are reported, 3) if no overlap with either promoters nor enhancers is identified, then closest gene is reported if it is located within 1Mb 4) if no gene located within 1Mb has been identified then, this region is filtered out. IMPORTANT! Anchor1 in GInteractions object needs to be regulatory region, whereas anchor2 is the location of gene/TSS. If GInteractions is missing as an input, then input windows are tested ONLY whether they are located in the vicinity of TSS of genes stored in this object, and if yes the corresponding gene is reported.

Value

A GInteractions object that contains info about genes (location+meta-data) annotated with regions of interest. Anchor1 corresponds to the queried windows (genomicRegions location), whereas anchor2 corresponds to the extended (+/-upstream/downstream) TSS location. It additionaly reports whether genomic region was annotated based on the overlap with promoter or enhancer region of the annotated gene, or an annotation was assessed based on the proximity to the gene (possible values "enhancer","promoter","nearestGene").

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
# CREATE genomicRegions test object - windows argument

library(GenomicRanges)
library(InteractionSet)
library(GenomicInteractions)
library(Gviz)
library(biomaRt)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(GenomicFeatures)
library(reg2gene)
 
windows <- GRanges(c("chr1:1-2", # 1. overlap prom
                     "chr2:1-2",  # 2. overlap enh
                     "chr3:1-2", # 3. overlap tss +/- 1,000,000
                     "chr4:1-2")) # 4. do not overlap tss +/- 1Mb

annotationsEnh <- GRanges(c("chr1:1-2",
                            "chr2:1-2",
                            "chr3:100000-100002",
                            "chr4:10000001-10000002"))

annotationsGenes <- GRanges(c("chr1:1-2",
                              "chr2:100000-100002",
                              "chr3:99999-100002",
                              "chr4:10000001-10000002"))

seqlengths(annotationsEnh) <- seqlengths(annotationsGenes) <- rep(10000002,
                                                                  4)
interactions = GInteractions(annotationsEnh,annotationsGenes,
                             name=c("gen1","gen2","gen3","gen4"))


geneAnnotations=second(interactions)
mcols(geneAnnotations) <- mcols(interactions) 

 reg2gene(windows=windows,
interactions=interactions, 
geneAnnotations = geneAnnotations)

# which regions are not identified

reg2gene(windows=windows,
         interactions=interactions, 
         geneAnnotations = geneAnnotations,
         identified=FALSE)

# if interactions are not available, assign interactions based solely on the 
# proximity to promoters
reg2gene(windows = windows,
         interactions=NULL,
         geneAnnotations = geneAnnotations)
 

BIMSBbioinfo/reg2gene documentation built on May 3, 2019, 6:42 p.m.