benchmarkInteractions: Benchmarks reg2gene interaction models using benchmark data

Description

Takes as input the results of associateReg2Gene, or of any other modelling procedure implemented in the reg2gene package, together with a predefined benchmark dataset as a GInteractions object. The function adds a metadata column that records benchmarking success, i.e. whether each tested interaction is confirmed by the benchmark dataset. By default it reports how many times each interaction is observed in the benchmark dataset. If binary is set to TRUE, a vector of 0s and 1s is reported instead: 1 if the interaction overlaps the benchmark dataset at least once, 0 if it does not overlap it at all.
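The relation between the two reporting modes can be illustrated with a minimal base R sketch. The vector benchCounts below is hypothetical data invented for illustration, not output of the package: it stands for the default count-based result, and the binary form is derived from it.

```r
# Hypothetical overlap counts for five tested interactions
# (what the default, binary = FALSE, mode is described as reporting)
benchCounts <- c(0L, 2L, 1L, 0L, 3L)

# Binary mode as described: 1 if observed at least once, 0 otherwise
benchBinary <- as.integer(benchCounts > 0)

print(benchBinary)
```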

Usage

benchmarkInteractions(interactions, benchInteractions, preFilter = FALSE,
  binary = FALSE, forceByName = FALSE, mc.cores = 1,
  ignore.strand = FALSE, ...)

Arguments

interactions

A GInteractions object (output of associateReg2Gene). Usually the first GRanges object (anchor1) corresponds to the enhancer location, whereas the second GRanges object (anchor2) corresponds to the gene location.

benchInteractions

A GInteractions object (output of associateReg2Gene) or a list of GInteractions objects. Both anchor regions are used in the benchmarking procedure. This object stores benchmarking information, e.g. coordinates of interacting regions obtained from techniques such as Hi-C or eQTL studies.

preFilter

(def: FALSE) If TRUE, additional columns are added to the input interactions object (in addition to the Bench column, which is reported by default) that store whether each tested pair has any potential to be benchmarked. A regulatory region-TSS pair [anchor1 and anchor2 from interactions] is reported as 1 (possible to benchmark) only when both the regulatory region and the TSS overlap some region in the benchmark set; the overlapping regions may come from any benchmark anchor pairs and need not belong to the same benchmark pair. Otherwise the pair is reported as 0 (no potential to be benchmarked). This information is useful for removing, a priori, the large number of true negatives among regulatory region-TSS pairs before running confusionMatrix: true negatives are very abundant in the interactions dataset because a benchmark dataset usually covers a much smaller part of the genome (a limitation of the benchmarking methods).

binary

(def: FALSE) If FALSE, reports how many times each reg2gene interaction is observed in the benchmark dataset(s). If TRUE, reports only whether an overlap with the benchmark dataset is observed at least once.

forceByName

(def: FALSE) Forces the benchmark data to have the same gene coordinates as interactions when gene names match. IMPORTANT! Gene coordinates must be the second anchor of both the input interactions and benchInteractions objects, and the column with gene names must be called "name".

mc.cores

Number of cores to use when running in parallel; passed to mclapply.

ignore.strand

argument to be passed to findOverlaps. When set to TRUE, the strand information is ignored in the overlap analysis.

...

further arguments to methods, not implemented yet

Details

GInteractions objects, i.e. an output of associateReg2Gene [or a list of such objects] and a benchmark dataset, are overlapped. linkOverlaps between the interactions and benchmark objects is performed, and for each input pair it is reported whether the pair is benchmarked and, if binary = FALSE, how many times. A criss-cross overlap of interacting regions is performed: if anchor1 from the benchmark dataset overlaps anchor2 from the tested dataset, then anchor2 from the benchmark dataset needs to overlap anchor1 from the tested dataset, or vice versa.
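The criss-cross rule can be sketched in plain base R, without Bioconductor, to make the logic explicit. This is an illustration under stated assumptions, not the package's linkOverlaps-based implementation: the helpers region, overlaps and countBench are hypothetical names, regions are simple data frames, and strand is ignored.

```r
# Hypothetical helper: store regions as a plain data frame
region <- function(chr, start, end) {
  data.frame(chr = chr, start = start, end = end, stringsAsFactors = FALSE)
}

# Two single regions overlap if they share a chromosome and
# each one starts before the other ends
overlaps <- function(a, b) {
  a$chr == b$chr && a$start <= b$end && b$start <= a$end
}

# Count the benchmark pairs that confirm one tested pair,
# accepting either anchor orientation (direct or criss-cross)
countBench <- function(test, bench) {
  hits <- vapply(seq_len(nrow(bench$anchor1)), function(i) {
    b1 <- bench$anchor1[i, ]
    b2 <- bench$anchor2[i, ]
    direct  <- overlaps(test$anchor1, b1) && overlaps(test$anchor2, b2)
    crossed <- overlaps(test$anchor1, b2) && overlaps(test$anchor2, b1)
    direct || crossed
  }, logical(1))
  sum(hits)
}

# One tested enhancer-gene pair (invented coordinates)
test <- list(anchor1 = region("chr1", 100, 200),
             anchor2 = region("chr1", 1000, 1100))

# Three benchmark pairs: the first matches in crossed orientation,
# the second matches one anchor only, the third is on another chromosome
bench <- list(
  anchor1 = region(c("chr1", "chr1", "chr2"), c(1050, 150, 100), c(1150, 250, 200)),
  anchor2 = region(c("chr1", "chr1", "chr2"), c(150, 5000, 1000), c(250, 5100, 1100))
)

nBench <- countBench(test, bench)
print(nBench)
```

Only the first benchmark pair counts here: the second overlaps a single anchor of the tested pair, which is not enough for the pair to be benchmarked.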

Additional details for the preFilter option: all benchmark regions that can be confirmed by any combination of enhancer/gene pairs [anchor1 or anchor2 from the interactions object] are obtained. The unique union of the selected anchor1 or anchor2 regions from the interactions object is then used as the set of anchor1-anchor2 pairs that can be benchmarked. The reasoning is that, if present in this set, anchor1 or anchor2 regions from the interactions object necessarily have the other member of the pair overlapping somewhere in the benchmark dataset.
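A rough base R sketch of the preFilter idea, following the description of the preFilter argument: a tested pair is flagged 1 only if each of its anchors overlaps some benchmark anchor, not necessarily from the same benchmark pair. The helper names (region, overlapsAnyRegion) and all coordinates are hypothetical and chosen for illustration; the package's actual implementation may differ.

```r
# Hypothetical helper: store regions as a plain data frame
region <- function(chr, start, end) {
  data.frame(chr = chr, start = start, end = end, stringsAsFactors = FALSE)
}

# TRUE if a single query region overlaps at least one region of a set
overlapsAnyRegion <- function(query, regions) {
  any(regions$chr == query$chr &
        regions$start <= query$end &
        query$start <= regions$end)
}

# Pool of all benchmark anchors (anchor1 and anchor2 of every benchmark pair)
benchAnchors <- region(chr   = c("chr1", "chr1", "chr1", "chr1"),
                       start = c(150, 5000, 1050, 9000),
                       end   = c(250, 5100, 1150, 9100))

# Two tested pairs: in the second, anchor2 overlaps no benchmark region
tested <- list(
  list(anchor1 = region("chr1", 100, 200), anchor2 = region("chr1", 1000, 1100)),
  list(anchor1 = region("chr1", 100, 200), anchor2 = region("chr1", 7000, 7100))
)

# 1 = both anchors overlap some benchmark anchor, 0 = no potential to be benchmarked
preFilterFlag <- vapply(tested, function(p) {
  as.integer(overlapsAnyRegion(p$anchor1, benchAnchors) &&
               overlapsAnyRegion(p$anchor2, benchAnchors))
}, integer(1))

print(preFilterFlag)
```

Pairs flagged 0 could never be confirmed by this benchmark, so dropping them before building a confusion matrix avoids inflating the true negative count.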

Value

GInteractions object with added benchmark results metadata [Bench column]. If the input is a list(), each metadata column corresponds to one benchmark dataset analyzed. Values can be either 0/1 (not benchmarked/benchmarked) or 0-n (how many times each gene-enhancer pair is benchmarked).

Author(s)

Inga Patarcic

Examples

# Create the test and benchmark datasets
# (GRReg1_toy and GRReg2_toy are assumed to be toy datasets shipped with reg2gene)
library(GenomicRanges)
library(InteractionSet)
library(reg2gene)

interactions <- GInteractions(GRReg1_toy, GRReg1_toy$reg)
benchInteractions <- GInteractions(GRReg2_toy, GRReg2_toy$reg)

# Report how many times each interaction is observed in the benchmark
benchmarkInteractions(interactions,
                      benchInteractions,
                      binary = FALSE)

# Report only whether each interaction is observed at least once
benchmarkInteractions(interactions,
                      benchInteractions,
                      binary = TRUE)

# Add the preFilter columns

benchmarkInteractions(interactions,
                      benchInteractions,
                      binary = TRUE,
                      preFilter = TRUE)

# forceByName argument: both objects need a "name" column

interactions$name <- interactions$anchor1.name
benchInteractions$name <- benchInteractions$anchor1.name

benchmarkInteractions(interactions,
                      benchInteractions,
                      binary = TRUE,
                      forceByName = TRUE)

##################
# Example for a list of benchmark datasets:

# NOTE: anchor1.Bench1Exp & anchor1.Bench2Exp are expected/precalculated
# values for this benchmark example

benchInteractionsList <- list(benchInteractions, interactions)
names(benchInteractionsList) <- c("benchInteractions1",
                                  "benchInteractions2")

interactionsB <- benchmarkInteractions(interactions,
                                       benchInteractionsList,
                                       ignore.strand = TRUE,
                                       binary = FALSE,
                                       mc.cores = 1)

# forceByName = TRUE can be used with benchmark lists as well

benchInteractionsList <- list(benchInteractions,
                              benchInteractions[1:5])
names(benchInteractionsList) <- c("BL1", "BL2")

benchmarkInteractions(benchInteractions = benchInteractionsList,
                      interactions = interactions,
                      forceByName = TRUE)

# What happens when anchor1 and anchor2 both overlap only one region
# in the benchmark dataset? They are not benchmarked.

benchmarkInteractions(interactions[1],
                      benchInteractions[1])

# WARNING!
# Be careful when benchmarking anchors that overlap each other
# within the test set (e.g. an enhancer overlaps the gene region),
# because these regions will be benchmarked.

benchmarkInteractions(interactions[5],
                      benchInteractions)


BIMSBbioinfo/reg2gene documentation built on May 3, 2019, 6:42 p.m.