findOverlaps-methods: Finding overlapping genomic ranges


Description

Various methods for finding/counting overlaps between objects containing genomic ranges. This man page describes the methods that operate on GenomicRanges and GRangesList objects.

NOTE: The findOverlaps generic function and methods for IntegerRanges and IntegerRangesList objects are defined and documented in the IRanges package. The methods for GAlignments, GAlignmentPairs, and GAlignmentsList objects are defined and documented in the GenomicAlignments package.

GenomicRanges and GRangesList objects also support countOverlaps, overlapsAny, and subsetByOverlaps thanks to the default methods defined in the IRanges package and to the findOverlaps and countOverlaps methods defined in this package and documented below.
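As a rough illustration of how these functions relate to one another, the sketch below runs them on two small GRanges objects. The objects `query` and `subject` are hypothetical toy data, not taken from the Examples section.

```r
library(GenomicRanges)

## Hypothetical toy data: three query ranges and three subject ranges on chr1.
query   <- GRanges("chr1", IRanges(c(1, 20, 50), width=10), strand="+")
subject <- GRanges("chr1", IRanges(c(5, 8, 100), width=10), strand="+")

## All (query, subject) index pairs that overlap.
hits <- findOverlaps(query, subject)

## Number of subject ranges hit by each query range.
n_hits <- countOverlaps(query, subject)

## Logical vector: does each query range overlap anything in subject?
any_hit <- overlapsAny(query, subject)

## The query ranges that overlap at least one subject range.
kept <- subsetByOverlaps(query, subject)
```

Here only the first query range (1-10) overlaps anything, so `n_hits` is 2, 0, 0 and `kept` contains a single range.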

Usage

## S4 method for signature 'GenomicRanges,GenomicRanges'
findOverlaps(query, subject,
    maxgap=-1L, minoverlap=0L,
    type=c("any", "start", "end", "within", "equal"),
    select=c("all", "first", "last", "arbitrary"),
    ignore.strand=FALSE)

## S4 method for signature 'GenomicRanges,GenomicRanges'
countOverlaps(query, subject,
    maxgap=-1L, minoverlap=0L,
    type=c("any", "start", "end", "within", "equal"),
    ignore.strand=FALSE)

Arguments

query, subject

A GRanges or GRangesList object.

maxgap, minoverlap, type

See ?findOverlaps in the IRanges package for a description of these arguments.

select

When select is "all" (the default), the results are returned as a Hits object. Otherwise the returned value is an integer vector parallel to query (i.e. same length) containing the index of the first, last, or an arbitrary overlapping interval in subject, with NA indicating intervals that did not overlap any intervals in subject.

ignore.strand

When set to TRUE, the strand information is ignored in the overlap calculations.
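The effect of these arguments can be sketched on a small example. The objects `query`, `subject`, and `near` below are hypothetical toy data chosen for illustration; the behavior of maxgap and minoverlap follows their description in the IRanges package.

```r
library(GenomicRanges)

## Hypothetical toy data, not taken from the Examples section.
query   <- GRanges("chr1", IRanges(c(1, 30), width=10), strand=c("+", "-"))
subject <- GRanges("chr1", IRanges(c(5, 8, 35), width=10), strand="+")

## select: one subject index per query range, NA when nothing overlaps.
sel_first <- findOverlaps(query, subject, select="first")   # 1 NA
sel_last  <- findOverlaps(query, subject, select="last")    # 2 NA

## ignore.strand: query[2] is on "-" and only finds subject[3] ("+")
## once strand is ignored.
sel_nostrand <- findOverlaps(query, subject, select="first",
                             ignore.strand=TRUE)            # 1 3

## minoverlap: query[1] (1-10) and subject[1] (5-14) share 6 positions.
n_min6 <- countOverlaps(query[1], subject[1], minoverlap=6L)  # 1
n_min7 <- countOverlaps(query[1], subject[1], minoverlap=7L)  # 0

## maxgap: 'near' (13-20) is separated from query[1] (1-10) by 2 positions.
near   <- GRanges("chr1", IRanges(13, 20), strand="+")
n_gap1 <- countOverlaps(query[1], near, maxgap=1L)  # 0
n_gap2 <- countOverlaps(query[1], near, maxgap=2L)  # 1
```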

Details

When the query and the subject are GRanges or GRangesList objects, findOverlaps uses the triplet (sequence name, range, strand) to determine which features (see paragraph below for the definition of feature) from the query overlap which features in the subject, where a strand value of "*" is treated as occurring on both the "+" and "-" strand. An overlap is recorded when a feature in the query and a feature in the subject have the same sequence name, have a compatible pairing of strands (e.g. "+"/"+", "-"/"-", "*"/"+", "*"/"-", etc.), and satisfy the interval overlap requirements.
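The matching rule on the triplet (sequence name, range, strand) can be illustrated with a minimal sketch. The objects below are hypothetical toy data in which every range overlaps positionally, so only sequence name and strand decide the outcome.

```r
library(GenomicRanges)

## Two query ranges at the same position: one on "*", one on "+".
query <- GRanges("chr1", IRanges(c(1, 1), width=10), strand=c("*", "+"))

## Three subject ranges at the same position, differing in
## sequence name and strand.
subject <- GRanges(c("chr1", "chr1", "chr2"),
                   IRanges(c(5, 5, 5), width=10),
                   strand=c("+", "-", "+"))

## "*" is compatible with both "+" and "-"; "+" only with "+" (or "*").
## The chr2 range is never hit because the sequence names differ.
n_stranded <- countOverlaps(query, subject)                       # 2 1

## Ignoring strand leaves sequence name and range as the only criteria.
n_unstranded <- countOverlaps(query, subject, ignore.strand=TRUE) # 2 2
```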

In the context of findOverlaps, a feature is a collection of ranges that are treated as a single entity. For GRanges objects, a feature is a single range, while for GRangesList objects, a feature is a list element containing a set of ranges. In the results, features are referred to by number: query features are numbered from 1 to length(query) and subject features from 1 to length(subject).
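A minimal sketch of what "feature" means for a GRangesList, using hypothetical toy objects `grl` and `gr`: the hit indices refer to list elements, and a list element is reported once per subject feature even when several of its ranges overlap that feature.

```r
library(GenomicRanges)

## Hypothetical toy data: two list elements (features) holding three ranges.
grl <- GRangesList(
    tx1=GRanges("chr1", IRanges(c(50, 55), width=10), strand="+"),
    tx2=GRanges("chr1", IRanges(200, width=10), strand="+")
)
gr <- GRanges("chr1", IRanges(c(55, 300), width=5), strand="+")

## Both ranges of tx1 overlap gr[1], but tx1 is a single feature,
## so a single hit (1, 1) is recorded.
hits <- findOverlaps(grl, gr)
queryHits(hits)
subjectHits(hits)

## One count per list element, named after the elements of grl.
n_hits <- countOverlaps(grl, gr)
```

Note that the query indices run over `length(grl)` (2 here), not over the total number of ranges in the list.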

For type="equal" with GRangesList objects, query[[i]] matches subject[[j]] iff for each range in query[[i]] there is an identical range in subject[[j]], and vice versa.

Value

For findOverlaps: either a Hits object when select="all", or an integer vector otherwise.

For countOverlaps: an integer vector, parallel to query, containing the number of overlap hits tabulated for each query feature.
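The two kinds of return value are typically used as indices into the original objects. The following is a sketch under assumed toy data; the metadata columns `id` and `gene` are hypothetical names introduced only for this illustration.

```r
library(GenomicRanges)

## Hypothetical toy data with one metadata column each.
query   <- GRanges("chr1", IRanges(c(1, 20, 50), width=10), strand="+",
                   id=c("q1", "q2", "q3"))
subject <- GRanges("chr1", IRanges(c(5, 8, 100), width=10), strand="+",
                   gene=c("A", "B", "C"))

## Hits object: use queryHits()/subjectHits() to pair up the features.
hits  <- findOverlaps(query, subject)
pairs <- data.frame(id=query$id[queryHits(hits)],
                    gene=subject$gene[subjectHits(hits)])

## Integer vector: index subject directly; NA propagates for
## query ranges without any overlap.
first      <- findOverlaps(query, subject, select="first")
first_gene <- subject$gene[first]
```

With these data, `pairs` has two rows (q1 paired with A and with B), and `first_gene` is "A" for the first query range and NA for the other two.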

Author(s)

P. Aboyoun, S. Falcon, M. Lawrence, and H. Pagès

See Also

Examples

## ---------------------------------------------------------------------
## BASIC EXAMPLES
## ---------------------------------------------------------------------

## GRanges object:
gr <- GRanges(
        seqnames=Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
        ranges=IRanges(1:10, width=10:1, names=head(letters,10)),
        strand=Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
        score=1:10,
        GC=seq(1, 0, length=10)
      )
gr

## GRangesList object:
gr1 <- GRanges(seqnames="chr2", ranges=IRanges(4:3, 6),
               strand="+", score=5:4, GC=0.45)
gr2 <- GRanges(seqnames=c("chr1", "chr1"),
               ranges=IRanges(c(7,13), width=3),
               strand=c("+", "-"), score=3:4, GC=c(0.3, 0.5))
gr3 <- GRanges(seqnames=c("chr1", "chr2"),
               ranges=IRanges(c(1, 4), c(3, 9)),
               strand=c("-", "-"), score=c(6L, 2L), GC=c(0.4, 0.1))
grl <- GRangesList("gr1"=gr1, "gr2"=gr2, "gr3"=gr3)

## Overlapping two GRanges objects:
table(!is.na(findOverlaps(gr, gr1, select="arbitrary")))
countOverlaps(gr, gr1)
findOverlaps(gr, gr1)
subsetByOverlaps(gr, gr1)

countOverlaps(gr, gr1, type="start")
findOverlaps(gr, gr1, type="start")
subsetByOverlaps(gr, gr1, type="start")

findOverlaps(gr, gr1, select="first")
findOverlaps(gr, gr1, select="last")

findOverlaps(gr1, gr)
findOverlaps(gr1, gr, type="start")
findOverlaps(gr1, gr, type="within")
findOverlaps(gr1, gr, type="equal")

## ---------------------------------------------------------------------
## MORE EXAMPLES
## ---------------------------------------------------------------------

table(!is.na(findOverlaps(gr, gr1, select="arbitrary")))
countOverlaps(gr, gr1)
findOverlaps(gr, gr1)
subsetByOverlaps(gr, gr1)

## Overlaps between a GRanges and a GRangesList object:

table(!is.na(findOverlaps(grl, gr, select="first")))
countOverlaps(grl, gr)
findOverlaps(grl, gr)
subsetByOverlaps(grl, gr)
countOverlaps(grl, gr, type="start")
findOverlaps(grl, gr, type="start")
subsetByOverlaps(grl, gr, type="start")
findOverlaps(grl, gr, select="first")

table(!is.na(findOverlaps(grl, gr1, select="first")))
countOverlaps(grl, gr1)
findOverlaps(grl, gr1)
subsetByOverlaps(grl, gr1)
countOverlaps(grl, gr1, type="start")
findOverlaps(grl, gr1, type="start")
subsetByOverlaps(grl, gr1, type="start")
findOverlaps(grl, gr1, select="first")

## Overlaps between two GRangesList objects:
countOverlaps(grl, rev(grl))
findOverlaps(grl, rev(grl))
subsetByOverlaps(grl, rev(grl))

Example output

GRanges object with 10 ranges and 2 metadata columns:
    seqnames    ranges strand |     score                GC
       <Rle> <IRanges>  <Rle> | <integer>         <numeric>
  a     chr1      1-10      - |         1                 1
  b     chr2      2-10      + |         2 0.888888888888889
  c     chr2      3-10      + |         3 0.777777777777778
  d     chr2      4-10      * |         4 0.666666666666667
  e     chr1      5-10      * |         5 0.555555555555556
  f     chr1      6-10      + |         6 0.444444444444444
  g     chr3      7-10      + |         7 0.333333333333333
  h     chr3      8-10      + |         8 0.222222222222222
  i     chr3      9-10      - |         9 0.111111111111111
  j     chr3        10      - |        10                 0
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

FALSE  TRUE 
    7     3 
a b c d e f g h i j 
0 2 2 2 0 0 0 0 0 0 
Hits object with 6 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         2           2
  [2]         2           1
  [3]         3           2
  [4]         3           1
  [5]         4           2
  [6]         4           1
  -------
  queryLength: 10 / subjectLength: 2
GRanges object with 3 ranges and 2 metadata columns:
    seqnames    ranges strand |     score                GC
       <Rle> <IRanges>  <Rle> | <integer>         <numeric>
  b     chr2      2-10      + |         2 0.888888888888889
  c     chr2      3-10      + |         3 0.777777777777778
  d     chr2      4-10      * |         4 0.666666666666667
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
a b c d e f g h i j 
0 0 1 1 0 0 0 0 0 0 
Hits object with 2 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         3           2
  [2]         4           1
  -------
  queryLength: 10 / subjectLength: 2
GRanges object with 2 ranges and 2 metadata columns:
    seqnames    ranges strand |     score                GC
       <Rle> <IRanges>  <Rle> | <integer>         <numeric>
  c     chr2      3-10      + |         3 0.777777777777778
  d     chr2      4-10      * |         4 0.666666666666667
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
 [1] NA  1  1  1 NA NA NA NA NA NA
 [1] NA  2  2  2 NA NA NA NA NA NA
Hits object with 6 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           2
  [2]         1           3
  [3]         1           4
  [4]         2           2
  [5]         2           3
  [6]         2           4
  -------
  queryLength: 2 / subjectLength: 10
Hits object with 2 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           4
  [2]         2           3
  -------
  queryLength: 2 / subjectLength: 10
Hits object with 5 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           2
  [2]         1           3
  [3]         1           4
  [4]         2           2
  [5]         2           3
  -------
  queryLength: 2 / subjectLength: 10
Hits object with 0 hits and 0 metadata columns:
   queryHits subjectHits
   <integer>   <integer>
  -------
  queryLength: 2 / subjectLength: 10

FALSE  TRUE 
    7     3 
a b c d e f g h i j 
0 2 2 2 0 0 0 0 0 0 
Hits object with 6 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         2           2
  [2]         2           1
  [3]         3           2
  [4]         3           1
  [5]         4           2
  [6]         4           1
  -------
  queryLength: 10 / subjectLength: 2
GRanges object with 3 ranges and 2 metadata columns:
    seqnames    ranges strand |     score                GC
       <Rle> <IRanges>  <Rle> | <integer>         <numeric>
  b     chr2      2-10      + |         2 0.888888888888889
  c     chr2      3-10      + |         3 0.777777777777778
  d     chr2      4-10      * |         4 0.666666666666667
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths

TRUE 
   3 
gr1 gr2 gr3 
  3   2   2 
Hits object with 7 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           2
  [2]         1           3
  [3]         1           4
  [4]         2           5
  [5]         2           6
  [6]         3           1
  [7]         3           4
  -------
  queryLength: 3 / subjectLength: 10
GRangesList object of length 3:
$gr1 
GRanges object with 2 ranges and 2 metadata columns:
      seqnames    ranges strand |     score        GC
         <Rle> <IRanges>  <Rle> | <integer> <numeric>
  [1]     chr2       4-6      + |         5      0.45
  [2]     chr2       3-6      + |         4      0.45

$gr2 
GRanges object with 2 ranges and 2 metadata columns:
      seqnames ranges strand | score  GC
  [1]     chr1    7-9      + |     3 0.3
  [2]     chr1  13-15      - |     4 0.5

$gr3 
GRanges object with 2 ranges and 2 metadata columns:
      seqnames ranges strand | score  GC
  [1]     chr1    1-3      - |     6 0.4
  [2]     chr2    4-9      - |     2 0.1

-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
gr1 gr2 gr3 
  2   0   2 
Hits object with 4 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           4
  [2]         1           3
  [3]         3           1
  [4]         3           4
  -------
  queryLength: 3 / subjectLength: 10
GRangesList object of length 2:
$gr1 
GRanges object with 2 ranges and 2 metadata columns:
      seqnames    ranges strand |     score        GC
         <Rle> <IRanges>  <Rle> | <integer> <numeric>
  [1]     chr2       4-6      + |         5      0.45
  [2]     chr2       3-6      + |         4      0.45

$gr3 
GRanges object with 2 ranges and 2 metadata columns:
      seqnames ranges strand | score  GC
  [1]     chr1    1-3      - |     6 0.4
  [2]     chr2    4-9      - |     2 0.1

-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
[1] 2 5 1

FALSE  TRUE 
    2     1 
gr1 gr2 gr3 
  2   0   0 
Hits object with 2 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           2
  [2]         1           1
  -------
  queryLength: 3 / subjectLength: 2
GRangesList object of length 1:
$gr1 
GRanges object with 2 ranges and 2 metadata columns:
      seqnames    ranges strand |     score        GC
         <Rle> <IRanges>  <Rle> | <integer> <numeric>
  [1]     chr2       4-6      + |         5      0.45
  [2]     chr2       3-6      + |         4      0.45

-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
gr1 gr2 gr3 
  2   0   0 
Hits object with 2 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           1
  [2]         1           2
  -------
  queryLength: 3 / subjectLength: 2
GRangesList object of length 1:
$gr1 
GRanges object with 2 ranges and 2 metadata columns:
      seqnames    ranges strand |     score        GC
         <Rle> <IRanges>  <Rle> | <integer> <numeric>
  [1]     chr2       4-6      + |         5      0.45
  [2]     chr2       3-6      + |         4      0.45

-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
[1]  1 NA NA
gr1 gr2 gr3 
  1   1   1 
Hits object with 3 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           3
  [2]         2           2
  [3]         3           1
  -------
  queryLength: 3 / subjectLength: 3
GRangesList object of length 3:
$gr1 
GRanges object with 2 ranges and 2 metadata columns:
      seqnames    ranges strand |     score        GC
         <Rle> <IRanges>  <Rle> | <integer> <numeric>
  [1]     chr2       4-6      + |         5      0.45
  [2]     chr2       3-6      + |         4      0.45

$gr2 
GRanges object with 2 ranges and 2 metadata columns:
      seqnames ranges strand | score  GC
  [1]     chr1    7-9      + |     3 0.3
  [2]     chr1  13-15      - |     4 0.5

$gr3 
GRanges object with 2 ranges and 2 metadata columns:
      seqnames ranges strand | score  GC
  [1]     chr1    1-3      - |     6 0.4
  [2]     chr2    4-9      - |     2 0.1

-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths

GenomicRanges documentation built on Nov. 8, 2020, 5:46 p.m.