Finding overlapping genomic tuples

Description

Finds tuple overlaps between a GTuples or GTuplesList object, and another object containing tuples or ranges.

NOTE: The findOverlaps generic function and methods for Ranges and RangesList objects are defined and documented in the IRanges package. The methods for GenomicRanges and GRangesList objects are defined and documented in the GenomicRanges package. The methods for GAlignments, GAlignmentPairs, and GAlignmentsList objects are defined and documented in the GenomicAlignments package.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
## S4 method for signature 'GTuples,GTuples'
findOverlaps(query, subject,
    maxgap = 0L, minoverlap = 1L,
    type = c("any", "start", "end", "within", "equal"),
    select = c("all", "first", "last", "arbitrary"),
    ignore.strand = FALSE)

## S4 method for signature 'GTuples,GTuples'
countOverlaps(query, subject,
    maxgap = 0L, minoverlap = 1L,
    type = c("any", "start", "end", "within", "equal"),
    ignore.strand = FALSE)

## S4 method for signature 'GTuples,GTuples'
overlapsAny(query, subject,
    maxgap = 0L, minoverlap = 1L,
    type = c("any", "start", "end", "within", "equal"),
    ignore.strand = FALSE)

## S4 method for signature 'GTuples,GTuples'
subsetByOverlaps(query, subject,
    maxgap = 0L, minoverlap = 1L,
    type = c("any", "start", "end", "within", "equal"),
    ignore.strand = FALSE)

Arguments

query, subject

A GTuples or GTuplesList object. GRanges, GRangesList, RangesList and RangedData are also accepted for one of query or subject.

type

See details below.

maxgap, minoverlap

See findOverlaps in the IRanges package for a description of these arguments. These arguments have no effect if both query and subject are GTuples objects and type = "equal".

select

See findOverlaps in the IRanges package for a description of this argument.

ignore.strand

When set to TRUE, the strand information is ignored in the overlap calculations.

Details

The findOverlaps-based methods involving genomic tuples, either through GTuples or GTuplesList objects, can search for tuple-tuple, tuple-range and range-tuple overlaps. Each of these are described below, with attention paid to the important special case of finding "equal tuple-tuple overlaps".

Equal tuple-tuple overlaps

When the query and the subject are both GTuples objects and type = "equal", findOverlaps uses the seqnames (seqnames), positions (tuples,GTuples-method) and strand (strand) to determine which tuples from the query exactly match those in the subject, where a strand value of "*" is treated as occurring on both the "+" and "-" strand. An overlap is recorded when a tuple in the query and a tuple in the subject have the same sequence name, have a compatible pairing of strands (e.g. "+"/"+", "-"/"-", "*"/"+", "*"/"-", etc.), and have identical positions.

NOTE: Equal tuple-tuple overlaps can only be computed if size(query) is equal to size(subject).

Other tuple-tuple overlaps

When the query and the subject are GTuples or GTuplesList objects and type = "any", "start", "end" or "within", findOverlaps treats the tuples as if they were ranges, with ranges given by [pos_{1}, pos_{m}] and where m is the size,GTuples-method of the tuples. This is done via inheritance so that a GTuples (resp. GTuplesList) object is treated as a GRanges (resp. GRangesList) and the appropriate findOverlaps method is dispatched upon.

NOTE: This is the only type of overlap finding available when either the query and subject are GTuplesList objects. This is following the behaviour of findOverlaps,GRangesList,GRangesList-method that allows type = "any", "start", "end" or "within" but does not allow type = "equal".

tuple-range and range-tuple overlaps

When one of the query and the subject is not a GTuples or GTuplesList objects, findOverlaps treats the tuples as if they were ranges, with ranges given by [pos_{1}, pos_{m}] and where m is the size,GTuples-method of the tuples. This is done via inheritance so that a GTuples (resp. GTuplesList) object is treated as a GRanges (resp. GRangesList) and the appropriate findOverlaps method is dispatched upon.

In the context of findOverlaps, a feature is a collection of tuples/ranges that are treated as a single entity. For GTuples objects, a feature is a single tuple; while for GTuplesList objects, a feature is a list element containing a set of tuples. In the results, the features are referred to by number, which run from 1 to length(query)/length(subject).

Value

For findOverlaps either a Hits object when select = "all" or an integer vector otherwise.

For countOverlaps an integer vector containing the tabulated query overlap hits.

For overlapsAny a logical vector of length equal to the number of tuples/ranges in query indicating those that overlap any of the tuples/ranges in subject.

For subsetByOverlaps an object of the same class as query containing the subset that overlapped at least one entity in subject.

For RangedData and RangesList, with the exception of subsetByOverlaps, the results align to the unlisted form of the object. This turns out to be fairly convenient for RangedData (not so much for RangesList, but something has to give).

Author(s)

Peter Hickey for methods involving GTuples and GTuplesList. P. Aboyoun, S. Falcon, M. Lawrence, N. Gopalakrishnan, H. Pages and H. Corrada Bravo for all the real work underlying the powerful findOverlaps functionality.

See Also

  • Please see the package vignette for an extended discussion of overlaps involving genomic tuples, which is available by typing vignette(topic = 'GenomicTuplesIntroduction', package = 'GenomicTuples') at the R prompt.

  • findOverlaps

  • findOverlaps

  • Hits-class

  • GTuples-class

  • GTuplesList-class

  • GRanges-class

  • GRangesList-class

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
## GTuples object containing 3-tuples:
gt3 <- GTuples(seqnames = c('chr1', 'chr1', 'chr1', 'chr1', 'chr2'), 
               tuples = matrix(c(10L, 10L, 10L, 10L, 10L, 20L, 20L, 20L, 25L, 
                                 20L, 30L, 30L, 35L, 30L, 30L), ncol = 3), 
               strand = c('+', '-', '*', '+', '+'))
               
## GTuplesList object
gtl3 <- GTuplesList(A = gt3[1:3], B = gt3[4:5])
               
## Find equal genomic tuples:
findOverlaps(gt3, gt3, type = 'equal')
## Note that this is different to the results if the tuples are treated as 
## ranges since this ignores the "internal positions" (pos2):
findOverlaps(granges(gt3), granges(gt3), type = 'equal')
               
## Scenarios where tuples are treated as ranges:
findOverlaps(gt3, gt3, type = 'any')
findOverlaps(gt3, gt3, type = 'start')
findOverlaps(gt3, gt3, type = 'end')
findOverlaps(gt3, gt3, type = 'within')

## Overlapping a GTuples and a GTuplesList object (tuples treated as ranges):
table(!is.na(findOverlaps(gtl3, gt3, select="first")))
countOverlaps(gtl3, gt3)
findOverlaps(gtl3, gt3)
subsetByOverlaps(gtl3, gt3)
countOverlaps(gtl3, gt3, type = "start")
findOverlaps(gtl3, gt3, type = "start")
subsetByOverlaps(gtl3, gt3, type = "start")
findOverlaps(gtl3, gt3, select = "first")