nearest-methods: Finding the nearest genomic tuple/range neighbour
In GenomicTuples: Representation and Manipulation of Genomic Tuples

Description Usage Arguments Details Value Author(s) See Also Examples

The nearest, precede, follow, distance and distanceToNearest methods for GTuples objects and subclasses.

NOTE: These methods treat the tuples as if they were ranges, with ranges given by [pos_{1}, pos_{m}] and where m is the size,GTuples-method of the tuples. This is done via inheritance so that a GTuples object is treated as a GRanges and the appropriate method is dispatched upon.

## S4 method for signature 'GTuples,GTuples'
precede(x, subject, select = c("arbitrary", "all"), 
        ignore.strand = FALSE, ...)
## S4 method for signature 'GTuples,missing'
precede(x, subject, select = c("arbitrary", "all"), 
        ignore.strand = FALSE, ...)

## S4 method for signature 'GTuples,GTuples'
follow(x, subject, select = c("arbitrary", "all"), 
       ignore.strand=FALSE, ...)
## S4 method for signature 'GTuples,missing'
follow(x, subject, select = c("arbitrary", "all"), 
       ignore.strand = FALSE, ...)

## S4 method for signature 'GTuples,GTuples'
nearest(x, subject, select = c("arbitrary", "all"), 
        ignore.strand = FALSE, ...)
## S4 method for signature 'GTuples,missing'
nearest(x, subject, select = c("arbitrary", "all"), 
        ignore.strand = FALSE, ...)

## S4 method for signature 'GTuples,GTuples'
distanceToNearest(x, subject, ignore.strand = FALSE, 
                  ...)
## S4 method for signature 'GTuples,missing'
distanceToNearest(x, subject, ignore.strand = FALSE, 
                  ...)

## S4 method for signature 'GTuples,GTuples'
distance(x, y, ignore.strand = FALSE, ...)

`x`	The query GTuples instance.
`subject`	The subject GTuples instance within which the nearest neighbours are found. Can be missing, in which case `x` is also the subject.
`y`	For the `distance` method, a `GTuples` or `GRanges` instance. Cannot be missing. If `x` and `y` are not the same length, the shortest will be recycled to match the length of the longest.
`select`	Logic for handling ties. By default, all methods select a single tuple/range (arbitrary for `nearest`, the first by order in `subject` for `precede`, and the last for `follow`). When `select = "all"` a `Hits` object is returned with all matches for `x`. If `x` does not have a match in `subject` the `x` is not included in the `Hits` object.
`ignore.strand`	A `logical` indicating if the strand of the input tuples/ranges should be ignored. When `TRUE`, strand is set to `'+'`.
`...`	Additional arguments for methods.

nearest: Performs conventional nearest neighbour finding. Returns an integer vector containing the index of the nearest neighbour tuple/range in subject for each range in x. If there is no nearest neighbour NA is returned. For details of the algorithm see the man page in IRanges, ?nearest.
precede: For each range in x, precede returns the index of the tuple/range in subject that is directly preceded by the tuple/range in x. Overlapping tuples/ranges are excluded. NA is returned when there are no qualifying tuples/ranges in subject.
follow: The opposite of precede, follow returns the index of the tuple/range in subject that is directly followed by the tuple/range in x. Overlapping tuples/ranges are excluded. NA is returned when there are no qualifying tuples/ranges in subject.
Orientation and Strand: The relevant orientation for precede and follow is 5' to 3', consistent with the direction of translation. Because positional numbering along a chromosome is from left to right and transcription takes place from 5' to 3', precede and follow can appear to have ‘opposite’ behaviour on the + and - strand. Using positions 5 and 6 as an example, 5 precedes 6 on the + strand but follows 6 on the - strand.

A tuple/range with strand * can be compared to tuples/ranges on either the + or - strand. Below we outline the priority when tuples/ranges on multiple strands are compared. When ignore.strand=TRUE all tuples/ranges are treated as if on the + strand.
- x on + strand can match to tuples/ranges on both + and * strands. In the case of a tie the first tuple/range by order is chosen.
- x on - strand can match to tuples/ranges on both - and * strands. In the case of a tie the first tuple/range by order is chosen.
- x on * strand can match to tuples/ranges on any of +, - or * strands. In the case of a tie the first tuple/range by order is chosen.
distanceToNearest: Returns the distance for each tuple/range in x to its nearest neighbour in the subject.
distance: Returns the distance for each tuple/range in x to the range in y. The behaviour of distance has changed in Bioconductor 2.12. See the man page ?distance in IRanges for details.

For nearest, precede and follow, an integer vector of indices in subject, or aHits if select = "all".

For distanceToNearest, a Hits object with a column for the query index (from), subject index (to) and the distance between the pair.

For distance, an integer vector of distances between the tuples/ranges in x and y.

Peter Hickey for methods involving GTuples. P. Aboyoun and V. Obenchain <vobencha@fhcrc.org> for all the real work underlying the powerful nearest methods.

The GTuples and GRanges classes.
GenomicRanges and GRanges classes in the GenomicRanges package.
TheIPosRanges class in the IRanges package.
The Hits class in the S4Vectors package.
The nearest-methods man page in the GenomicRanges package.
findOverlaps-methods for finding just the overlapping ranges.

  ## -----------------------------------------------------------
  ## precede() and follow()
  ## -----------------------------------------------------------
  query <- GTuples("A", matrix(c(5L, 20L, 6L, 21L), ncol = 2), strand = "+")
  subject <- GTuples("A", matrix(c(rep(c(10L, 15L), 2), rep(c(11L, 16L), 2)), 
                                 ncol = 2),
                          strand = c("+", "+", "-", "-"))
  precede(query, subject)
  follow(query, subject)
 
  strand(query) <- "-"
  precede(query, subject)
  follow(query, subject)
 
  ## ties choose first in order
  query <- GTuples("A", matrix(c(10L, 11L), ncol = 2), c("+", "-", "*"))
  subject <- GTuples("A", matrix(c(rep(c(5L, 15L), each = 3), 
                                   rep(c(6L, 16L), each = 3)), ncol = 2),
                          rep(c("+", "-", "*"), 2))
  precede(query, subject)
  precede(query, rev(subject))
 
  ## ignore.strand = TRUE treats all ranges as '+'
  precede(query[1], subject[4:6], select="all", ignore.strand = FALSE)
  precede(query[1], subject[4:6], select="all", ignore.strand = TRUE)
  
  ## -----------------------------------------------------------
  ## nearest()
  ## -----------------------------------------------------------
  ## When multiple tuples overlap an "arbitrary" tuple is chosen
  query <- GTuples("A", matrix(c(5L, 15L), ncol = 2))
  subject <- GTuples("A", matrix(c(1L, 15L, 5L, 19L), ncol = 2))
  nearest(query, subject)
 
  ## select = "all" returns all hits
  nearest(query, subject, select = "all")
 
  ## Tuples in 'x' will self-select when 'subject' is present
  query <- GTuples("A", matrix(c(1L, 10L, 6L, 15L), ncol = 2))
  nearest(query, query)
 
  ## Tuples in 'x' will not self-select when 'subject' is missing
  nearest(query)
  
  ## -----------------------------------------------------------
  ## distance(), distanceToNearest()
  ## -----------------------------------------------------------
  ## Adjacent, overlap, separated by 1
  query <- GTuples("A", matrix(c(1L, 2L, 10L, 5L, 8L, 11L), ncol = 2))
  subject <- GTuples("A", matrix(c(6L, 5L, 13L, 10L, 10L, 15L), ncol = 2))
  distance(query, subject)

  ## recycling
  distance(query[1], subject)

  query <- GTuples(c("A", "B"), matrix(c(1L, 5L, 2L, 6L), ncol = 2))
  distanceToNearest(query, subject)

Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, basename, cbind, colnames,
    dirname, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges
Loading required package: GenomeInfoDb
[1]  1 NA
[1] NA  2
[1] NA  4
[1]  3 NA
[1] 4 2 2
[1] 1 4 1
Hits object with 2 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           1
  [2]         1           3
  -------
  queryLength: 1 / subjectLength: 3
Hits object with 3 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           1
  [2]         1           2
  [3]         1           3
  -------
  queryLength: 1 / subjectLength: 3
[1] 2
Hits object with 2 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           1
  [2]         1           2
  -------
  queryLength: 1 / subjectLength: 2
[1] 1 2
[1] 2 1
[1] 0 0 1
[1] 0 0 7
Hits object with 1 hit and 1 metadata column:
      queryHits subjectHits |  distance
      <integer>   <integer> | <integer>
  [1]         1           2 |         2
  -------
  queryLength: 2 / subjectLength: 3