nearest-methods: Finding the nearest range/position neighbor

nearest-methods    R Documentation

Description

The nearest(), precede(), follow(), distance() and distanceToNearest() methods for IntegerRanges derivatives (e.g. IRanges objects).

Usage

## S4 method for signature 'IntegerRanges,IntegerRanges_OR_missing'
nearest(x, subject, select=c("arbitrary", "all"))

## S4 method for signature 'IntegerRanges,IntegerRanges_OR_missing'
precede(x, subject, select=c("first", "all"))

## S4 method for signature 'IntegerRanges,IntegerRanges_OR_missing'
follow(x, subject, select=c("last", "all"))

## S4 method for signature 'IntegerRanges,IntegerRanges'
distance(x, y)
## S4 method for signature 'Pairs,missing'
distance(x, y)

## S4 method for signature 'IntegerRanges,IntegerRanges_OR_missing'
distanceToNearest(x, subject, select=c("arbitrary", "all"))

Arguments

x

The query IntegerRanges derivative, or (for distance()) a Pairs object containing both the query (first) and subject (second).

subject

The subject IntegerRanges object, within which the nearest neighbors are found. Can be missing, in which case x is also the subject.

select

Logic for handling ties. By default, all the methods select a single interval (an arbitrary one for nearest, the first by order in subject for precede, and the last for follow). To get all matches, as a Hits object, use "all".

y

For the distance method, an IntegerRanges derivative. Cannot be missing. If x and y are not the same length, the shorter will be recycled to match the length of the longer.

hits

For selectNearest(), a Hits object containing the hits between x and subject.

...

Additional arguments for methods

Details

  • nearest(x, subject, select=c("arbitrary", "all")): The conventional nearest neighbor finder. Returns an integer vector containing the index of the nearest neighbor range in subject for each range in x. If there is no nearest neighbor (i.e. subject is empty), NA is returned.

    Here is roughly how it proceeds, for a range xi in x:

    1. Find the ranges in subject that overlap xi. If a single range si in subject overlaps xi, si is returned as the nearest neighbor of xi. If there are multiple overlaps, one of the overlapping ranges is chosen arbitrarily.

    2. If no ranges in subject overlap xi, then the range in subject with the shortest distance from its end to the start of xi, or from its start to the end of xi, is returned.

  • precede(x, subject, select=c("first", "all")): For each range in x, precede returns the index of the interval in subject that is directly preceded by the query range. Overlapping ranges are excluded. NA is returned when there are no qualifying ranges in subject.

  • follow(x, subject, select=c("last", "all")): The opposite of precede, this function returns the index of the range in subject that a query range in x directly follows. Overlapping ranges are excluded. NA is returned when there are no qualifying ranges in subject.

  • distance(x, y): Returns the distance from each range in x to the corresponding range in y.

    The distance method differs from others documented on this page in that it is symmetric; y cannot be missing. If x and y are not the same length, the shorter will be recycled to match the length of the longer. The select argument is not available for distance because comparisons are made in a pair-wise fashion. The return value has the length of the longer of x and y.

    The distance calculation changed in Bioconductor 2.12 to accommodate zero-width ranges in a consistent and intuitive manner. The new distance can be explained by a block model where a range is represented by a series of blocks of size 1. Blocks are adjacent to each other and there is no gap between them. A visual representation of IRanges(4,7) would be:

            +-----+-----+-----+-----+
               4     5     6     7
          

    The distance between two consecutive blocks is 0L (prior to Bioconductor 2.12 it was 1L). The distance calculation now returns the size of the gap between two ranges.

    This change to distance affects the notion of overlaps in that we no longer say:

    x and y overlap <=> distance(x, y) == 0

    Instead we say

    x and y overlap => distance(x, y) == 0

    or

    x and y overlap or are adjacent <=> distance(x, y) == 0

  • distanceToNearest(x, subject, select=c("arbitrary", "all")): Returns the distance for each range in x to its nearest neighbor in subject.

  • selectNearest(hits, x, subject): For each query range, selects the hits with the minimum distance among that query's hits. Ties are possible and can be broken with breakTies.
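
The block model described for distance() above amounts to measuring the gap between two ranges. The following is a minimal base-R sketch of that rule, not the package's own code; the helper name gap_distance and its four integer arguments are hypothetical and chosen only for illustration. A zero-width range is written with end = start - 1, as in IRanges(4,3).

```r
# Hypothetical helper (not part of IRanges): size of the gap between the
# ranges [start1, end1] and [start2, end2] under the block model.
gap_distance <- function(start1, end1, start2, end2) {
  max_start <- pmax.int(start1, start2)  # rightmost start
  min_end <- pmin.int(end1, end2)        # leftmost end
  # Number of whole positions strictly between the two ranges;
  # overlapping or adjacent ranges have no gap, so clamp at 0.
  pmax.int(max_start - min_end - 1L, 0L)
}

gap_distance(1L, 5L, 6L, 10L)   # adjacent: 0
gap_distance(1L, 5L, 3L, 7L)    # overlapping: 0
gap_distance(1L, 5L, 8L, 10L)   # positions 6 and 7 lie between: 2
gap_distance(4L, 3L, 4L, 3L)    # two zero-width ranges at the same position: 0
```

Because both adjacent and overlapping pairs give 0 here, a zero distance alone does not tell you whether two ranges overlap, which is the one-way implication stated above.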

Value

For nearest(), precede() and follow(), an integer vector of indices in subject, or a Hits object if select="all".
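
To illustrate the difference between the default and select="all", the sketch below uses a made-up subject in which two ranges start at the same position, so both are equally close candidates for precede(). It assumes the IRanges package is installed; the data are illustrative only.

```r
library(IRanges)

query <- IRanges(1, 2)
# Subject ranges 1 and 2 both start at position 5, so they tie.
subject <- IRanges(c(5, 5, 9), c(6, 8, 10))

# Default: a single index per query range (the first tied range in subject).
first_hit <- precede(query, subject)

# select="all": a Hits object holding every tied match.
all_hits <- precede(query, subject, select="all")
queryHits(all_hits)    # query index of each hit
subjectHits(all_hits)  # subject index of each hit
```

The integer-vector form is convenient for direct subsetting of subject, while the Hits form is the one to use when ties matter.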

For distance(), an integer vector of distances between the ranges in x and y.

For distanceToNearest(), a Hits object with a metadata column reporting the distance between the pair. Access the distance metadata column with the mcols() accessor.
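
As a short illustration of reading that result, the sketch below reuses the query and subject from the nearest() example on this page and assumes the IRanges package is installed. Which subject is reported for a query with several overlapping candidates is arbitrary, so only the distances are given in the comments.

```r
library(IRanges)

query <- IRanges(c(1, 3, 9), c(2, 7, 10))
subject <- IRanges(c(3, 5, 12), c(3, 6, 12))

d2n <- distanceToNearest(query, subject)

queryHits(d2n)          # index into query for each pair
subjectHits(d2n)        # index into subject of the chosen nearest neighbor
mcols(d2n)$distance     # gap size for each pair: 0, 0, 1
```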

For selectNearest(), a Hits object, sorted by query.

Author(s)

M. Lawrence

See Also

  • Hits objects implemented in the S4Vectors package.

  • findOverlaps for finding just the overlapping ranges.

  • The IntegerRanges class.

  • nearest-methods in the GenomicRanges package for the nearest(), precede(), follow(), distance(), and distanceToNearest() methods for GenomicRanges objects.

Examples

## ------------------------------------------
## precede() and follow()
## ------------------------------------------
query <- IRanges(c(1, 3, 9), c(3, 7, 10))
subject <- IRanges(c(3, 2, 10), c(3, 13, 12))

precede(query, subject)     # c(3L, 3L, NA)
precede(IRanges(), subject) # integer()
precede(query, IRanges())   # rep(NA_integer_, 3)
precede(query)              # c(3L, 3L, NA)

follow(query, subject)      # c(NA, NA, 1L)
follow(IRanges(), subject)  # integer()
follow(query, IRanges())    # rep(NA_integer_, 3)
follow(query)               # c(NA, NA, 2L)

## ------------------------------------------
## nearest()
## ------------------------------------------
query <- IRanges(c(1, 3, 9), c(2, 7, 10))
subject <- IRanges(c(3, 5, 12), c(3, 6, 12))

nearest(query, subject) # c(1L, 1L, 3L)
nearest(query)          # c(2L, 1L, 2L)

## ------------------------------------------
## distance()
## ------------------------------------------
## adjacent
distance(IRanges(1,5), IRanges(6,10)) # 0L
## overlap
distance(IRanges(1,5), IRanges(3,7))  # 0L
## zero-width
sapply(-3:3, function(i) distance(shift(IRanges(4,3), i), IRanges(4,3)))

Bioconductor/IRanges documentation built on May 5, 2024, 3:25 a.m.