# nearest-methods: Finding the nearest range/position neighbor (IRanges: Foundation of integer range manipulation in Bioconductor)

## Description

The `nearest`, `precede`, `follow`, `distance` and `distanceToNearest` methods for `IntegerRanges` objects and subclasses.

## Usage

```r
## S4 method for signature 'IntegerRanges,IntegerRanges_OR_missing'
nearest(x, subject, select = c("arbitrary", "all"))

## S4 method for signature 'IntegerRanges,IntegerRanges_OR_missing'
precede(x, subject, select = c("first", "all"))

## S4 method for signature 'IntegerRanges,IntegerRanges_OR_missing'
follow(x, subject, select = c("last", "all"))

## S4 method for signature 'IntegerRanges,IntegerRanges_OR_missing'
distanceToNearest(x, subject, select = c("arbitrary", "all"))

## S4 method for signature 'IntegerRanges,IntegerRanges'
distance(x, y)

## S4 method for signature 'Pairs,missing'
distance(x, y)
```

## Arguments

• `x`: The query `IntegerRanges` object, or (for `distance()`) a `Pairs` object containing both the query (first) and the subject (second).

• `subject`: The subject `IntegerRanges` object, within which the nearest neighbors are found. Can be missing, in which case `x` is also the subject.

• `select`: Logic for handling ties. By default, all the methods select a single interval (arbitrary for `nearest`, the first by order in `subject` for `precede`, and the last for `follow`). To get all matchings, as a `Hits` object, use `"all"`.

• `y`: For the `distance` method, an `IntegerRanges` object. Cannot be missing. If `x` and `y` are not the same length, the shorter one is recycled to match the length of the longer one.

• `hits`: The hits between `x` and `subject`.

• `...`: Additional arguments for methods.

## Details

• nearest: The conventional nearest neighbor finder. Returns an integer vector containing the index of the nearest neighbor range in `subject` for each range in `x`. If there is no nearest neighbor (that is, if `subject` is empty), `NA`s are returned.

Here is roughly how it proceeds, for a range `xi` in `x`:

1. Find the ranges in `subject` that overlap `xi`. If a single range `si` in `subject` overlaps `xi`, `si` is returned as the nearest neighbor of `xi`. If there are multiple overlaps, one of the overlapping ranges is chosen arbitrarily.

2. If no ranges in `subject` overlap with `xi`, then the range in `subject` with the shortest distance from its end to the start of `xi`, or from its start to the end of `xi`, is returned.
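The two-step procedure above can be sketched in base R. `nearest_sketch` is a hypothetical helper, not the IRanges implementation: it assumes ranges are passed as plain vectors of inclusive starts and ends, each of width at least 1, and it resolves ties by taking the first candidate, which is one valid choice under the "arbitrary" rule.

```r
# Hypothetical helper (not part of IRanges): ranges are plain integer
# vectors of inclusive starts and ends, each assumed to have width >= 1.
nearest_sketch <- function(q_start, q_end, s_start, s_end) {
  vapply(seq_along(q_start), function(i) {
    if (length(s_start) == 0L) return(NA_integer_)  # empty subject
    # Step 1: prefer a subject range that overlaps the query
    ov <- which(s_start <= q_end[i] & s_end >= q_start[i])
    if (length(ov) > 0L) return(ov[1L])  # ties: take the first overlap
    # Step 2: otherwise take the smallest gap between the two ranges
    gap <- pmax(s_start, q_start[i]) - pmin(s_end, q_end[i]) - 1L
    which.min(gap)                       # ties: which.min() takes the first
  }, integer(1))
}

nearest_sketch(c(1L, 3L, 9L), c(2L, 7L, 10L), c(3L, 5L, 12L), c(3L, 6L, 12L))
```

With the ranges from the `nearest()` part of the Examples section, this sketch returns `c(1L, 1L, 3L)`, the same indices listed there for `nearest(query, subject)`.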

• precede: For each range in `x`, `precede` returns the index of the interval in `subject` that is directly preceded by the query range. Overlapping ranges are excluded. `NA` is returned when there are no qualifying ranges in `subject`.

• follow: The opposite of `precede`, this function returns the index of the range in `subject` that a query range in `x` directly follows. Overlapping ranges are excluded. `NA` is returned when there are no qualifying ranges in `subject`.
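`precede` and `follow` can be sketched the same way. `precede_sketch` and `follow_sketch` are hypothetical base-R helpers on plain start/end vectors, not the IRanges code. Only subject ranges lying strictly after (for `precede`) or strictly before (for `follow`) the query qualify, so overlapping ranges are excluded; ties resolve to the first or last qualifying index in `subject`, mirroring the default `select` values.

```r
# Hypothetical helpers (not part of IRanges) on inclusive start/end vectors.
precede_sketch <- function(q_start, q_end, s_start, s_end) {
  vapply(seq_along(q_start), function(i) {
    cand <- which(s_start > q_end[i])        # strictly after: no overlap
    if (length(cand) == 0L) return(NA_integer_)
    best <- cand[s_start[cand] == min(s_start[cand])]
    best[1L]                                 # "first" by order in subject
  }, integer(1))
}

follow_sketch <- function(q_start, q_end, s_start, s_end) {
  vapply(seq_along(q_start), function(i) {
    cand <- which(s_end < q_start[i])        # strictly before: no overlap
    if (length(cand) == 0L) return(NA_integer_)
    best <- cand[s_end[cand] == max(s_end[cand])]
    best[length(best)]                       # "last" by order in subject
  }, integer(1))
}

q_start <- c(1L, 3L, 9L); q_end <- c(3L, 7L, 10L)
s_start <- c(3L, 2L, 10L); s_end <- c(3L, 13L, 12L)
precede_sketch(q_start, q_end, s_start, s_end)
follow_sketch(q_start, q_end, s_start, s_end)
```

On the `precede()`/`follow()` example ranges, these sketches return `c(3L, 3L, NA)` and `c(NA, NA, 1L)`, and an empty query gives `integer(0)`.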

• distanceToNearest: Returns the distance for each range in `x` to its nearest neighbor in `subject`.

• distance: Returns the distance between each range in `x` and the corresponding range in `y`.

The `distance` method differs from the others documented on this page in that it is symmetric; `y` cannot be missing. If `x` and `y` are not the same length, the shorter one is recycled to match the length of the longer one. The `select` argument is not available for `distance` because comparisons are made pairwise. The return value has the length of the longer of `x` and `y`.

The `distance` calculation changed in BioC 2.12 to accommodate zero-width ranges in a consistent and intuitive manner. The new distance can be explained by a block model where a range is represented by a series of blocks of size 1. Blocks are adjacent to each other and there is no gap between them. A visual representation of `IRanges(4,7)` would be

```
+-----+-----+-----+-----+
   4     5     6     7
```

The distance between two consecutive blocks is 0L (prior to Bioconductor 2.12 it was 1L). The new distance calculation now returns the size of the gap between two ranges.
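The block model reduces to a short formula: for ranges [s1, e1] and [s2, e2], the gap is max(s1, s2) - min(e1, e2) - 1, floored at 0. The sketch below applies it pairwise in base R. `distance_sketch` is a hypothetical helper written to match the behavior described here, not a copy of the IRanges source. Because `pmax()` and `pmin()` recycle their arguments, the shorter input is recycled to the length of the longer one, as described for `distance`.

```r
# Hypothetical helper (not the IRanges implementation): block-model
# distance between ranges given as inclusive start/end vectors.
distance_sketch <- function(x_start, x_end, y_start, y_end) {
  max_start <- pmax(x_start, y_start)  # recycles the shorter input
  min_end <- pmin(x_end, y_end)
  # Number of whole blocks in the gap; overlapping or adjacent ranges give 0
  pmax(max_start - min_end - 1L, 0L)
}

distance_sketch(1L, 5L, 6L, 10L)  # adjacent
distance_sketch(1L, 5L, 3L, 7L)   # overlapping
# Zero-width range [4, 3] shifted by -3:3 and compared with itself
sapply(-3:3, function(i) distance_sketch(4L + i, 3L + i, 4L, 3L))
```

In this sketch the adjacent and overlapping pairs both give 0, and the zero-width shifts give 3 2 1 0 1 2 3: a zero-width range sits between two blocks, and each shift moves it one block further away.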

This change to distance affects the notion of overlaps in that we no longer say:

x and y overlap <=> distance(x, y) == 0

Instead we say

x and y overlap => distance(x, y) == 0

or

x and y overlap or are adjacent <=> distance(x, y) == 0
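The difference between these statements can be checked exhaustively on small inputs. The helpers below are hypothetical (not IRanges functions) and work on single ranges of width at least 1: overlap means the ranges share at least one block, adjacency means one ends in the block right before the other starts, and `gap_sketch` computes the block-model gap max(s1, s2) - min(e1, e2) - 1, floored at 0.

```r
# Hypothetical helpers (not IRanges functions) for single ranges [s1, e1]
# and [s2, e2], each assumed to have width >= 1.
overlaps_sketch <- function(s1, e1, s2, e2) s1 <= e2 && s2 <= e1
adjacent_sketch <- function(s1, e1, s2, e2) e1 + 1L == s2 || e2 + 1L == s1
gap_sketch <- function(s1, e1, s2, e2) max(max(s1, s2) - min(e1, e2) - 1L, 0L)

check_pair <- function(s1, e1, s2, e2) {
  c(overlap       = overlaps_sketch(s1, e1, s2, e2),
    adjacent      = adjacent_sketch(s1, e1, s2, e2),
    zero_distance = gap_sketch(s1, e1, s2, e2) == 0L)
}

check_pair(1L, 5L, 6L, 10L)  # adjacent, not overlapping, distance 0
check_pair(1L, 5L, 3L, 7L)   # overlapping, distance 0

# Check both statements over every pair in a small grid of ranges
grid <- expand.grid(start = 1:6, width = 1:3)
grid$end <- grid$start + grid$width - 1L
holds <- TRUE
for (i in seq_len(nrow(grid))) {
  for (j in seq_len(nrow(grid))) {
    r <- check_pair(grid$start[i], grid$end[i], grid$start[j], grid$end[j])
    implies <- !r[["overlap"]] || r[["zero_distance"]]
    iff <- (r[["overlap"]] || r[["adjacent"]]) == r[["zero_distance"]]
    holds <- holds && implies && iff
  }
}
holds
```

Over this grid, `holds` is `TRUE`: overlap implies a zero distance, but a zero distance only tells you the ranges overlap or are adjacent, as the pair [1, 5] and [6, 10] shows.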

• selectNearest: For each query range, selects the hits with the minimum distance. Ties are possible and can be broken with `breakTies`.
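The selection rule can be sketched on a plain data frame standing in for a `Hits` object. The column names, the precomputed `distance` column, and `select_nearest_sketch` are all hypothetical; in IRanges the distances are derived from `x` and `subject`. Ties are kept, and the result is sorted by query.

```r
# Hypothetical stand-in for a Hits object: one row per (query, subject) hit,
# with distances assumed to be precomputed.
hits <- data.frame(
  query    = c(1L, 1L, 1L, 2L, 2L),
  subject  = c(1L, 2L, 3L, 1L, 3L),
  distance = c(4L, 0L, 0L, 7L, 2L)
)

select_nearest_sketch <- function(hits) {
  # Minimum distance among each query's hits, aligned with the rows
  min_dist <- ave(hits$distance, hits$query, FUN = min)
  kept <- hits[hits$distance == min_dist, ]  # ties are all kept
  kept <- kept[order(kept$query), ]          # sorted by query
  rownames(kept) <- NULL
  kept
}

res <- select_nearest_sketch(hits)
res
```

Query 1 keeps both subject 2 and subject 3 because they tie at distance 0; that remaining tie is the kind of situation `breakTies` is meant to resolve.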

## Value

For `nearest`, `precede` and `follow`, an integer vector of indices in `subject`, or a `Hits` if `select="all"`.

For `distanceToNearest`, a `Hits` object with a metadata column named `distance` giving the distance between each query/subject pair. Access this column with the `mcols` accessor.
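The shape of this result can be mimicked with a data frame holding, for each query, its index, the index of its nearest subject range, and the distance. `distance_to_nearest_sketch` is a hypothetical base-R helper, not the IRanges method: it assumes ranges of width at least 1, prefers the first overlapping subject range when one exists, otherwise picks the range with the smallest block-model gap, and, in this sketch only, emits no row for a query that has no candidate.

```r
# Hypothetical helper (not the IRanges method) on inclusive start/end vectors.
distance_to_nearest_sketch <- function(q_start, q_end, s_start, s_end) {
  rows <- lapply(seq_along(q_start), function(i) {
    if (length(s_start) == 0L) return(NULL)  # no neighbor: no row
    # Block-model gap from query i to every subject range (0 if touching)
    gap <- pmax(pmax(s_start, q_start[i]) - pmin(s_end, q_end[i]) - 1L, 0L)
    overlap <- s_start <= q_end[i] & s_end >= q_start[i]
    j <- if (any(overlap)) which(overlap)[1L] else which.min(gap)
    data.frame(query = i, subject = j, distance = gap[j])
  })
  do.call(rbind, rows)
}

query_start <- c(1L, 3L, 9L); query_end <- c(2L, 7L, 10L)
subject_start <- c(3L, 5L, 12L); subject_end <- c(3L, 6L, 12L)
res <- distance_to_nearest_sketch(query_start, query_end, subject_start, subject_end)
res
```

For the `nearest()` example ranges, the sketch pairs the queries with subjects 1, 1 and 3 at distances 0, 0 and 1.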

For `distance`, an integer vector of distances between the ranges in `x` and `y`.

For `selectNearest`, a `Hits` object, sorted by query.

## Author(s)

M. Lawrence

## See Also

• The IntegerRanges and Hits classes.

• The GenomicRanges and GRanges classes in the GenomicRanges package.

• `findOverlaps` for finding just the overlapping ranges.

• GenomicRanges methods for `precede`, `follow`, `nearest`, `distance` and `distanceToNearest` are documented at ?`nearest-methods` or ?`precede,GenomicRanges,GenomicRanges-method`.

## Examples

```r
library(IRanges)

## ------------------------------------------
## precede() and follow()
## ------------------------------------------
query <- IRanges(c(1, 3, 9), c(3, 7, 10))
subject <- IRanges(c(3, 2, 10), c(3, 13, 12))

precede(query, subject)     # c(3L, 3L, NA)
precede(IRanges(), subject) # integer()
precede(query, IRanges())   # rep(NA_integer_, 3)
precede(query)              # c(3L, 3L, NA)

follow(query, subject)      # c(NA, NA, 1L)
follow(IRanges(), subject)  # integer()
follow(query, IRanges())    # rep(NA_integer_, 3)
follow(query)               # c(NA, NA, 2L)

## ------------------------------------------
## nearest()
## ------------------------------------------
query <- IRanges(c(1, 3, 9), c(2, 7, 10))
subject <- IRanges(c(3, 5, 12), c(3, 6, 12))

nearest(query, subject) # c(1L, 1L, 3L)
nearest(query)          # c(2L, 1L, 2L)

## ------------------------------------------
## distance()
## ------------------------------------------
## adjacent
distance(IRanges(1, 5), IRanges(6, 10)) # 0L
## overlap
distance(IRanges(1, 5), IRanges(3, 7))  # 0L
## zero-width
sapply(-3:3, function(i) distance(shift(IRanges(4, 3), i), IRanges(4, 3)))
```

IRanges documentation built on Dec. 14, 2020, 2 a.m.