Detecting all neighbors within range, in BiocNeighbors: Nearest Neighbor Detection for Bioconductor Packages

```
require(knitr)
opts_chunk$set(error=FALSE, message=FALSE, warning=FALSE)
library(BiocNeighbors)
```

Identifying all neighbors within range

Another application of the KMKNN or VP tree algorithms is to identify all neighboring points within a certain distance^[The default here is Euclidean, but again, we can set `distance="Manhattan"` in the `BNPARAM` object if so desired.] of the current point. We first mock up some data:

```
nobs <- 10000
ndim <- 20
data <- matrix(runif(nobs*ndim), ncol=ndim)
```

We apply the `findNeighbors()` function to `data`:

```
fout <- findNeighbors(data, threshold=1)
```

Each entry of the `index` list corresponds to a point in `data` and contains the row indices of the points in `data` that lie within `threshold` of it. For example, the 3rd point in `data` has the following neighbors:

```
fout$index[[3]]
```

... with the following distances to those neighbors:

```
fout$distance[[3]]
```

Note that, for this function, the reported neighbors are not sorted by distance. The order within each entry is arbitrary and may vary with the random seed, but the identity of the neighbors is fully deterministic.
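Because the neighbors are reported in arbitrary order, code that expects a nearest-first ordering has to sort each entry itself. The following is a minimal sketch that assumes only the `index` and `distance` list structure described above. The names `small`, `fout.small`, `ord`, `sorted.index` and `sorted.distance` are illustrative, and a smaller mock data set is used to keep it quick:

```r
library(BiocNeighbors)

set.seed(100)
small <- matrix(runif(1000 * 20), ncol=20)
fout.small <- findNeighbors(small, threshold=1)

# For each point, compute the permutation that sorts its neighbors by distance.
ord <- lapply(fout.small$distance, order)

# Apply the same permutation to indices and distances so that they stay paired.
sorted.index <- Map(function(idx, o) idx[o], fout.small$index, ord)
sorted.distance <- Map(function(d, o) d[o], fout.small$distance, ord)

# Neighbors of the 3rd point, now ordered from closest to furthest.
sorted.index[[3]]
sorted.distance[[3]]
```

Sorting after the search keeps the search itself unchanged and only pays the sorting cost when an ordering is actually needed.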

Querying another data set for neighbors

The `queryNeighbors()` function is also provided for identifying all points in `data` that lie within a certain distance of each point in a separate query data set. Given a query data set:

```
nquery <- 1000
ndim <- 20
query <- matrix(runif(nquery*ndim), ncol=ndim)
```

... we apply the `queryNeighbors()` function:

```
qout <- queryNeighbors(data, query, threshold=1)
length(qout$index)
```

... where each entry of `qout$index` corresponds to a row of `query` and contains its neighbors in `data`. Again, the order of the output is arbitrary but the identity of the neighbors is deterministic.
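One way to convince yourself that the neighbor identities are deterministic is to compare them against a brute-force search in base R. This is only a sketch for the default Euclidean distance. The names `ref`, `qry`, `qout.small`, `expected` and `agree` are illustrative, and the data are smaller than in the main example so that the exhaustive comparison stays cheap:

```r
library(BiocNeighbors)

set.seed(42)
ndim <- 20
ref <- matrix(runif(2000 * ndim), ncol=ndim)
qry <- matrix(runif(50 * ndim), ncol=ndim)
qout.small <- queryNeighbors(ref, qry, threshold=1)

# Brute-force Euclidean distances from the first query point to every row of 'ref'.
# t(ref) has one column per reference point, so the query vector recycles per column.
i <- 1
d <- sqrt(colSums((t(ref) - qry[i,])^2))
expected <- which(d <= 1)

# Compare after sorting, since the reported order is arbitrary.
agree <- identical(sort(as.integer(qout.small$index[[i]])), expected)
agree
```

The brute-force search scales with the product of the number of query and reference points, so it is only practical as a spot check on a few query rows.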

Further options

Most of the options described for `findKNN()` are also applicable here. For example:

• `subset` to identify neighbors for a subset of points.
• `get.distance` to avoid retrieving distances when unnecessary.
• `BPPARAM` to parallelize the calculations across multiple workers.
• `raw.index` to return the raw indices from a precomputed index.
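As an illustration of two of these options, the sketch below restricts the search to the first 10 points with `subset` and skips the distances with `get.distance=FALSE`. It assumes that these arguments behave as described in the list above. The names `small`, `sub.out`, `full.out` and `same` are illustrative:

```r
library(BiocNeighbors)

set.seed(200)
small <- matrix(runif(1000 * 20), ncol=20)

# Only search for neighbors of the first 10 points, and skip the distances.
sub.out <- findNeighbors(small, threshold=1, subset=1:10, get.distance=FALSE)

length(sub.out$index)  # one entry per point in 'subset'
names(sub.out)         # no 'distance' element was requested

# The neighbor identities should match the corresponding entries of a full search.
full.out <- findNeighbors(small, threshold=1)
same <- identical(lapply(sub.out$index, sort), lapply(full.out$index[1:10], sort))
same
```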

Note that the argument for a precomputed index is `BNINDEX`:

```
pre <- buildIndex(data, BNPARAM=KmknnParam())
fout.pre <- findNeighbors(BNINDEX=pre, threshold=1)
qout.pre <- queryNeighbors(BNINDEX=pre, query=query, threshold=1)
```

Users are referred to the documentation of each function for specific details.

Session information

```
sessionInfo()
```

BiocNeighbors documentation built on Dec. 9, 2020, 2:01 a.m.