queryNeighbors: Query neighbors within a threshold distance

queryNeighbors R Documentation

Query neighbors within a threshold distance

Description

Find all points in a reference dataset that lie within a threshold distance of each point in a query dataset.

Usage

queryNeighbors(
  X,
  query,
  threshold,
  get.index = TRUE,
  get.distance = TRUE,
  num.threads = 1,
  subset = NULL,
  transposed = FALSE,
  ...,
  BNPARAM = NULL
)

Arguments

X

The reference dataset to be queried. This should be a numeric matrix where rows correspond to reference points and columns correspond to variables (i.e., dimensions). Alternatively, a prebuilt BiocNeighborIndex object from buildIndex.

query

A numeric matrix of query points, containing the same number of columns as X.

threshold

A positive numeric scalar specifying the maximum distance at which a point is considered a neighbor. Alternatively, a vector containing a different distance threshold for each query point.

get.index

A logical scalar indicating whether the indices of the neighbors should be recorded.

get.distance

A logical scalar indicating whether distances to the neighbors should be recorded.

num.threads

Integer scalar specifying the number of threads to use for the search.

subset

An integer, logical or character vector indicating the rows of query (or columns, if transposed=TRUE) for which neighbors should be identified.

transposed

A logical scalar indicating whether X and query are transposed, in which case both matrices are assumed to contain dimensions in the rows and data points in the columns.

...

Further arguments to pass to buildIndex when X is not a prebuilt index.

BNPARAM

A BiocNeighborParam object specifying how the index should be constructed. If NULL, this defaults to a KmknnParam. Ignored if X contains a prebuilt index.

Details

This function identifies all points in X that lie within threshold of each point in query. For Euclidean distances, this is equivalent to identifying all points in a hypersphere centered on the point of interest. Not all implementations support this search mode; KmknnParam and VptreeParam do.
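
To make the search semantics concrete, the following is a minimal brute-force sketch in base R of what a threshold query computes under Euclidean distance. The helper bruteForceNeighbors is hypothetical and not part of the package, and treating points exactly at the threshold as neighbors is an assumption made here for illustration.

```r
# Hypothetical helper (not part of the package): brute-force range search.
bruteForceNeighbors <- function(X, query, threshold) {
    index <- vector("list", nrow(query))
    distance <- vector("list", nrow(query))
    for (i in seq_len(nrow(query))) {
        # Euclidean distance from query point i to every row of X.
        d <- sqrt(colSums((t(X) - query[i, ])^2))
        # Assumption: points exactly at the threshold count as neighbors.
        keep <- which(d <= threshold)
        # Report neighbors by increasing distance.
        o <- order(d[keep])
        index[[i]] <- keep[o]
        distance[[i]] <- d[keep][o]
    }
    list(index = index, distance = distance)
}

set.seed(100)
X <- matrix(rnorm(2000), ncol = 2)
query <- matrix(rnorm(20), ncol = 2)
ref <- bruteForceNeighbors(X, query, threshold = 0.5)
lengths(ref$index)
```

The cost of this sketch grows with the product of the number of reference and query points, which is what a prebuilt index is meant to avoid.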

If threshold is a vector, each entry is assumed to specify a (possibly different) threshold for each point in query. If subset is also specified, each entry is assumed to specify a threshold for each point in subset. An error will be raised if threshold is a vector of incorrect length.
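
The vector form of threshold might be used as in the sketch below, which assumes the function is provided by the BiocNeighbors package (the page itself only names the LTLA/kmknn repository). The data, the variable names per.point and chosen, and the threshold range are made up for illustration.

```r
library(BiocNeighbors)  # assumed package providing queryNeighbors

set.seed(42)
X <- matrix(rnorm(10000), ncol = 10)
query <- matrix(rnorm(500), ncol = 10)

# One threshold per query point; length must equal nrow(query).
per.point <- runif(nrow(query), min = 2, max = 4)
out <- queryNeighbors(X, query = query, threshold = per.point)

# With subset, supply one threshold per selected query point,
# in the same order as the subset.
chosen <- c(5L, 1L, 20L)
sub <- queryNeighbors(X, query = query,
    threshold = per.point[chosen], subset = chosen)
lengths(sub$index)
```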

If multiple queries are to be performed against the same X, it may be beneficial to build the index from X once with buildIndex. The resulting pointer object can be supplied as X to multiple queryNeighbors calls, avoiding the need to repeat index construction in each call.
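
A rough sketch of this reuse pattern, again assuming the BiocNeighbors package; the query matrices q1 and q2 and the choice of VptreeParam are illustrative only.

```r
library(BiocNeighbors)  # assumed package providing buildIndex/queryNeighbors

set.seed(1)
X <- matrix(rnorm(20000), ncol = 20)
q1 <- matrix(rnorm(2000), ncol = 20)
q2 <- matrix(rnorm(2000), ncol = 20)

# Build the index once; BNPARAM chooses the algorithm at this point.
idx <- buildIndex(X, BNPARAM = VptreeParam())

# Reuse the same prebuilt index for several queries.
out1 <- queryNeighbors(idx, query = q1, threshold = 5)
out2 <- queryNeighbors(idx, query = q2, threshold = 5)
summary(lengths(out1$index))
```

Because the algorithm is fixed when the index is built, BNPARAM is not passed again in the query calls.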

Value

A list is returned containing:

  • index, if get.index=TRUE. This is a list of integer vectors where each entry corresponds to a point (denoted here as i) in query. The vector for i contains the set of row indices of all points in X that lie within threshold of point i. Neighbors for i are sorted by increasing distance from i.

  • distance, if get.distance=TRUE. This is a list of numeric vectors where each entry corresponds to a point (as above) and contains the distances of the neighbors from i. Elements of each vector in distance match to elements of the corresponding vector in index.

If both get.index=FALSE and get.distance=FALSE, an integer vector is returned of length equal to the number of points in query. The i-th entry contains the number of neighbors of i within threshold.

If subset is not NULL, each entry of the above vector/lists refers to a point in the subset, in the same order as supplied in subset.
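
The two return shapes described above can be compared with a short sketch, assuming the BiocNeighbors package; the data and the threshold of 1.5 are arbitrary.

```r
library(BiocNeighbors)  # assumed package providing queryNeighbors

set.seed(7)
X <- matrix(rnorm(5000), ncol = 5)
query <- matrix(rnorm(100), ncol = 5)

# Default: a list with 'index' and 'distance', one vector per query point.
full <- queryNeighbors(X, query = query, threshold = 1.5)
str(full$index[1:2])

# Counts only: a single vector with one entry per query point.
counts <- queryNeighbors(X, query = query, threshold = 1.5,
    get.index = FALSE, get.distance = FALSE)
head(counts)
```

Requesting only the counts avoids storing per-neighbor results when just the neighborhood sizes are needed.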

Author(s)

Aaron Lun

See Also

buildIndex, to build an index ahead of time.

Examples

Y <- matrix(rnorm(100000), ncol=20)
Z <- matrix(rnorm(20000), ncol=20)
out <- queryNeighbors(Y, query=Z, threshold=3)
summary(lengths(out$index))


LTLA/kmknn documentation built on Oct. 18, 2024, 9:01 p.m.