queryNeighbors-functions: Query neighbors in range

queryNeighbors-functions    R Documentation

Query neighbors in range

Description

Find all neighboring data points within a certain distance of a query point.

Usage

rangeQueryExhaustive(
  X,
  query,
  threshold,
  get.index = TRUE,
  get.distance = TRUE,
  BPPARAM = SerialParam(),
  precomputed = NULL,
  transposed = FALSE,
  subset = NULL,
  raw.index = FALSE,
  ...
)

rangeQueryKmknn(
  X,
  query,
  threshold,
  get.index = TRUE,
  get.distance = TRUE,
  BPPARAM = SerialParam(),
  precomputed = NULL,
  transposed = FALSE,
  subset = NULL,
  raw.index = FALSE,
  ...
)

rangeQueryVptree(
  X,
  query,
  threshold,
  get.index = TRUE,
  get.distance = TRUE,
  BPPARAM = SerialParam(),
  precomputed = NULL,
  transposed = FALSE,
  subset = NULL,
  raw.index = FALSE,
  ...
)

Arguments

X

A numeric matrix where rows correspond to data points and columns correspond to variables (i.e., dimensions).

query

A numeric matrix of query points, containing different data points in the rows but the same number and ordering of dimensions in the columns.

threshold

A positive numeric scalar specifying the maximum distance at which a point is considered a neighbor. Alternatively, a vector containing a different distance threshold for each query point.

get.index

A logical scalar indicating whether the indices of the neighbors should be recorded.

get.distance

A logical scalar indicating whether distances to the neighbors should be recorded.

BPPARAM

A BiocParallelParam object indicating how the search should be parallelized.

precomputed

A BiocNeighborIndex object of the appropriate class, generated from X. For rangeQueryExhaustive, this should be an ExhaustiveIndex from buildExhaustive. For rangeQueryKmknn, this should be a KmknnIndex from buildKmknn. For rangeQueryVptree, this should be a VptreeIndex from buildVptree.

transposed

A logical scalar indicating whether the query is transposed, in which case query is assumed to contain dimensions in the rows and data points in the columns.

subset

A vector indicating the rows of query (or columns, if transposed=TRUE) for which the neighbors should be identified.

raw.index

A logical scalar indicating whether raw column indices should be returned; see ?"BiocNeighbors-raw-index".

...

Further arguments to pass to the respective build* function for each algorithm. These include distance, a string specifying which of "Euclidean", "Manhattan" or "Cosine" distances is to be used.
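The three distance options can be illustrated on a pair of points. The sketch below is plain base R and is not how BiocNeighbors computes distances internally; the helper point_distance is hypothetical, and treating the cosine distance as the Euclidean distance between L2-normalized vectors is an assumption to check against the package's own documentation.

```r
# Hypothetical helper (not a BiocNeighbors function): distance between two
# numeric vectors under each of the three metrics named above.
point_distance <- function(a, b, distance = c("Euclidean", "Manhattan", "Cosine")) {
  distance <- match.arg(distance)
  switch(distance,
    Euclidean = sqrt(sum((a - b)^2)),
    Manhattan = sum(abs(a - b)),
    # Assumption: cosine distance is the Euclidean distance between the
    # L2-normalized vectors.
    Cosine = sqrt(sum((a / sqrt(sum(a^2)) - b / sqrt(sum(b^2)))^2))
  )
}

a <- c(3, 0)
b <- c(0, 4)
point_distance(a, b, "Euclidean")  # 5
point_distance(a, b, "Manhattan")  # 7
point_distance(a, b, "Cosine")     # sqrt(2), about 1.414
```

Under this assumption the cosine option ignores vector length entirely: scaling a or b by any positive constant leaves the result unchanged.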

Details

This function identifies points in X that are neighbors (i.e., within a distance threshold) of each point in query. Depending on the function called, the search is performed exhaustively, with the KMKNN algorithm or with a VP tree. Both X and query must have the same number of variables.
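To make the definition concrete, the following base R sketch performs the same kind of range search by brute force, comparing each query point against every row of X. The function range_query_bruteforce is hypothetical (it is not part of BiocNeighbors), it only supports Euclidean distance, and it assumes that a point lying exactly at threshold counts as a neighbor.

```r
# Hypothetical brute-force range search (not the BiocNeighbors implementation):
# for every row of `query`, find the rows of `X` within `threshold`.
range_query_bruteforce <- function(X, query, threshold) {
  stopifnot(ncol(X) == ncol(query))   # same number of variables
  tX <- t(X)                          # one column per point in X
  index <- distance <- vector("list", nrow(query))
  for (i in seq_len(nrow(query))) {
    # Euclidean distances from query point i to every point in X.
    d <- sqrt(colSums((tX - query[i, ])^2))
    keep <- which(d <= threshold)     # inclusive boundary (assumption)
    index[[i]] <- keep
    distance[[i]] <- d[keep]
  }
  list(index = index, distance = distance)
}

set.seed(1)
X <- matrix(rnorm(200), ncol = 2)      # 100 points in 2 dimensions
query <- matrix(rnorm(10), ncol = 2)   # 5 query points
out <- range_query_bruteforce(X, query, threshold = 0.5)
lengths(out$index)   # number of neighbors of each query point
```

This costs one distance evaluation for every (query point, data point) pair, which is what an exhaustive search amounts to; KMKNN and VP trees aim to skip many of those comparisons.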

By default, neighbors are identified for all data points within query. If subset is specified, neighbors are only detected for the query points in the subset. This yields the same result as (but is more efficient than) subsetting the output lists after running the search on the full query.
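The equivalence between subset and post hoc subsetting can be checked with a brute-force sketch. Here range_query_bruteforce is again a hypothetical base R helper (Euclidean distance, inclusive boundary assumed), with a subset argument added to mimic the one described above; results come back in the order given by subset.

```r
# Hypothetical brute-force helper with a `subset` argument.
range_query_bruteforce <- function(X, query, threshold, subset = NULL) {
  # Keep only the requested query points, in the order given by `subset`.
  if (!is.null(subset)) query <- query[subset, , drop = FALSE]
  tX <- t(X)
  index <- distance <- vector("list", nrow(query))
  for (i in seq_len(nrow(query))) {
    d <- sqrt(colSums((tX - query[i, ])^2))
    keep <- which(d <= threshold)
    index[[i]] <- keep
    distance[[i]] <- d[keep]
  }
  list(index = index, distance = distance)
}

set.seed(2)
X <- matrix(rnorm(300), ncol = 3)
query <- matrix(rnorm(30), ncol = 3)   # 10 query points
chosen <- c(7, 2, 9)

sub <- range_query_bruteforce(X, query, threshold = 0.8, subset = chosen)
full <- range_query_bruteforce(X, query, threshold = 0.8)

identical(sub$index, full$index[chosen])        # TRUE
identical(sub$distance, full$distance[chosen])  # TRUE
```

The subsetted call only computes distances for three query points instead of ten, which is where the efficiency gain comes from.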

If threshold is a vector, each entry is assumed to specify a (possibly different) threshold for each point in query. If subset is also specified, each entry is assumed to specify a threshold for each point in subset. An error will be raised if threshold is a vector of incorrect length.
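A per-point threshold replaces the single radius with threshold[i] for the i-th (possibly subsetted) query point. The hypothetical base R helper below sketches that rule, including the length check; it returns only the neighbor indices, assumes Euclidean distance, and the wording of its error message is made up for illustration.

```r
# Hypothetical brute-force helper supporting one threshold per query point.
range_query_bruteforce <- function(X, query, threshold, subset = NULL) {
  if (!is.null(subset)) query <- query[subset, , drop = FALSE]
  # One threshold per (subsetted) query point; a scalar is recycled.
  if (length(threshold) == 1L) {
    threshold <- rep(threshold, nrow(query))
  } else if (length(threshold) != nrow(query)) {
    stop("'threshold' must have length 1 or one entry per query point")
  }
  tX <- t(X)
  lapply(seq_len(nrow(query)), function(i) {
    which(sqrt(colSums((tX - query[i, ])^2)) <= threshold[i])
  })
}

X <- rbind(c(0, 0), c(1, 0), c(3, 0))
query <- rbind(c(0, 0), c(3, 0))

# Radius 1.5 around the first query point, radius 0.5 around the second.
range_query_bruteforce(X, query, threshold = c(1.5, 0.5))
# returns list(c(1L, 2L), 3L)

# With subset, thresholds line up with the subsetted points, in subset order.
range_query_bruteforce(X, query, threshold = c(0.5, 1.5), subset = c(2, 1))
# returns list(3L, c(1L, 2L))
```

Passing three thresholds for two query points triggers the error branch.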

Turning off get.index or get.distance will provide a slight speed boost and reduce memory usage when those returned values are not of interest. If both get.index=FALSE and get.distance=FALSE, an integer vector containing the number of neighbors of each query point is returned instead, which is more memory-efficient when the identities of, and distances to, the neighbors are not required.
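When only the counts are needed, a search can tally matches instead of storing them. In this hypothetical base R sketch (Euclidean distance, inclusive boundary assumed), count_neighbors_bruteforce returns an integer vector that matches lengths() of the corresponding index list, which is the relationship described above.

```r
# Hypothetical count-only brute-force search: keeps one integer per query
# point instead of a vector of neighbor indices.
count_neighbors_bruteforce <- function(X, query, threshold) {
  tX <- t(X)
  vapply(seq_len(nrow(query)), function(i) {
    d <- sqrt(colSums((tX - query[i, ])^2))
    sum(d <= threshold)  # count only; neighbor identities are discarded
  }, integer(1))
}

set.seed(5)
X <- matrix(rnorm(300), ncol = 3)
query <- matrix(rnorm(15), ncol = 3)   # 5 query points

counts <- count_neighbors_bruteforce(X, query, threshold = 1)

# The same search, keeping the indices, for comparison.
tX <- t(X)
index_list <- lapply(seq_len(nrow(query)), function(i) {
  which(sqrt(colSums((tX - query[i, ])^2)) <= 1)
})
identical(counts, lengths(index_list))  # TRUE
```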

If transposed=TRUE, this function assumes that query is already transposed, which saves a bit of time by avoiding an unnecessary transposition. Using BPPARAM will also split the search by query points across multiple processes.
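Both points in this paragraph can be sketched in base R: reading query points from the columns of a transposed matrix avoids calling t() on the whole query, and splitting the query points into chunks gives the same result once the pieces are concatenated in order. The helper below is hypothetical (Euclidean distance, inclusive boundary assumed), and lapply() stands in serially for what a BiocParallel backend would distribute across processes.

```r
# Hypothetical brute-force helper accepting an already-transposed query.
range_query_bruteforce <- function(X, query, threshold, transposed = FALSE) {
  n_query <- if (transposed) ncol(query) else nrow(query)
  tX <- t(X)  # dimensions in rows, reference points in columns
  lapply(seq_len(n_query), function(i) {
    # Pull query point i without transposing the whole query matrix.
    q <- if (transposed) query[, i] else query[i, ]
    which(sqrt(colSums((tX - q)^2)) <= threshold)
  })
}

set.seed(3)
X <- matrix(rnorm(400), ncol = 4)
query <- matrix(rnorm(80), ncol = 4)  # 20 query points

whole <- range_query_bruteforce(X, query, threshold = 1)
flipped <- range_query_bruteforce(X, t(query), threshold = 1, transposed = TRUE)

# Split query points into 4 chunks, search each chunk separately, then
# stitch the per-chunk results back together in the original order.
chunks <- split(seq_len(nrow(query)), cut(seq_len(nrow(query)), 4, labels = FALSE))
pieces <- lapply(chunks, function(rows) {
  range_query_bruteforce(X, query[rows, , drop = FALSE], threshold = 1)
})
stitched <- do.call(c, unname(pieces))

identical(whole, flipped)   # TRUE
identical(whole, stitched)  # TRUE
```

Splitting by query point works because each query point's neighbors depend only on that point and X, so chunks never need to communicate.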

If multiple queries are to be performed against the same X, it may be beneficial to build the index from X once (e.g., with buildKmknn). The resulting BiocNeighborIndex object can be supplied as precomputed to multiple function calls, avoiding the need to repeat index construction in each call. Note that when precomputed is supplied, the value of X is ignored.
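The build-once, query-many pattern can be illustrated without BiocNeighbors. In the package, the index would come from buildKmknn (or buildVptree) and be passed as precomputed; the hypothetical sketch below instead precomputes only the squared row norms of X and reuses them through the expansion ||q - x||^2 = ||q||^2 + ||x||^2 - 2 q·x. This is not how KMKNN or VP tree indices work; it only shows how a structure derived from X can be paid for once and reused across several queries.

```r
# Hypothetical "index" (not a BiocNeighborIndex): X plus its squared row
# norms, computed once at build time.
build_norm_index <- function(X) {
  list(X = X, sq_norms = rowSums(X^2))
}

# Hypothetical range search against a prebuilt index (Euclidean distance).
range_query_with_index <- function(index, query, threshold) {
  # Squared distances for all query/reference pairs at once:
  # ||q||^2 + ||x||^2 - 2 * q.x, with the ||x||^2 terms taken from the index.
  sq_dist <- outer(rowSums(query^2), index$sq_norms, "+") -
    2 * tcrossprod(query, index$X)
  sq_dist[sq_dist < 0] <- 0   # clamp tiny negative values from round-off
  lapply(seq_len(nrow(query)), function(i) which(sq_dist[i, ] <= threshold^2))
}

set.seed(4)
X <- matrix(rnorm(500), ncol = 5)        # 100 reference points
idx <- build_norm_index(X)               # built once...
q1 <- matrix(rnorm(25), ncol = 5)        # 5 query points
q2 <- matrix(rnorm(25), ncol = 5)
hits1 <- range_query_with_index(idx, q1, threshold = 1.5)  # ...reused here
hits2 <- range_query_with_index(idx, q2, threshold = 1.5)  # ...and here
```

Because the expansion is evaluated in floating point, points lying extremely close to the threshold may be classified slightly differently than by a direct distance computation.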

Value

A list is returned containing:

  • index, if get.index=TRUE. This is a list of integer vectors where each entry corresponds to a point (denoted here as i) in query. The vector for i contains the row indices of all points in X that lie within threshold of point i. Indices in each vector are not in any particular order.

  • distance, if get.distance=TRUE. This is a list of numeric vectors where each entry corresponds to a point (as above) and contains the distances of the neighbors from i. Elements of each vector in distance match to elements of the corresponding vector in index.

If get.index=FALSE and get.distance=FALSE, an integer vector is returned instead, where each entry contains the number of neighbors of the corresponding point in query.

If subset is not NULL, each entry of the above lists refers to a point in the subset, in the same order as supplied in subset.

See ?"BiocNeighbors-raw-index" for an explanation of the output when raw.index=TRUE.
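Because index and distance are parallel lists, they can be flattened into one row per query/neighbor pair, which is often easier to filter or plot. The helper neighbors_to_long is hypothetical, and the small input below is written by hand in the documented shape rather than produced by a real search.

```r
# Hypothetical helper: flatten a list-of-vectors result (as described above)
# into one row per (query point, neighbor) pair.
neighbors_to_long <- function(out) {
  data.frame(
    query = rep(seq_along(out$index), lengths(out$index)),
    neighbor = unlist(out$index),
    distance = unlist(out$distance)
  )
}

# Hand-written result in the documented shape: query 1 has two neighbors,
# query 2 has none, query 3 has one.
out <- list(
  index = list(c(4L, 9L), integer(0), 2L),
  distance = list(c(0.3, 0.8), numeric(0), 0.5)
)
neighbors_to_long(out)
#   query neighbor distance
# 1     1        4      0.3
# 2     1        9      0.8
# 3     3        2      0.5
```

Query points with no neighbors simply contribute no rows. If an integer subset was used, the query column gives positions within subset, so subset[long$query] maps them back to rows of query.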

Author(s)

Aaron Lun

See Also

buildKmknn or buildVptree to build an index ahead of time.

See ?"BiocNeighbors-algorithms" for an overview of the available algorithms.

Examples

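library(BiocNeighbors)

# 5000 reference points and 1000 query points in 20 dimensions.
Y <- matrix(rnorm(100000), ncol=20)
Z <- matrix(rnorm(20000), ncol=20)

# Neighbors of each query point within a distance of 1, using KMKNN.
out <- rangeQueryKmknn(Y, query=Z, threshold=1)
head(out$index)
head(out$distance)

# The same search with a VP tree.
out2 <- rangeQueryVptree(Y, query=Z, threshold=1)
head(out2$index)
head(out2$distance)

# The same search with an exhaustive (brute-force) search.
out3 <- rangeQueryExhaustive(Y, query=Z, threshold=1)
head(out3$index)
head(out3$distance)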


LTLA/BiocNeighbors documentation built on Jan. 14, 2024, 9:46 p.m.