queryKNN: Query k-nearest neighbors

Description

Query a reference dataset for the k-nearest neighbors of each point in a query dataset.

Usage

queryKNN(
  X,
  query,
  k,
  get.index = TRUE,
  get.distance = TRUE,
  num.threads = 1,
  subset = NULL,
  transposed = FALSE,
  ...,
  BNPARAM = NULL
)

Arguments

X

The reference dataset to be queried. This should be a numeric matrix where rows correspond to reference points and columns correspond to variables (i.e., dimensions). Alternatively, a prebuilt BiocNeighborIndex object from buildIndex.

query

A numeric matrix of query points, with the same number of columns as X (or the same number of rows, if transposed=TRUE).

k

A positive integer scalar specifying the number of nearest neighbors to retrieve.

Alternatively, an integer vector of length equal to the number of points in query, specifying the number of neighbors to identify for each point. If subset is provided, this should have length equal to the length of subset. Users should wrap this vector with I() (i.e., give it the AsIs class) so that a length-1 vector can be distinguished from an integer scalar.

All values of k should be less than or equal to the number of points in X; any larger value is capped at the number of points in X, with a warning.
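The rules for k can be restated in a few lines of base R. The helper below, resolve_k(), is hypothetical (it is not part of BiocNeighbors) and only mirrors the behavior described above: a bare scalar applies to every query point, a vector wrapped in I() supplies one value per point, and values larger than the number of reference points are capped with a warning.

```r
# resolve_k() is a hypothetical helper, not a BiocNeighbors function.
# n.query: number of query points searched (length(subset) if subset is used).
# n.ref:   number of reference points in X.
resolve_k <- function(k, n.query, n.ref) {
  if (inherits(k, "AsIs")) {
    # I() marks a per-point vector, even when it has length 1.
    k <- as.integer(unclass(k))
    if (length(k) != n.query) {
      stop("an AsIs 'k' must have one value per query point")
    }
  } else {
    if (length(k) != 1L) {
      stop("a vector 'k' must be wrapped in I()")
    }
    # A bare scalar is recycled across all query points.
    k <- rep(as.integer(k), n.query)
  }
  if (any(k > n.ref)) {
    warning("'k' capped at the number of points in 'X'")
    k <- pmin(k, as.integer(n.ref))
  }
  k
}

resolve_k(5, n.query = 3, n.ref = 100)              # 5 5 5
resolve_k(I(c(2, 4, 6)), n.query = 3, n.ref = 100)  # 2 4 6
resolve_k(I(7), n.query = 1, n.ref = 5)             # 5, with a warning
```

Note that I(7) with a single query point is still a per-point vector, so according to the Value section it leads to list output rather than a matrix.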

get.index

A logical scalar indicating whether the indices of the nearest neighbors should be recorded. Setting this to FALSE improves efficiency if the indices are not of interest.

Alternatively, if k is an integer scalar, this may be a string, either "normal" or "transposed". The former is equivalent to TRUE, while the latter returns the index matrix in transposed format.

get.distance

A logical scalar indicating whether distances to the nearest neighbors should be recorded. Setting this to FALSE improves efficiency if the distances are not of interest.

Alternatively, if k is an integer scalar, this may be a string, either "normal" or "transposed". The former is equivalent to TRUE, while the latter returns the distance matrix in transposed format.
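To make the two layouts concrete, the base-R sketch below finds exact nearest neighbors by brute force for a tiny one-dimensional dataset. It does not call queryKNN; it only illustrates that a "transposed" result holds the same values as a "normal" one, with query points along the columns instead of the rows.

```r
# Base-R brute-force illustration of the "normal" and "transposed" layouts.
# One-dimensional points keep the result easy to check by hand.
X <- matrix(c(0, 1, 3, 7), ncol = 1)  # reference points 1..4 on a line
query <- matrix(c(2.2, 6), ncol = 1)  # query points 1..2
k <- 2
nq <- nrow(query)

# Exact Euclidean distances: rows = query points, columns = reference points.
D <- unname(as.matrix(dist(rbind(query, X))))[seq_len(nq), nq + seq_len(nrow(X))]

# "normal": row i holds the k nearest reference points of query point i.
index.normal <- matrix(t(apply(D, 1, order))[, seq_len(k)], nrow = nq)
distance.normal <- matrix(D[cbind(rep(seq_len(nq), k), as.vector(index.normal))],
                          nrow = nq)

# "transposed": the same values, with column i holding query point i.
index.transposed <- t(index.normal)
distance.transposed <- t(distance.normal)

index.normal      # rows: c(3, 2) and c(4, 3)
index.transposed  # columns: c(3, 2) and c(4, 3)
```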

num.threads

Integer scalar specifying the number of threads to use for the search.

subset

An integer, logical or character vector indicating the rows of query (or columns, if transposed=TRUE) for which the nearest neighbors should be identified.

transposed

A logical scalar indicating whether X and query are transposed, in which case both matrices are assumed to contain dimensions in the rows and data points in the columns.
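The subset and transposed arguments can be exercised on simulated data. The sketch below assumes BiocNeighbors is installed (nothing runs otherwise) and uses only arguments documented on this page. The expectation that out.t finds the same neighbors as out, and that out.s matches rows 2 and 4 of out, is inferred from the argument descriptions above rather than from a stated guarantee.

```r
# Sketch assuming BiocNeighbors is installed; nothing runs otherwise.
if (requireNamespace("BiocNeighbors", quietly = TRUE)) {
  library(BiocNeighbors)
  set.seed(1)
  Y <- matrix(rnorm(2000), ncol = 20)  # 100 reference points in rows
  Z <- matrix(rnorm(200), ncol = 20)   # 10 query points in rows

  out <- queryKNN(Y, query = Z, k = 5)

  # Same data with points stored in columns; the neighbors should match `out`.
  out.t <- queryKNN(t(Y), query = t(Z), k = 5, transposed = TRUE)

  # Search only for the 2nd and 4th query points (rows of Z).
  out.s <- queryKNN(Y, query = Z, k = 5, subset = c(2L, 4L))
}
```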

...

Further arguments to pass to buildIndex when X is not a prebuilt index.

BNPARAM

A BiocNeighborParam object specifying how the index should be constructed. If NULL, this defaults to a KmknnParam. Ignored if X is a prebuilt index.

Details

If multiple queries are to be performed against the same X, it may be beneficial to build the index from X once with buildIndex. The resulting index object can then be supplied as X to multiple queryKNN calls, avoiding the need to repeat index construction in each call.
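The idea of paying for index construction once can be sketched in base R. The two functions below, build_bruteforce_index() and query_bruteforce(), are hypothetical: they run an exhaustive search that caches the squared norms of the reference points, not the KMKNN algorithm behind the default KmknnParam, and they return a plain list rather than a BiocNeighborIndex. The sketch only shows that work done at build time is reused by every later query.

```r
# Hypothetical helpers (not part of BiocNeighbors): an exhaustive search whose
# "index" caches the squared norms of the reference points.
build_bruteforce_index <- function(X) {
  list(X = X, sq.norms = rowSums(X^2))  # computed once, reused by every query
}

query_bruteforce <- function(index, query, k) {
  # Squared Euclidean distances via |q|^2 + |x|^2 - 2 q.x, rows = query points.
  d2 <- outer(rowSums(query^2), index$sq.norms, "+") - 2 * tcrossprod(query, index$X)
  d2[d2 < 0] <- 0  # clamp tiny negative values caused by rounding
  nq <- nrow(query)
  # For each query point, the k reference points with the smallest distances.
  nn <- matrix(t(apply(d2, 1, order))[, seq_len(k)], nrow = nq)
  dd <- matrix(sqrt(d2[cbind(rep(seq_len(nq), k), as.vector(nn))]), nrow = nq)
  list(index = nn, distance = dd)
}

set.seed(7)
Y <- matrix(rnorm(2000), ncol = 20)  # 100 reference points in 20 dimensions
idx <- build_bruteforce_index(Y)     # built once...
res1 <- query_bruteforce(idx, matrix(rnorm(60), ncol = 20), k = 5)   # ...queried
res2 <- query_bruteforce(idx, matrix(rnorm(100), ncol = 20), k = 5)  # twice

# With BiocNeighbors itself, the pattern described above would be, for two
# query matrices Z1 and Z2:
#   idx <- buildIndex(Y)
#   queryKNN(idx, query = Z1, k = 5)
#   queryKNN(idx, query = Z2, k = 5)
```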

Value

A list containing index (if get.index is not FALSE) and distance (if get.distance is not FALSE).

  • If get.index=TRUE or "normal" and k is an integer scalar, index is an integer matrix with k columns where each row corresponds to a point (denoted here as i) in query. The i-th row contains the indices of points in X that are the nearest neighbors to point i, sorted by increasing distance from i.

    If get.index="transposed" and k is an integer scalar, index is as described above but transposed, i.e., the i-th column contains the indices of neighboring points in X.

  • If get.distance=TRUE or "normal" and k is an integer scalar, distance is a numeric matrix of the same dimensions as index. The i-th row contains the distances of neighboring points in X to the point i, sorted in increasing order.

    If get.distance="transposed" and k is an integer scalar, distance is as described above but transposed, i.e., the i-th column contains the distances to neighboring points in X.

  • If get.index is not FALSE and k is an integer vector, index is a list of integer vectors where each vector corresponds to a point (denoted here as i) in query. The i-th vector has length k[i] and contains the indices of points in X that are the nearest neighbors to point i, sorted by increasing distance from i.

  • If get.distance is not FALSE and k is an integer vector, distance is a list of numeric vectors of the same lengths as those in index. The i-th vector contains the distances of neighboring points in X to the point i, sorted in increasing order.
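For a per-point k, the list-shaped output can be reproduced by an exhaustive search in base R. knn_list_bruteforce() below is a hypothetical helper, not part of BiocNeighbors, and uses plain Euclidean distance; it mirrors only the documented shape of the result, namely one vector of length k[i] per query point, sorted by increasing distance.

```r
# Hypothetical brute-force helper (base R only), not a BiocNeighbors function.
knn_list_bruteforce <- function(X, query, k) {
  k <- as.integer(k)  # also strips the AsIs class from I(...)
  stopifnot(length(k) == nrow(query))
  nq <- nrow(query)
  # Exact Euclidean distances: rows = query points, columns = reference points.
  D <- unname(as.matrix(dist(rbind(query, X))))[seq_len(nq), nq + seq_len(nrow(X)),
                                                drop = FALSE]
  # Vector i holds the k[i] nearest reference points of query point i.
  index <- lapply(seq_len(nq), function(i) order(D[i, ])[seq_len(k[i])])
  distance <- lapply(seq_len(nq), function(i) D[i, index[[i]]])
  list(index = index, distance = distance)
}

X <- matrix(c(0, 1, 3, 7), ncol = 1)  # reference points 1..4 on a line
query <- matrix(c(2.2, 6), ncol = 1)  # two query points
out <- knn_list_bruteforce(X, query, k = I(c(1, 3)))
out$index     # list(3, c(4, 3, 2))
out$distance  # list(0.8, c(1, 3, 5)), up to rounding
```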

Author(s)

Aaron Lun

See Also

buildIndex, to build an index ahead of time.

queryDistance, to obtain the distance from each query point to its k-th nearest neighbor.

Examples

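library(BiocNeighbors)
set.seed(100)
Y <- matrix(rnorm(100000), ncol=20)  # 5000 reference points in 20 dimensions
Z <- matrix(rnorm(20000), ncol=20)   # 1000 query points in 20 dimensions
out <- queryKNN(Y, query=Z, k=5)
head(out$index)     # indices of the 5 nearest rows of Y for each row of Z
head(out$distance)  # matching distances, in increasing order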


LTLA/BiocNeighbors documentation built on Dec. 12, 2024, 7:45 a.m.