findKNN: Find k-nearest neighbors

findKNN    R Documentation

Find k-nearest neighbors

Description

Find the k-nearest neighbors of each point in a dataset.

Usage

findKNN(
  X,
  k,
  get.index = TRUE,
  get.distance = TRUE,
  num.threads = 1,
  subset = NULL,
  ...,
  BNPARAM = NULL
)

Arguments

X

A numeric matrix where rows correspond to data points and columns correspond to variables (i.e., dimensions). Alternatively, a prebuilt BiocNeighborIndex object from buildIndex.

k

A positive integer scalar specifying the number of nearest neighbors to retrieve.

Alternatively, an integer vector specifying the number of neighbors to identify for each point, with length equal to the number of points in X. If subset is provided, the vector should instead have length equal to the length of subset. Wrap this vector with I() to give it the AsIs class, so that a length-1 vector can be distinguished from an integer scalar.

Every value of k should be no greater than the number of points in X minus 1; larger values are capped at this limit with a warning.
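A short sketch of a per-point k, assuming BiocNeighbors is installed. The simulated matrix Y and the alternating values in ks are arbitrary illustrative choices; I() marks ks as AsIs so that findKNN treats it as one k per point. The last call relies on the capping rule described above.

```r
library(BiocNeighbors)

set.seed(42)
Y <- matrix(rnorm(2000), ncol = 10)  # 200 simulated points in 10 dimensions

# One k per point, alternating between 2 and 5 (illustrative values only).
ks <- rep(c(2L, 5L), length.out = nrow(Y))

# I() gives the vector the AsIs class, so it is not mistaken for a scalar k.
out <- findKNN(Y, k = I(ks))

# With a per-point k, index and distance are lists of vectors.
lengths(out$index)[1:4]

# With only 5 points, k = 10 exceeds (number of points - 1) and is capped.
capped <- suppressWarnings(findKNN(Y[1:5, ], k = 10))
ncol(capped$index)  # expected to be 4 under the documented capping rule
```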

get.index

A logical scalar indicating whether the indices of the nearest neighbors should be recorded. Setting this to FALSE improves efficiency if the indices are not of interest.

Alternatively, if k is an integer scalar, this may be a string, either "normal" or "transposed". "normal" is equivalent to TRUE, while "transposed" returns the index matrix in transposed format.

get.distance

A logical scalar indicating whether distances to the nearest neighbors should be recorded. Setting this to FALSE improves efficiency if the distances are not of interest.

Alternatively, if k is an integer scalar, this may be a string, either "normal" or "transposed". "normal" is equivalent to TRUE, while "transposed" returns the distance matrix in transposed format.
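The following sketch, assuming BiocNeighbors is installed and using simulated data, compares the "normal" and "transposed" layouts and shows a search that skips distances. The variable names are illustrative only.

```r
library(BiocNeighbors)

set.seed(42)
Y <- matrix(rnorm(2000), ncol = 10)  # 200 simulated points in 10 dimensions

# Default layout: one row per point, one column per neighbor.
normal <- findKNN(Y, k = 5)

# Transposed layout: one column per point, one row per neighbor.
trans <- findKNN(Y, k = 5, get.index = "transposed", get.distance = "transposed")

dim(normal$index)
dim(trans$index)

# Skip the distances when only the neighbor indices are needed.
idx.only <- findKNN(Y, k = 5, get.distance = FALSE)
is.null(idx.only$distance)
```

The transposed layout is presumably useful when downstream code iterates over points column by column; the contents are the same as the normal layout, only the orientation differs.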

num.threads

Integer scalar specifying the number of threads to use for the search.

subset

An integer, logical or character vector specifying the indices of points in X for which the nearest neighbors should be identified. This yields the same result as (but is more efficient than) subsetting the output matrices after computing neighbors for all points.
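A minimal check of the equivalence stated above, assuming BiocNeighbors is installed and using simulated data: neighbors are searched among all points of Y, but only reported for the first 10.

```r
library(BiocNeighbors)

set.seed(42)
Y <- matrix(rnorm(2000), ncol = 10)  # 200 simulated points in 10 dimensions

# Neighbors for all points; the first 10 rows are kept afterwards.
full <- findKNN(Y, k = 5)

# Neighbors for the first 10 points only, still searching across all of Y.
sub <- findKNN(Y, k = 5, subset = 1:10)

head(sub$index)
```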

...

Further arguments to pass to buildIndex when X is a matrix rather than a prebuilt index.

BNPARAM

A BiocNeighborParam object specifying how the index should be constructed. If NULL, this defaults to a KmknnParam. Ignored if X is a prebuilt index.

Details

If multiple queries are to be performed on the same X, it may be beneficial to build the index from X once with buildIndex. The resulting index object can then be supplied as X to multiple findKNN calls, avoiding repeated index construction.
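A sketch of building the index once and reusing it, assuming BiocNeighbors is installed. The simulated data and the two values of k are arbitrary; KmknnParam() is passed explicitly only to make the default choice visible.

```r
library(BiocNeighbors)

set.seed(42)
Y <- matrix(rnorm(2000), ncol = 10)  # 200 simulated points in 10 dimensions

# Build the index once.
idx <- buildIndex(Y, BNPARAM = KmknnParam())

# Reuse the same prebuilt index for several searches without rebuilding it.
out5 <- findKNN(idx, k = 5)
out10 <- findKNN(idx, k = 10)

head(out10$index)
```

Since the default KMKNN search is exact, the first 5 columns of out10$index should match out5$index when there are no tied distances, as is expected for continuous simulated data.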

Value

A list containing index (if get.index is not FALSE) and distance (if get.distance is not FALSE).

  • If get.index=TRUE or "normal" and k is an integer scalar, index is an integer matrix with k columns where each row corresponds to a point (denoted here as i) in X. The i-th row contains the indices of points in X that are the nearest neighbors to point i, sorted by increasing distance from i. i will not be included in its own set of nearest neighbors.

    If get.index="transposed" and k is an integer scalar, index is as described above but transposed, i.e., it has k rows and the i-th column contains the indices of the neighbors of point i.

  • If get.distance=TRUE or "normal" and k is an integer scalar, distance is a numeric matrix of the same dimensions as index. The i-th row contains the distances of neighboring points in X to the point i, sorted in increasing order.

    If get.distance="transposed" and k is an integer scalar, distance is as described above but transposed, i.e., it has k rows and the i-th column contains the distances from point i to its neighbors.

  • If get.index is not FALSE and k is an integer vector, index is a list of integer vectors where each vector corresponds to a point (denoted here as i) in X. The i-th vector has length k[i] and contains the indices of points in X that are the nearest neighbors to point i, sorted by increasing distance from i.

  • If get.distance is not FALSE and k is an integer vector, distance is a list of numeric vectors of the same lengths as those in index. The i-th vector contains the distances of neighboring points in X to the point i, sorted in increasing order.
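To make the scalar-k layout concrete, here is a brute-force sketch in base R. brute_knn is a hypothetical helper written for this illustration, not a BiocNeighbors function; it assumes Euclidean distance (the KmknnParam default) and computes all pairwise distances, so it only suits small data.

```r
# Hypothetical helper, not part of BiocNeighbors: exact brute-force k-NN
# returning the documented "normal" layout for an integer scalar k.
brute_knn <- function(X, k) {
  n <- nrow(X)
  # Cap k at n - 1 with a warning, mirroring the documented behavior.
  if (k > n - 1) {
    warning("'k' capped at the number of points minus 1")
    k <- n - 1
  }
  D <- as.matrix(dist(X))  # Euclidean distances between all pairs of rows
  diag(D) <- Inf           # a point is never its own neighbor
  index <- matrix(0L, n, k)
  distance <- matrix(0, n, k)
  for (i in seq_len(n)) {
    o <- order(D[i, ])[seq_len(k)]  # the k closest points, nearest first
    index[i, ] <- o
    distance[i, ] <- D[i, o]
  }
  list(index = index, distance = distance)
}

set.seed(1)
Y <- matrix(rnorm(200), ncol = 4)  # 50 simulated points in 4 dimensions
ref <- brute_knn(Y, k = 3)
head(ref$index)
head(ref$distance)
```

On small, tie-free data, ref$index would be expected to agree with findKNN(Y, k = 3)$index, since the default KMKNN search is exact; that comparison needs BiocNeighbors and is not part of this sketch.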

Author(s)

Aaron Lun

See Also

buildIndex, to build an index ahead of time.

findDistance, to efficiently obtain the distance to the k-th nearest neighbor.

Examples

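# Simulate 5000 points in 20 dimensions.
Y <- matrix(rnorm(100000), ncol=20)
# Find the 8 nearest neighbors of each point.
out <- findKNN(Y, k=8)
head(out$index)     # neighbor indices, one row per point
head(out$distance)  # matching distances, sorted in increasing order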


LTLA/BiocNeighbors documentation built on Dec. 12, 2024, 7:45 a.m.