hnsw_search: Search an hnswlib nearest neighbor index

View source: R/hnsw.R

hnsw_search    R Documentation

Description

Search an hnswlib nearest neighbor index for the k nearest neighbors of each item in X.

Usage

hnsw_search(
  X,
  ann,
  k,
  ef = 10,
  verbose = FALSE,
  progress = "bar",
  n_threads = 0,
  grain_size = 1,
  byrow = TRUE
)

Arguments

X

A numeric matrix of data to search for neighbors. If byrow = TRUE (the default) then each row of X is an item to be searched. Otherwise, each item should be stored in the columns of X.

ann

An instance of the HnswL2, HnswCosine or HnswIp class.

k

Number of neighbors to return. This can't be larger than the number of items that were added to the index ann. To check the size of the index, call ann$size().

ef

Size of the dynamic list used during search. Higher values improve recall at the expense of longer search time. Valid values range from k to the size of the dataset. Typical values are 100 to 2000.
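
A minimal sketch of how the constraints on k and ef might be checked before searching. It uses only ann$size(), which is mentioned above. The value 100 is taken from the low end of the typical range and is an illustrative choice, not a recommendation.

```r
library(RcppHNSW)

irism <- as.matrix(iris[, -5])
ann <- hnsw_build(irism)

k <- 5
# k must not exceed the number of items stored in the index
stopifnot(k <= ann$size())

# Keep ef between k and the size of the dataset
# (100 is an illustrative starting point)
ef <- min(max(k, 100), ann$size())

res <- hnsw_search(irism, ann, k = k, ef = ef)
```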

verbose

If TRUE, log messages to the console.

progress

Defunct; has no effect.

n_threads

Maximum number of threads to use. The exact number is determined by grain_size.

grain_size

Minimum amount of work (number of items in X to search) per thread. If X doesn't contain enough items, fewer than n_threads threads will be used. This is useful when the overhead of context switching between too many threads outweighs the gains from parallelism.
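
As a rough illustration of how n_threads and grain_size interact, the sketch below requests up to two threads and asks that each thread handle at least 50 items. The specific values are arbitrary and chosen only to suit the 150-row iris data. Whether this is faster than a single-threaded search depends on the machine and the size of the data.

```r
library(RcppHNSW)

irism <- as.matrix(iris[, -5])
ann <- hnsw_build(irism)

# Up to 2 threads, each given at least 50 of the 150 items to search
res_mt <- hnsw_search(irism, ann, k = 5, n_threads = 2, grain_size = 50)

# The shape of the result does not depend on the threading settings
dim(res_mt$idx)
```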

byrow

If TRUE (the default), the items to be searched are stored in the rows of X. Otherwise, the items are stored in the columns of X. Storing items by column reduces the overhead of copying data into a form that the hnsw library can search. Note that if byrow = FALSE, any matrices returned from this function will also store the items by column.
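
The following sketch shows a column-wise search. It assumes that hnsw_build also accepts a byrow argument so that the index can be built from the same column-oriented matrix. That argument is not documented on this page, so check the hnsw_build help before relying on it.

```r
library(RcppHNSW)

# Transpose so that each column is one item (4 x 150)
irism_t <- t(as.matrix(iris[, -5]))

# Assumption: hnsw_build takes byrow, like hnsw_search
ann <- hnsw_build(irism_t, byrow = FALSE)

res <- hnsw_search(irism_t, ann, k = 5, byrow = FALSE)

# Results are also stored by column: one column per item, k rows
dim(res$idx)
```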

Value

A list containing:

  • idx: a matrix containing the nearest neighbor indices.

  • dist: a matrix containing the nearest neighbor distances.

The dimensions of the matrices respect the storage (row-based or column-based) of X as indicated by the byrow parameter. If byrow = TRUE (the default), each row of idx and dist contains the neighbor information for the item passed in the equivalent row of X, i.e. the dimensions are n x k, where n is the number of items in X. If byrow = FALSE, each column of idx and dist contains the neighbor information for the item passed in the equivalent column of X, i.e. the dimensions are k x n.

Every item in the dataset is considered to be a neighbor of itself, so when X is the same data that was used to build the index, the first neighbor of item i should always be i itself. If that isn't the case, then M (set when the index is built), ef, or both may need increasing.
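
One way to run this self-neighbor check is sketched below. It assumes the index was built from the same matrix that is being searched and that byrow = TRUE. Exact duplicate rows in the data can also produce a mismatch, because a duplicate is at distance zero too, so a fraction slightly below 1 does not necessarily indicate a poorly tuned index.

```r
library(RcppHNSW)

irism <- as.matrix(iris[, -5])
ann <- hnsw_build(irism)
res <- hnsw_search(irism, ann, k = 5, ef = 50)

# With byrow = TRUE, idx and dist are n x k
dim(res$idx)

# Fraction of items whose first neighbor is the item itself
self_frac <- mean(res$idx[, 1] == seq_len(nrow(irism)))
self_frac

# Rows to inspect if the fraction is lower than expected
which(res$idx[, 1] != seq_len(nrow(irism)))
```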

Examples

irism <- as.matrix(iris[, -5])  # numeric columns only
ann <- hnsw_build(irism)        # build the index with default settings
iris_nn <- hnsw_search(irism, ann, k = 5)  # 5 nearest neighbors of each row

RcppHNSW documentation built on Sept. 19, 2023, 9:06 a.m.