buildHnsw: Build a HNSW index

View source: R/buildHnsw.R

Description

Build a HNSW index and save it to a file in preparation for a nearest-neighbor search.

Usage

buildHnsw(
  X,
  transposed = FALSE,
  nlinks = 16,
  ef.construction = 200,
  directory = tempdir(),
  ef.search = 10,
  fname = tempfile(tmpdir = directory, fileext = ".idx"),
  distance = c("Euclidean", "Manhattan", "Cosine")
)

Arguments

X

A numeric matrix where rows correspond to data points and columns correspond to variables (i.e., dimensions).

transposed

Logical scalar indicating whether X is transposed, i.e., rows are variables and columns are data points.

nlinks

Integer scalar specifying the number of bi-directional links for each element.

ef.construction

Integer scalar specifying the size of the dynamic list during index construction.

directory

String containing the path to the directory in which to save the index file.

ef.search

Integer scalar specifying the size of the dynamic list to use during neighbor searching.

fname

String containing the path to the index file.

distance

String specifying the type of distance to use.

Details

This function is automatically called by findHnsw and related functions. However, it can be called directly by the user to save time if multiple queries are to be performed against the same X, as the index then only needs to be built once.
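As a minimal sketch of that reuse pattern, the index can be built once and then passed to several searches. This assumes the precomputed argument of findHnsw and queryHnsw accepts the object returned here; the query matrices Q1 and Q2 are made up for illustration.

```r
library(BiocNeighbors)

set.seed(100)
Y  <- matrix(rnorm(100000), ncol=20)  # 5000 reference points, 20 dimensions
Q1 <- matrix(rnorm(2000), ncol=20)    # 100 illustrative query points
Q2 <- matrix(rnorm(2000), ncol=20)    # another 100 illustrative query points

# Build the index once.
pre <- buildHnsw(Y)

# Reuse it for several searches; because 'precomputed' is supplied,
# the index is not rebuilt from Y on each call.
res1 <- queryHnsw(Y, Q1, k=5, precomputed=pre)
res2 <- queryHnsw(Y, Q2, k=5, precomputed=pre)
self <- findHnsw(Y, k=5, precomputed=pre)

dim(res1$index)
```

Each result is a list whose index element has one row per query point and one column per requested neighbor.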

It is advisable to set the directory argument to a location that is amenable to parallel read operations on HPC file systems. If index files are constructed manually, the user is also responsible for cleaning them up after all calculations are completed.
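A possible way to handle the index location and the clean-up is sketched below. The directory name is hypothetical; a subdirectory of tempdir() stands in for whatever shared location suits the file system in use.

```r
library(BiocNeighbors)

# Hypothetical index location; replace with a path suited to your system.
idx.dir <- file.path(tempdir(), "hnsw-indices")
dir.create(idx.dir, showWarnings=FALSE)

Y <- matrix(rnorm(100000), ncol=20)

# The index file is written to a temporary file name inside 'idx.dir'.
pre <- buildHnsw(Y, directory=idx.dir)
n.files <- length(list.files(idx.dir))

res <- findHnsw(Y, k=5, precomputed=pre)

# Remove the directory and the index file(s) once all searches are done.
unlink(idx.dir, recursive=TRUE)
```

After the unlink call, the object in pre points to a file that no longer exists and should not be used for further searches.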

Larger values of nlinks improve accuracy at the expense of speed and memory usage. Larger values of ef.construction improve index quality at the expense of indexing time.

The value of ef.search controls the accuracy of the neighbor search at run time. Larger values improve accuracy at the expense of a slower search. In findHnsw and queryHnsw, the value actually used is never lower than k, the number of nearest neighbors to identify. Note that this parameter is not used in the index construction itself, and is only included here so that the output index fully parametrizes the search.
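The effect of ef.search can be explored with a rough comparison against an exact search. The sketch below assumes findKmknn from the same package is available as the exact reference; the recall helper is a hypothetical function defined only for this illustration, and the values it returns depend on the data.

```r
library(BiocNeighbors)

set.seed(200)
Y <- matrix(rnorm(100000), ncol=20)
k <- 10

# Two indices that differ only in the search-time parameter.
low  <- buildHnsw(Y, ef.search=10)
high <- buildHnsw(Y, ef.search=200)

exact    <- findKmknn(Y, k=k)$index
idx.low  <- findHnsw(Y, k=k, precomputed=low)$index
idx.high <- findHnsw(Y, k=k, precomputed=high)$index

# Hypothetical helper: average fraction of exact neighbors recovered per point.
recall <- function(approx, exact) {
    per.point <- vapply(seq_len(nrow(exact)), function(i) {
        mean(approx[i,] %in% exact[i,])
    }, numeric(1))
    mean(per.point)
}

recall.low  <- recall(idx.low, exact)
recall.high <- recall(idx.high, exact)
c(low=recall.low, high=recall.high)
```

Timing each findHnsw call with system.time would show the other side of the trade-off.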

Technically, the index construction algorithm is stochastic but, for various logistical reasons, the seed is hard-coded into the C++ code. This means that the results of the HNSW neighbor searches will be fully deterministic for the same inputs, even though the theory provides no such guarantees.
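One informal way to check this behavior is to build two indices from the same matrix and compare the search results. This is only a sketch of such a check, not a guarantee; it relies on the hard-coded seed described above.

```r
library(BiocNeighbors)

set.seed(300)
Y <- matrix(rnorm(100000), ncol=20)

# Two independent builds from identical input.
first  <- buildHnsw(Y)
second <- buildHnsw(Y)

res1 <- findHnsw(Y, k=5, precomputed=first)
res2 <- findHnsw(Y, k=5, precomputed=second)

# Expected to be TRUE if construction is deterministic for the same input.
same <- identical(res1$index, res2$index)
same
```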

Value

A HnswIndex object containing a path to the index file, plus additional parameters for the search.

Author(s)

Aaron Lun

See Also

HnswIndex, for details on the output class.

findHnsw and queryHnsw, for dependent functions.

Examples

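library(BiocNeighbors)
# Simulate 5000 points in 20 dimensions.
Y <- matrix(rnorm(100000), ncol=20)
# Build the index and print the resulting object.
out <- buildHnsw(Y)
out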

LTLA/BiocNeighbors documentation built on Sept. 18, 2021, 8:19 p.m.