buildHnsw: Build a HNSW index


View source: R/buildHnsw.R

Description

Build an HNSW (hierarchical navigable small world) index and save it to a file in preparation for a nearest-neighbor search.

Usage

buildHnsw(X, transposed=FALSE, nlinks=16, ef.construction=200, directory=tempdir(), 
    ef.search=10, fname=tempfile(tmpdir=directory, fileext=".idx"),
    distance=c("Euclidean", "Manhattan"))

Arguments

X

A numeric matrix where rows correspond to data points and columns correspond to variables (i.e., dimensions).

transposed

Logical scalar indicating whether X is transposed, i.e., rows are variables and columns are data points.

nlinks

Integer scalar specifying the number of bi-directional links for each element.

ef.construction

Integer scalar specifying the size of the dynamic list during index construction.

directory

String containing the path to the directory in which to save the index file.

ef.search

Integer scalar specifying the size of the dynamic list to use during neighbor searching.

fname

String containing the path to the index file.

distance

String specifying the type of distance to use: "Euclidean" (the default) or "Manhattan".
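A minimal sketch of the transposed and distance arguments, assuming BiocNeighbors is installed and that findHnsw() accepts a prebuilt index through its precomputed argument (in which case its X argument is not used to rebuild the index). The matrix sizes and the seed are arbitrary; the final line compares the reported Manhattan distance against one computed by hand.

```r
library(BiocNeighbors)

set.seed(7)
Y <- matrix(rnorm(100000), ncol=20)  # 5000 data points (rows) x 20 variables
tY <- t(Y)                           # 20 variables (rows) x 5000 data points

# The data are supplied with variables in rows, so transposed=TRUE;
# Manhattan (L1) distance is requested instead of the Euclidean default.
man <- buildHnsw(tY, transposed=TRUE, distance="Manhattan")

# Search the prebuilt index: one row of neighbors per data point.
res <- findHnsw(tY, k=5, precomputed=man)

# The reported distance from point 1 to its nearest neighbor should match
# a hand-computed L1 distance, up to floating-point rounding.
nn <- res$index[1, 1]
c(reported=res$distance[1, 1], manual=sum(abs(Y[1, ] - Y[nn, ])))
```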

Details

This function is called automatically by findHnsw and related functions. However, it can also be called directly by the user to save time when multiple queries are to be performed on the same X, as the index then only needs to be built once.
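The following is a minimal sketch of building the index once and reusing it, assuming findHnsw() and queryHnsw() take the prebuilt index through their precomputed argument. The query matrices Q1 and Q2 are hypothetical stand-ins for repeated queries against the same X.

```r
library(BiocNeighbors)

set.seed(42)
Y <- matrix(rnorm(100000), ncol=20)  # 5000 reference points
Q1 <- matrix(rnorm(200), ncol=20)    # 10 hypothetical query points
Q2 <- matrix(rnorm(200), ncol=20)    # another 10 hypothetical query points

# Build the index a single time...
pre <- buildHnsw(Y)

# ...then reuse it, rather than letting each call rebuild it from Y.
self.nn <- findHnsw(Y, k=5, precomputed=pre)
q1.nn <- queryHnsw(Y, query=Q1, k=5, precomputed=pre)
q2.nn <- queryHnsw(Y, query=Q2, k=5, precomputed=pre)

dim(q1.nn$index)  # one row per query point, one column per neighbor
```

Because the index was constructed manually here, deleting its file afterwards is the caller's job.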

On HPC file systems, it is advisable to set directory to a location that supports parallel read operations. If index files are constructed manually, the user is also responsible for deleting them once all calculations are complete.
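A minimal sketch of pointing the index at a dedicated directory and cleaning it up afterwards. The directory name index.dir is hypothetical; on a cluster it would be replaced by a scratch location suited to parallel reads. Removing the whole directory is safe here only because it was created solely for this index.

```r
library(BiocNeighbors)

set.seed(11)
Y <- matrix(rnorm(100000), ncol=20)

# Hypothetical dedicated directory for index files.
index.dir <- file.path(tempdir(), "hnsw-indices")
dir.create(index.dir, showWarnings=FALSE, recursive=TRUE)

# With fname left at its default, the index file is created inside index.dir.
pre <- buildHnsw(Y, directory=index.dir)
res <- findHnsw(Y, k=10, precomputed=pre)

# The index was built manually, so remove it manually once all searches are done.
unlink(index.dir, recursive=TRUE)
```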

Larger values of nlinks improve accuracy at the expense of speed and memory usage. Larger values of ef.construction improve index quality at the expense of indexing time.
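To see the cost side of this trade-off, one can time index construction and check the size of the resulting file. This is a sketch only: the "dense" values nlinks=48 and ef.construction=500 are arbitrary illustrations, and actual timings and sizes depend on the data and hardware.

```r
library(BiocNeighbors)

set.seed(100)
Y <- matrix(rnorm(100000), ncol=20)

# Explicit file paths so the index sizes can be inspected afterwards.
f.default <- tempfile(fileext=".idx")
f.dense <- tempfile(fileext=".idx")

# Default settings versus a denser graph built with a larger dynamic list.
t.default <- system.time(buildHnsw(Y, nlinks=16, ef.construction=200, fname=f.default))
t.dense <- system.time(buildHnsw(Y, nlinks=48, ef.construction=500, fname=f.dense))

# Build time in seconds and on-disk index size in bytes.
summary.df <- data.frame(
    setting=c("default", "dense"),
    elapsed=c(t.default[["elapsed"]], t.dense[["elapsed"]]),
    bytes=file.size(c(f.default, f.dense))
)
print(summary.df)

# These index files were created manually, so delete them.
unlink(c(f.default, f.dense))
```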

The value of ef.search controls the accuracy of the neighbor search at run time (i.e., not during the indexing itself). Larger values improve accuracy at the expense of a slower search. Note that the value used is never lower than k, the number of nearest neighbors to identify.
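One way to gauge the effect of ef.search is to measure recall against an exact search. This sketch assumes findKmknn(), BiocNeighbors' exact KMKNN search, as the ground truth. The recall() helper is a hypothetical illustration and is not part of BiocNeighbors.

```r
library(BiocNeighbors)

set.seed(100)
Y <- matrix(rnorm(100000), ncol=20)
k <- 10

# Exact neighbors serve as the ground truth.
exact <- findKmknn(Y, k=k)$index

# Hypothetical helper: fraction of true neighbors recovered, averaged over points.
recall <- function(approx, truth) {
    hits <- vapply(seq_len(nrow(truth)),
        function(i) length(intersect(approx[i, ], truth[i, ])), integer(1))
    mean(hits) / ncol(truth)
}

# ef.search is fixed when the index is built; with k=10, ef.search=10 is
# already at its lower bound. Default index files go to tempdir().
low <- buildHnsw(Y, ef.search=10)
high <- buildHnsw(Y, ef.search=100)

recalls <- c(
    ef10=recall(findHnsw(Y, k=k, precomputed=low)$index, exact),
    ef100=recall(findHnsw(Y, k=k, precomputed=high)$index, exact)
)
recalls
```

Higher recall for the larger ef.search is what the text above predicts, though the exact figures depend on the data.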

Technically, the index construction algorithm is stochastic, but for various logistical reasons the random seed is hard-coded into the C++ code. As a result, HNSW neighbor searches are fully deterministic for the same inputs, even though the theory provides no such guarantee.
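A quick check of the determinism described above: running findHnsw() twice on identical input, with R's own RNG state changed in between, should give identical results, since the seed lives in the C++ code rather than in R. This is a sketch; the seeds used are arbitrary.

```r
library(BiocNeighbors)

set.seed(1)
Y <- matrix(rnorm(100000), ncol=20)

# Each call builds (and later removes) its own index internally.
first <- findHnsw(Y, k=10)

# Changing R's RNG state should not affect the HNSW index construction.
set.seed(999)
second <- findHnsw(Y, k=10)

identical(first$index, second$index)
identical(first$distance, second$distance)
```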

Value

An HnswIndex object; see HnswIndex for details of its contents.

Author(s)

Aaron Lun

See Also

See HnswIndex for details on the output class.

See findHnsw and queryHnsw for dependent functions.

Examples

library(BiocNeighbors)
Y <- matrix(rnorm(100000), ncol=20)  # 5000 data points in 20 dimensions
out <- buildHnsw(Y)
out

BiocNeighbors documentation built on Dec. 9, 2020, 2:01 a.m.