buildHnsw | R Documentation |
Build an HNSW index and save it to a file in preparation for a nearest-neighbor search.
buildHnsw(
X,
transposed = FALSE,
nlinks = 16,
ef.construction = 200,
directory = tempdir(),
ef.search = 10,
fname = tempfile(tmpdir = directory, fileext = ".idx"),
distance = c("Euclidean", "Manhattan", "Cosine")
)
X |
A numeric matrix where rows correspond to data points and columns correspond to variables (i.e., dimensions). |
transposed |
Logical scalar indicating whether X is transposed, i.e., rows are variables and columns are data points. |
nlinks |
Integer scalar specifying the number of bi-directional links for each element. |
ef.construction |
Integer scalar specifying the size of the dynamic list during index construction. |
directory |
String containing the path to the directory in which to save the index file. |
ef.search |
Integer scalar specifying the size of the dynamic list to use during neighbor searching. |
fname |
String containing the path to the index file. |
distance |
String specifying the type of distance to use. |
This function is automatically called by findHnsw and related functions.
However, it can be called directly by the user to save time if multiple queries are to be performed on the same X.
It is advisable to change directory to a location that is amenable to parallel read operations on HPC file systems.
Of course, if index files are manually constructed, the user is also responsible for their clean-up after all calculations are completed.
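The reuse pattern described above can be sketched as follows. This is a minimal illustration, assuming the BiocNeighbors version documented here is installed and that findHnsw and queryHnsw accept the prebuilt index through a precomputed argument. The names idx_file and Q, and the simulated data, are made up for the example.

```r
library(BiocNeighbors)

set.seed(100)
Y <- matrix(rnorm(100000), ncol = 20)   # 5000 points in 20 dimensions
Q <- matrix(rnorm(2000), ncol = 20)     # 100 query points

# Choose the index file explicitly so that it can be removed later.
idx_file <- tempfile(fileext = ".idx")
idx <- buildHnsw(Y, fname = idx_file)

# Reuse the same index for several searches
# (assumes the 'precomputed' argument of findHnsw and queryHnsw).
self_nn  <- findHnsw(precomputed = idx, k = 5)
query_nn <- queryHnsw(precomputed = idx, query = Q, k = 5)

dim(self_nn$index)
dim(query_nn$index)

# Manually constructed index files must be deleted by the caller.
unlink(idx_file)
```

Passing fname explicitly keeps the path in a variable, so the clean-up step does not rely on any accessor of the returned object.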
Larger values of nlinks improve accuracy at the expense of speed and memory usage.
Larger values of ef.construction improve index quality at the expense of indexing time.
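One way to see these trade-offs on a given machine is to build two indices and compare build time and file size. The sketch below is illustrative only: the parameter values, file names and simulated data are arbitrary choices, and the measured numbers will vary between systems.

```r
library(BiocNeighbors)

set.seed(100)
Y <- matrix(rnorm(100000), ncol = 20)

small_file <- tempfile(fileext = ".idx")
large_file <- tempfile(fileext = ".idx")

# Default settings versus more links and a larger construction list.
t_small <- system.time(
    buildHnsw(Y, nlinks = 16, ef.construction = 200, fname = small_file)
)
t_large <- system.time(
    buildHnsw(Y, nlinks = 32, ef.construction = 400, fname = large_file)
)

# Compare build time (seconds) and on-disk size (bytes) of the two indices.
sizes <- file.size(c(small_file, large_file))
data.frame(
    nlinks = c(16, 32),
    ef.construction = c(200, 400),
    elapsed = c(t_small[["elapsed"]], t_large[["elapsed"]]),
    bytes = sizes
)

# Remove the manually created index files.
unlink(c(small_file, large_file))
```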
The value of ef.search controls the accuracy of the neighbor search at run time.
Larger values improve accuracy at the expense of a slower search.
In findHnsw and queryHnsw, this is always lower-bounded at k, the number of nearest neighbors to identify.
Note that this parameter is not actually used in the index construction itself, and is only included here so that the output index fully parametrizes the search.
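The lower bound on ef.search can be written out as a one-line rule. The sketch below only illustrates the documented behaviour. effective_ef is a hypothetical helper, not a function provided by the package.

```r
# Hypothetical helper: the search-time list size is never smaller than k.
effective_ef <- function(ef.search, k) {
    max(ef.search, k)
}

effective_ef(ef.search = 10, k = 5)    # ef.search is already large enough
effective_ef(ef.search = 10, k = 50)   # raised to k
```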
Technically, the index construction algorithm is stochastic but, for various logistical reasons, the seed is hard-coded into the C++ code. This means that the results of the HNSW neighbor searches will be fully deterministic for the same inputs, even though the theory provides no such guarantees.
An HnswIndex object containing a path to the index file, plus additional parameters for the search.
Aaron Lun
HnswIndex, for details on the output class.
findHnsw and queryHnsw, for dependent functions.
Y <- matrix(rnorm(100000), ncol=20)
out <- buildHnsw(Y)
out