Build an Annoy index and save it to file in preparation for a nearest-neighbors search.
buildAnnoy(X, transposed=FALSE, ntrees=50, directory=tempdir(),
    search.mult=ntrees, fname=tempfile(tmpdir=directory, fileext=".idx"),
    distance=c("Euclidean", "Manhattan"))
X: A numeric matrix where rows correspond to data points and columns correspond to variables (i.e., dimensions).
transposed: Logical scalar indicating whether X is transposed, i.e., rows are variables and columns are data points.
ntrees: Integer scalar specifying the number of trees to build in the index.
directory: String containing the path to the directory in which to save the index file.
search.mult: Numeric scalar specifying the multiplier for the number of points to search.
fname: String containing the path to the index file.
distance: String specifying the type of distance to use, either "Euclidean" or "Manhattan".
This function is automatically called by findAnnoy and related functions.
However, it can be called directly by the user to save time if multiple queries are to be performed against the same X.
It is advisable to change directory to a location that is amenable to parallel read operations on HPC file systems.
Of course, if index files are manually constructed, the user is also responsible for their clean-up after all calculations are completed.
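The reuse pattern described above might look like the following sketch. It assumes a BiocNeighbors release that still exports buildAnnoy, findAnnoy and queryAnnoy, and that both search functions accept the pre-built index through a precomputed argument (not shown in the usage above). The directory and variable names are purely illustrative.

```r
library(BiocNeighbors)

set.seed(100)
Y <- matrix(rnorm(100000), ncol=20)

# Hypothetical location for the index file; on an HPC system this would
# typically point to a file system suited to parallel reads.
idx.dir <- tempfile("annoy_idx_")
dir.create(idx.dir)

# Build the index once; 'fname' defaults to a temporary file inside 'directory'.
idx <- buildAnnoy(Y, directory=idx.dir)

# Reuse the same index for several searches.
# ASSUMPTION: findAnnoy() and queryAnnoy() accept 'precomputed='.
self <- findAnnoy(precomputed=idx, k=10)
Q <- matrix(rnorm(2000), ncol=20)
other <- queryAnnoy(precomputed=idx, query=Q, k=5)

# Manually constructed index files must be removed by the user.
unlink(idx.dir, recursive=TRUE)
```

Placing the index in its own directory keeps the clean-up to a single unlink call once all searches are done.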
The ntrees parameter controls the trade-off between accuracy and computational work.
More trees provide greater accuracy at the cost of more computational work (both in terms of the indexing time and search speed in downstream functions).
The search.mult parameter controls the quantity known as search_k in the original Annoy documentation.
Specifically, search_k is defined as k * search.mult, where k is the number of nearest neighbors to identify in downstream functions.
This represents the number of points to search exhaustively and determines the run-time balance between speed and accuracy.
The default search.mult=ntrees represents the Annoy library defaults.
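As a small worked example of the relationship above, the following base R snippet computes search_k for the default settings. The value of k is an illustrative choice, not something fixed by buildAnnoy.

```r
ntrees <- 50            # default number of trees in buildAnnoy()
search.mult <- ntrees   # default multiplier, i.e., search.mult=ntrees
k <- 10                 # neighbors requested downstream (illustrative value)

# Number of points that Annoy would search exhaustively per query.
search_k <- k * search.mult
search_k  # 500

# Doubling the multiplier doubles the search effort for the same k.
search_k_double <- k * (2 * search.mult)
search_k_double  # 1000
```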
Technically, the index construction algorithm is stochastic but, for various logistical reasons, the seed is hard-coded into the C++ code. This means that the results of the Annoy neighbor searches will be fully deterministic for the same inputs, even though the theory provides no such guarantees.
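One way to check this determinism empirically is sketched below. It assumes that findAnnoy accepts a pre-built index through a precomputed argument and returns a list with index and distance elements; neither detail is stated on this page, so treat both as assumptions.

```r
library(BiocNeighbors)

set.seed(42)
Y <- matrix(rnorm(10000), ncol=20)

# Build two independent indices from the same input.
idx1 <- buildAnnoy(Y)
idx2 <- buildAnnoy(Y)

# ASSUMPTION: 'precomputed=' is accepted and the result has an 'index' element.
res1 <- findAnnoy(precomputed=idx1, k=5)
res2 <- findAnnoy(precomputed=idx2, k=5)

# With a hard-coded seed, the neighbor indices should match exactly.
same <- identical(res1$index, res2$index)
same
```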
An AnnoyIndex object containing:
path, a string containing the path to the index file.
data, a numeric matrix equivalent to t(X).
search.mult, a numeric scalar specifying the number of points to search in downstream functions.
NAMES, a character vector or NULL equal to rownames(X).
distance, a string specifying the distance metric used.
Aaron Lun
See AnnoyIndex for details on the output class.
See findAnnoy and queryAnnoy for dependent functions.
Y <- matrix(rnorm(100000), ncol=20)
out <- buildAnnoy(Y)
out