AnnoyParam | R Documentation |
A class to hold parameters for the Annoy algorithm for approximate nearest neighbor identification.
AnnoyParam(
ntrees = 50,
search.mult = ntrees,
distance = c("Euclidean", "Manhattan", "Cosine")
)
## S4 method for signature 'AnnoyParam'
defineBuilder(BNPARAM)
ntrees |
Integer scalar, number of trees to use for index generation. |
search.mult |
Numeric scalar, multiplier for the number of points to search. |
distance |
String specifying the distance metric to use. Cosine distances are implemented as Euclidean distances on L2-normalized coordinates. |
BNPARAM |
An AnnoyParam instance. |
The Approximate nearest neighbors Oh Yeah (Annoy) algorithm is based on recursive hyperplane partitions. Briefly, a tree is constructed where a random hyperplane splits the points into two subsets at each internal node. Leaf nodes are defined when the number of points in a subset falls below a threshold (close to twice the number of dimensions for the settings used here). Multiple trees are constructed in this manner, each of which is different due to the random choice of hyperplanes. For a given query point, each tree is searched to identify the subset of all points in the same leaf node as the query point. The union of these subsets across all trees is exhaustively searched to identify the actual nearest neighbors to the query.
The ntrees
parameter controls the trade-off between accuracy and computational work.
More trees provide greater accuracy at the cost of more computational work (both in terms of the indexing time and search speed in downstream functions).
The search.mult
controls the parameter known as search_k
in the original Annoy documentation.
Specifically, search_k
is defined as k * search.mult
where k
is the number of nearest neighbors to identify in downstream functions.
This represents the number of points to search exhaustively and determines the run-time balance between speed and accuracy.
The default search.mult=ntrees
is based on the Annoy library defaults.
Note that this parameter is not actually used in the index construction itself, and is only included here so that the output index fully parametrizes the search.
Technically, the index construction algorithm is stochastic but, for various logistical reasons, the seed is hard-coded into the C++ code. This means that the results of the Annoy neighbor searches will be fully deterministic for the same inputs, even though the theory provides no such guarantees.
The AnnoyParam
constructor returns an instance of the AnnoyParam class.
The defineBuilder
method returns a list that can be used in buildIndex
to construct an Annoy index.
Aaron Lun
BiocNeighborParam, for the parent class and its available methods.
https://github.com/spotify/annoy, for details on the underlying algorithm.
(out <- AnnoyParam())
out[['ntrees']]
out[['ntrees']] <- 20L
out
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.