AnnoyParam: The AnnoyParam class

View source: R/AnnoyParam.R

AnnoyParamR Documentation

The AnnoyParam class

Description

A class to hold parameters for the Annoy algorithm for approximate nearest neighbor identification.

Usage

AnnoyParam(
  ntrees = 50,
  search.mult = ntrees,
  distance = c("Euclidean", "Manhattan", "Cosine")
)

## S4 method for signature 'AnnoyParam'
defineBuilder(BNPARAM)

Arguments

ntrees

Integer scalar, number of trees to use for index generation.

search.mult

Numeric scalar, multiplier for the number of points to search.

distance

String specifying the distance metric to use. Cosine distances are implemented as Euclidean distances on L2-normalized coordinates.

BNPARAM

An AnnoyParam instance.

Details

The Approximate nearest neighbors Oh Yeah (Annoy) algorithm is based on recursive hyperplane partitions. Briefly, a tree is constructed where a random hyperplane splits the points into two subsets at each internal node. Leaf nodes are defined when the number of points in a subset falls below a threshold (close to twice the number of dimensions for the settings used here). Multiple trees are constructed in this manner, each of which is different due to the random choice of hyperplanes. For a given query point, each tree is searched to identify the subset of all points in the same leaf node as the query point. The union of these subsets across all trees is exhaustively searched to identify the actual nearest neighbors to the query.

The ntrees parameter controls the trade-off between accuracy and computational work. More trees provide greater accuracy at the cost of more computational work (both in terms of the indexing time and search speed in downstream functions).

The search.mult controls the parameter known as search_k in the original Annoy documentation. Specifically, search_k is defined as k * search.mult where k is the number of nearest neighbors to identify in downstream functions. This represents the number of points to search exhaustively and determines the run-time balance between speed and accuracy. The default search.mult=ntrees is based on the Annoy library defaults. Note that this parameter is not actually used in the index construction itself, and is only included here so that the output index fully parametrizes the search.

Technically, the index construction algorithm is stochastic but, for various logistical reasons, the seed is hard-coded into the C++ code. This means that the results of the Annoy neighbor searches will be fully deterministic for the same inputs, even though the theory provides no such guarantees.

Value

The AnnoyParam constructor returns an instance of the AnnoyParam class.

The defineBuilder method returns a list that can be used in buildIndex to construct an Annoy index.

Author(s)

Aaron Lun

See Also

BiocNeighborParam, for the parent class and its available methods.

https://github.com/spotify/annoy, for details on the underlying algorithm.

Examples

(out <- AnnoyParam())
out[['ntrees']]

out[['ntrees']] <- 20L
out


LTLA/kmknn documentation built on Oct. 18, 2024, 9:01 p.m.