buildAnnoy: Build an Annoy index
In LTLA/kmknn: Nearest Neighbor Detection for Bioconductor Packages

buildAnnoy

R Documentation

Build an Annoy index

Description

Build an Annoy index and save it to file in preparation for a nearest-neighbors search.

Usage

buildAnnoy(
  X,
  transposed = FALSE,
  ntrees = 50,
  directory = tempdir(),
  search.mult = ntrees,
  fname = tempfile(tmpdir = directory, fileext = ".idx"),
  distance = c("Euclidean", "Manhattan", "Cosine")
)

Arguments

`X`	A numeric matrix where rows correspond to data points and columns correspond to variables (i.e., dimensions).
`transposed`	Logical scalar indicating whether `X` is transposed, i.e., rows are variables and columns are data points.
`ntrees`	Integer scalar specifying the number of trees to build in the index.
`directory`	String containing the path to the directory in which to save the index file.
`search.mult`	Numeric scalar specifying the multiplier applied to the number of requested neighbors to obtain the number of points to search (see Details).
`fname`	String containing the path to the index file.
`distance`	String specifying the type of distance to use: one of "Euclidean", "Manhattan" or "Cosine".

Details

This function is automatically called by findAnnoy and related functions. However, it can be called directly by the user to save time if multiple queries are to be performed against the same X, as the index then only needs to be built once.

It is advisable to set the directory argument to a location that is amenable to parallel read operations on HPC file systems. If index files are constructed manually in this way, the user is also responsible for cleaning them up after all calculations are completed.
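One way to make this clean-up hard to forget is to tie the lifetime of the index file to a single function call. The sketch below is illustrative only: with_index is a hypothetical helper that is not part of the package, and it relies only on the directory and fname arguments shown in Usage. It assumes that buildAnnoy is available from an attached package.

```r
# Hypothetical helper (not part of the package): builds an index file in
# 'dir', passes the resulting index object to 'FUN', and deletes the file
# when the call finishes, even if 'FUN' throws an error.
with_index <- function(X, dir, FUN, build = buildAnnoy, ...) {
    dir.create(dir, showWarnings = FALSE, recursive = TRUE)
    fname <- tempfile(tmpdir = dir, fileext = ".idx")

    # Register the clean-up before the file is created.
    on.exit(unlink(fname), add = TRUE)

    idx <- build(X, directory = dir, fname = fname, ...)
    FUN(idx)
}

# Usage, assuming buildAnnoy() is available from an attached package:
# Y <- matrix(rnorm(100000), ncol = 20)
# res <- with_index(Y, file.path(tempdir(), "annoy"), function(idx) {
#     # run every search that needs 'idx' here and return the results
# })
```

The index object should not be returned from FUN, as the file it points to no longer exists once with_index has finished.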

The ntrees parameter controls the trade-off between accuracy and computational work. More trees provide greater accuracy at the cost of more computational work (both in terms of the indexing time and search speed in downstream functions).

The search.mult argument controls the parameter known as search_k in the original Annoy documentation. Specifically, search_k is defined as k * search.mult, where k is the number of nearest neighbors to identify in downstream functions. This represents the number of points to search exhaustively and determines the run-time balance between speed and accuracy. The default of search.mult=ntrees is based on the Annoy library defaults. Note that this parameter is not actually used in the index construction itself, and is only included here so that the output index fully parametrizes the search.
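As a small arithmetic illustration of the definition above, the following sketch computes search_k for a given k and search.mult. Here annoy_search_k is a hypothetical name used only for this illustration and is not a function in the package.

```r
# Hypothetical helper for illustration: search_k = k * search.mult.
annoy_search_k <- function(k, search.mult) {
    k * search.mult
}

ntrees <- 50

# With the default search.mult = ntrees and k = 10 neighbors:
default_k10 <- annoy_search_k(k = 10, search.mult = ntrees)
default_k10  # 500

# Doubling search.mult doubles the number of points searched for the same k:
wider_k10 <- annoy_search_k(k = 10, search.mult = 2 * ntrees)
wider_k10    # 1000
```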

Technically, the index construction algorithm is stochastic but, for various logistical reasons, the seed is hard-coded into the C++ code. This means that the results of the Annoy neighbor searches will be fully deterministic for the same inputs, even though the theory provides no such guarantees.

Value

An AnnoyIndex object containing a path to the index file, plus additional parameters for the search.

Author(s)

Aaron Lun

Examples

Y <- matrix(rnorm(100000), ncol=20)
out <- buildAnnoy(Y)
out

LTLA/kmknn documentation built on Feb. 5, 2024, 6:03 p.m.
