buildAnnoy: Build an Annoy index
In BiocNeighbors: Nearest Neighbor Detection for Bioconductor Packages

Build an Annoy index and save it to file in preparation for a nearest-neighbors search.

buildAnnoy(X, transposed=FALSE, ntrees=50, directory=tempdir(),
    search.mult=ntrees, fname=tempfile(tmpdir=directory, fileext=".idx"),
    distance=c("Euclidean", "Manhattan"))

`X`	A numeric matrix where rows correspond to data points and columns correspond to variables (i.e., dimensions).
`transposed`	Logical scalar indicating whether `X` is transposed, i.e., rows are variables and columns are data points.
`ntrees`	Integer scalar specifying the number of trees to build in the index.
`directory`	String containing the path to the directory in which to save the index file.
`search.mult`	Numeric scalar specifying the multiplier for the number of points to search.
`fname`	String containing the path to the index file.
`distance`	String specifying the type of distance to use.

This function is automatically called by findAnnoy and related functions. However, it can be called directly by the user to save time if multiple queries are to be performed on the same X, as the index then only needs to be built once.
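As a rough sketch of this reuse, the index can be built once and handed to several searches. This assumes that findAnnoy and queryAnnoy accept the index through a `precomputed` argument, as in the BiocNeighbors release documented here; the matrices and the choice of k are arbitrary.

```r
library(BiocNeighbors)

set.seed(100)
Y <- matrix(rnorm(100000), ncol=20)   # 5000 points in 20 dimensions
Z <- matrix(rnorm(2000), ncol=20)     # 100 query points (illustrative)

# Build the index once.
pre <- buildAnnoy(Y)

# Reuse it in both calls; 'precomputed' is assumed to skip the rebuild.
self <- findAnnoy(Y, k=5, precomputed=pre)            # neighbors within Y
other <- queryAnnoy(Y, query=Z, k=5, precomputed=pre) # neighbors in Y for rows of Z

dim(self$index)
dim(other$index)
```

Each result is a list whose index element has one row per searched point and one column per neighbor.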

It is advisable to point the directory argument at a location that is amenable to parallel read operations on HPC file systems. If index files are constructed manually in this way, the user is also responsible for cleaning them up after all calculations are completed.
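A minimal sketch of this workflow follows. The directory name is hypothetical, and the `precomputed` argument of findAnnoy is assumed to be available as in the BiocNeighbors release documented here.

```r
library(BiocNeighbors)

Y <- matrix(rnorm(100000), ncol=20)

# Hypothetical location; substitute a directory suited to parallel reads.
idx.dir <- file.path(tempdir(), "annoy-indices")
dir.create(idx.dir, showWarnings=FALSE)

# The index file is written into 'idx.dir' under a temporary name.
pre <- buildAnnoy(Y, directory=idx.dir)
list.files(idx.dir)

res <- findAnnoy(Y, k=5, precomputed=pre)

# Remove the index files once all searches are done.
unlink(idx.dir, recursive=TRUE)
```

Keeping the index files in a dedicated directory means that clean-up is a single unlink call rather than a hunt for individual files.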

The ntrees parameter controls the trade-off between accuracy and computational work. More trees provide greater accuracy at the cost of more computational work (both in terms of the indexing time and search speed in downstream functions).

The search.mult argument controls the parameter known as search_k in the original Annoy documentation. Specifically, search_k is defined as k * search.mult, where k is the number of nearest neighbors to identify in downstream functions. This represents the number of points to search exhaustively and determines the run-time balance between speed and accuracy. The default of search.mult=ntrees corresponds to the Annoy library default.
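The arithmetic can be checked directly. The value of k below is only an illustration; the other values are the buildAnnoy defaults.

```r
ntrees <- 50           # default number of trees in buildAnnoy
search.mult <- ntrees  # default multiplier
k <- 10                # neighbors requested downstream (illustrative)

# Number of points that Annoy searches exhaustively per query.
search_k <- k * search.mult
search_k
# 500
```

Doubling search.mult to 100 would double search_k to 1000, trading a slower search for better accuracy.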

Technically, the index construction algorithm is stochastic but, for various logistical reasons, the seed is hard-coded into the C++ code. This means that the results of the Annoy neighbor searches will be fully deterministic for the same inputs, even though the theory provides no such guarantees.
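One way to check this behavior, sketched here with arbitrary data, is to run the same search twice without setting any seed in between and compare the results.

```r
library(BiocNeighbors)

Y <- matrix(rnorm(100000), ncol=20)

# Each call builds its own index internally; no seed is set between them.
first <- findAnnoy(Y, k=5)
second <- findAnnoy(Y, k=5)

# Both comparisons should be TRUE if the searches are deterministic.
identical(first$index, second$index)
identical(first$distance, second$distance)
```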

An AnnoyIndex object containing:

path, a string containing the path to the index file.
data, a numeric matrix equivalent to t(X).
search.mult, a numeric scalar specifying the multiplier for the number of points to search in downstream functions.
NAMES, a character vector or NULL equal to rownames(X).
distance, a string specifying the distance metric used.

Aaron Lun

See AnnoyIndex for details on the output class.

See findAnnoy and queryAnnoy for dependent functions.

Y <- matrix(rnorm(100000), ncol=20)
out <- buildAnnoy(Y)
out

BiocNeighbors documentation built on Dec. 9, 2020, 2:01 a.m.
