The bigANNOY package provides approximate nearest-neighbour search
specialised for bigmemory::big.matrix objects through persisted Annoy
indexes. It keeps the reference data in bigmemory storage during build and
query workflows, supports repeated-query sessions through explicit open/load
helpers, and can stream neighbour indices and distances directly into
destination big.matrix objects.
Current features include:
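- index builds and searches that accept big.matrix objects, descriptors, descriptor paths, and external pointers,
- streaming of neighbour indices and distances into big.matrix destinations,
- explicit index lifecycle helpers: annoy_open_index(), annoy_load_bigmatrix(), annoy_is_loaded(), annoy_close_index(), and annoy_validate_index(), and
- benchmarking against an exact bigKNN baseline when bigKNN is available.

These workflows make bigANNOY useful both as a standalone approximate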
search package and as the ANN side of an exact-versus-approximate evaluation
pipeline built around bigKNN.
The package is currently easiest to install from GitHub:
# install.packages("remotes")
remotes::install_github("fbertran/bigANNOY")
If you prefer a local source install, clone the repository and run:
R CMD build bigANNOY
R CMD INSTALL bigANNOY_0.3.0.tar.gz
The package defines a small set of runtime options:
Option | Default value | Description
--- | --- | ---
bigANNOY.block_size | 1024L | Default number of rows processed per build/search block.
bigANNOY.progress | FALSE | Emit simple progress messages during long-running builds, searches, and benchmarks.
bigANNOY.backend | "cpp" | Backend request. "cpp" uses the native compiled backend, "auto" falls back when compiled symbols are not loaded, and "r" forces the debug-only R backend.
All options can be changed with options() at runtime. For example,
options(bigANNOY.block_size = 2048L) increases the default block size used
by the build and search helpers.
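As a minimal sketch of that pattern, the base R snippet below sets two of the options from the table, reads one back with getOption(), and then restores the previous values. The values are illustrative only, and nothing here calls into bigANNOY itself.

```r
# Set two bigANNOY options; options() invisibly returns the previous values
old <- options(
  bigANNOY.block_size = 2048L,
  bigANNOY.progress = TRUE
)

# Read the current setting, with a fallback in case the option is unset
block_size <- getOption("bigANNOY.block_size", default = 1024L)

# Restore the previous values once the work is done
options(old)
```

Saving the list returned by options() and passing it back keeps the change local to one piece of work, which is helpful when a script changes the block size only for a single large build.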
The examples below use a small Euclidean reference matrix so the returned neighbours are easy to inspect.
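library(bigmemory)
library(bigANNOY)

# Five reference points in 2-D, stored as a big.matrix (one point per row)
reference <- as.big.matrix(matrix(
  c(0, 0,
    1, 0,
    0, 1,
    1, 1,
    2, 2),
  ncol = 2,
  byrow = TRUE
))

# Two query points, kept as an ordinary R matrix
query <- matrix(
  c(0.1, 0.1,
    1.8, 1.9),
  ncol = 2,
  byrow = TRUE
)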
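# Build a Euclidean Annoy index and persist it to a temporary .ann file
index <- annoy_build_bigmatrix(
  reference,
  path = tempfile(fileext = ".ann"),
  metric = "euclidean",
  n_trees = 20L,
  seed = 123L,
  load_mode = "eager"
)

# Search for the 2 nearest reference rows of each query row
result <- annoy_search_bigmatrix(
  index,
  query = query,
  k = 2L,
  search_k = 100L
)

# Inspect the neighbour indices and the rounded distances
result$index
round(result$distance, 3)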
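# Reopen the persisted index from its path with lazy loading
reopened <- annoy_open_index(index$path, load_mode = "lazy")
annoy_is_loaded(reopened)

# Validate the reopened index, then check the loaded state again
report <- annoy_validate_index(
  reopened,
  strict = TRUE,
  load = TRUE
)
report$valid
annoy_is_loaded(reopened)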
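# Preallocate destinations: one row per query row, one column per neighbour (k = 2)
index_store <- big.matrix(nrow(query), 2L, type = "integer")
distance_store <- big.matrix(nrow(query), 2L, type = "double")

# Stream neighbour indices and distances into the destination big.matrix objects
annoy_search_bigmatrix(
  index,
  query = query,
  k = 2L,
  xpIndex = index_store,
  xpDistance = distance_store
)

# Copy the results back into ordinary matrices for inspection
bigmemory::as.matrix(index_store)
round(bigmemory::as.matrix(distance_store), 3)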
benchmark_annoy_bigmatrix(
  n_ref = 2000L,
  n_query = 200L,
  n_dim = 20L,
  k = 10L,
  n_trees = 50L,
  search_k = 1000L,
  metric = "euclidean",
  exact = TRUE
)
If bigKNN is installed, the Euclidean benchmark helpers also report exact
search timing and recall against the exact baseline.
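To make "recall against the exact baseline" concrete, here is a small base R sketch of how recall at k is commonly defined: the share of exact nearest neighbours that the approximate result also returned. The helpers exact_knn() and recall_at_k() are hypothetical names written for this illustration. They are not bigANNOY or bigKNN functions, and the package's own benchmark may compute recall differently. The approximate result below is made up by hand to show a partial match.

```r
# Hypothetical helper: exact k nearest neighbours by brute force (Euclidean)
exact_knn <- function(reference, query, k) {
  do.call(rbind, lapply(seq_len(nrow(query)), function(i) {
    # Distance from query row i to every reference row
    d <- sqrt(colSums((t(reference) - query[i, ])^2))
    order(d)[seq_len(k)]
  }))
}

# Hypothetical helper: fraction of exact neighbours found by the approximate search
recall_at_k <- function(approx, exact) {
  hits <- vapply(seq_len(nrow(exact)), function(i) {
    sum(approx[i, ] %in% exact[i, ])
  }, integer(1))
  sum(hits) / length(exact)
}

# Toy data: five reference points and two query points in 2-D
reference <- matrix(c(0, 0, 1, 0, 0, 1, 1, 1, 2, 2), ncol = 2, byrow = TRUE)
query <- matrix(c(0.2, 0.1, 1.8, 1.9), ncol = 2, byrow = TRUE)

exact <- exact_knn(reference, query, k = 2L)

# A made-up approximate result that misses one of the four exact neighbours
approx <- rbind(c(1L, 3L), c(5L, 4L))
recall <- recall_at_k(approx, exact)
```

Recall is computed per query row as set overlap, so the order in which neighbours are returned does not affect it.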
An installed command-line benchmark script is also available at:
system.file("benchmarks", "benchmark_annoy.R", package = "bigANNOY")
Example single-run command:
Rscript "$(Rscript -e 'cat(system.file("benchmarks", "benchmark_annoy.R", package = "bigANNOY"))')" \
--mode=single \
--n_ref=5000 \
--n_query=500 \
--n_dim=50 \
--k=20 \
--n_trees=100 \
--search_k=5000 \
--load_mode=eager
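The same single run can also be launched from an R session. The sketch below is one possible way to do that with base R only: benchmark_cli_args() is a hypothetical helper written for this example, not part of bigANNOY, and the flag names are copied from the command above. The script is only invoked when system.file() finds it.

```r
# Hypothetical helper: turn a named list into "--name=value" flags
benchmark_cli_args <- function(params) {
  paste0("--", names(params), "=", vapply(params, as.character, character(1)))
}

# Same settings as the single-run shell command
params <- list(
  mode = "single",
  n_ref = 5000L,
  n_query = 500L,
  n_dim = 50L,
  k = 20L,
  n_trees = 100L,
  search_k = 5000L,
  load_mode = "eager"
)
args <- benchmark_cli_args(params)

# system.file() returns "" when the package or file is not installed
script <- system.file("benchmarks", "benchmark_annoy.R", package = "bigANNOY")
if (nzchar(script)) {
  # Run the script with the Rscript binary that belongs to this R installation
  system2(file.path(R.home("bin"), "Rscript"), c(shQuote(script), args))
}
```

Building the flags from a list keeps the parameters in one place, which makes it easier to loop over several settings from R.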
The package now ships with focused vignettes for the main workflows:
- getting-started-bigannoy
- persistent-indexes-and-lifecycle
- file-backed-bigmemory-workflows
- benchmarking-recall-and-latency
- metrics-and-tuning
- validation-and-sharing-indexes
- bigannoy-vs-bigknn

Together they cover the basic ANN workflow, loaded-index lifecycle, file-backed
bigmemory usage, benchmarking and recall evaluation, tuning, validation and
sharing of persisted indexes, and the relationship between approximate
bigANNOY search and exact bigKNN search.