bigANNOY

Approximate nearest-neighbour search for bigmemory matrices with Annoy

Frédéric Bertrand

The bigANNOY package provides approximate nearest-neighbour search specialised for bigmemory::big.matrix objects through persisted Annoy indexes. It keeps the reference data in bigmemory storage during build and query workflows, supports repeated-query sessions through explicit open/load helpers, and can stream neighbour indices and distances directly into destination big.matrix objects.

Current features include:

These workflows make bigANNOY useful both as a standalone approximate search package and as the ANN side of an exact-versus-approximate evaluation pipeline built around bigKNN.

Installation

The package is currently easiest to install from GitHub:

# install.packages("remotes")
remotes::install_github("fbertran/bigANNOY")

If you prefer a local source install, clone the repository and run:

R CMD build bigANNOY
R CMD INSTALL bigANNOY_0.3.0.tar.gz

Options

The package defines a small set of runtime options:

| Option | Default value | Description |
| --- | --- | --- |
| bigANNOY.block_size | 1024L | Default number of rows processed per build/search block. |
| bigANNOY.progress | FALSE | Emit simple progress messages during long-running builds, searches, and benchmarks. |
| bigANNOY.backend | "cpp" | Backend request. "cpp" uses the native compiled backend, "auto" falls back when compiled symbols are not loaded, and "r" forces the debug-only R backend. |

All options can be changed with options() at runtime. For example, options(bigANNOY.block_size = 2048L) increases the default block size used by the build and search helpers.
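
A common pattern is to change an option for one piece of work and then restore the previous value. The sketch below uses only base R (`options()` and `getOption()`); the fallback of `1024L` simply mirrors the documented default and is supplied here by hand, in case the option has not been set yet.

```r
# Read the current block size, falling back to the documented default (1024L)
# when the option has not been set, e.g. before the package is loaded.
getOption("bigANNOY.block_size", default = 1024L)

# options() invisibly returns the previous values, so they can be restored later.
old <- options(
  bigANNOY.block_size = 2048L,
  bigANNOY.progress = TRUE
)

block_size_now <- getOption("bigANNOY.block_size")

# ... run builds or searches here ...

# Restore whatever was in place before the change.
options(old)
```

Inside a function, `on.exit(options(old), add = TRUE)` gives the same restore even if the work in between fails.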

Examples

The examples below use a small Euclidean reference matrix so the returned neighbours are easy to inspect.

Build and query an Annoy index

library(bigmemory)
library(bigANNOY)

reference <- as.big.matrix(matrix(
  c(0, 0,
    1, 0,
    0, 1,
    1, 1,
    2, 2),
  ncol = 2,
  byrow = TRUE
))

query <- matrix(
  c(0.1, 0.1,
    1.8, 1.9),
  ncol = 2,
  byrow = TRUE
)

index <- annoy_build_bigmatrix(
  reference,
  path = tempfile(fileext = ".ann"),
  metric = "euclidean",
  n_trees = 20L,
  seed = 123L,
  load_mode = "eager"
)

result <- annoy_search_bigmatrix(
  index,
  query = query,
  k = 2L,
  search_k = 100L
)

result$index
round(result$distance, 3)
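
Because the reference set has only five rows, the approximate result can be checked by hand. The following base-R sketch computes the exact Euclidean neighbours by brute force for the same toy data. `exact_knn` is a hypothetical helper written for this illustration and is not part of bigANNOY; its row numbers are ordinary 1-based R indices, and the indexing convention of `result$index` should be confirmed against the package documentation before comparing the two.

```r
# Same toy data as above, held in ordinary R matrices.
reference <- matrix(c(0, 0,  1, 0,  0, 1,  1, 1,  2, 2), ncol = 2, byrow = TRUE)
query <- matrix(c(0.1, 0.1,  1.8, 1.9), ncol = 2, byrow = TRUE)

# Hypothetical helper: exact k nearest neighbours by brute force.
exact_knn <- function(reference, query, k) {
  index <- matrix(NA_integer_, nrow(query), k)
  distance <- matrix(NA_real_, nrow(query), k)
  for (i in seq_len(nrow(query))) {
    # Euclidean distance from query row i to every reference row.
    d <- sqrt(rowSums(sweep(reference, 2, query[i, ])^2))
    nearest <- order(d)[seq_len(k)]
    index[i, ] <- nearest
    distance[i, ] <- d[nearest]
  }
  list(index = index, distance = distance)
}

exact <- exact_knn(reference, query, k = 2L)
exact$index
round(exact$distance, 3)
```

For the first query, reference rows 2 and 3 are equidistant, so either one is a valid second neighbour. The second query's nearest rows are 5 and then 4.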

Reopen and validate a persisted index

reopened <- annoy_open_index(index$path, load_mode = "lazy")

annoy_is_loaded(reopened)

report <- annoy_validate_index(
  reopened,
  strict = TRUE,
  load = TRUE
)

report$valid
annoy_is_loaded(reopened)

Stream results into bigmemory outputs

index_store <- big.matrix(nrow(query), 2L, type = "integer")
distance_store <- big.matrix(nrow(query), 2L, type = "double")

annoy_search_bigmatrix(
  index,
  query = query,
  k = 2L,
  xpIndex = index_store,
  xpDistance = distance_store
)

bigmemory::as.matrix(index_store)
round(bigmemory::as.matrix(distance_store), 3)
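
The destination matrices do not have to live in memory. As a sketch, the stores can be created as file-backed matrices with `bigmemory::filebacked.big.matrix()`, assuming the search helper treats them like any other big.matrix (this is not confirmed here). The file names and the temporary directory are illustrative choices, and the search call is left as a comment because it needs the `index` and `query` objects built earlier.

```r
library(bigmemory)

# Dimensions matching the toy example: two query rows, k = 2 neighbours each.
n_query <- 2L
k <- 2L

# A fresh directory avoids clashing with backing files from earlier runs.
backing_dir <- tempfile("bigannoy-results-")
dir.create(backing_dir)

index_store <- filebacked.big.matrix(
  nrow = n_query, ncol = k, type = "integer",
  backingfile = "neighbour_index.bin",
  descriptorfile = "neighbour_index.desc",
  backingpath = backing_dir
)
distance_store <- filebacked.big.matrix(
  nrow = n_query, ncol = k, type = "double",
  backingfile = "neighbour_distance.bin",
  descriptorfile = "neighbour_distance.desc",
  backingpath = backing_dir
)

# The stores would then be passed exactly as in the in-memory example:
# annoy_search_bigmatrix(index, query = query, k = k,
#                        xpIndex = index_store, xpDistance = distance_store)

# Later, possibly from another R session, reattach through the descriptor file.
reattached <- attach.big.matrix(file.path(backing_dir, "neighbour_index.desc"))
dim(reattached)
```

File-backed stores keep large result sets on disk, so they can outlive the R session that produced them.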

Benchmark approximate Euclidean search

benchmark_annoy_bigmatrix(
  n_ref = 2000L,
  n_query = 200L,
  n_dim = 20L,
  k = 10L,
  n_trees = 50L,
  search_k = 1000L,
  metric = "euclidean",
  exact = TRUE
)

If bigKNN is installed, the Euclidean benchmark helpers also report exact search timing and recall against the exact baseline.
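
To make the recall figure concrete, the sketch below computes one common definition of recall at k: the fraction of exact neighbours that also appear in the approximate result, averaged over queries. `recall_at_k` is a hypothetical helper, the two index matrices are made-up illustrative values, and the benchmark helpers may compute or report recall differently.

```r
# Hypothetical helper: mean fraction of exact neighbours recovered per query.
recall_at_k <- function(approx_index, exact_index) {
  stopifnot(identical(dim(approx_index), dim(exact_index)))
  hits <- vapply(
    seq_len(nrow(exact_index)),
    # Count how many exact neighbours of query i the approximate search found.
    function(i) length(intersect(approx_index[i, ], exact_index[i, ])),
    integer(1)
  )
  mean(hits / ncol(exact_index))
}

# Illustrative indices for two queries with k = 2 (one row per query).
exact_index <- matrix(c(1L, 2L,  5L, 4L), ncol = 2, byrow = TRUE)
approx_index <- matrix(c(1L, 3L,  5L, 4L), ncol = 2, byrow = TRUE)

recall <- recall_at_k(approx_index, exact_index)
recall
```

Here the first query recovers one of its two exact neighbours and the second recovers both, giving a recall of 0.75.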

Installed Benchmark Runner

An installed command-line benchmark script is also available at:

system.file("benchmarks", "benchmark_annoy.R", package = "bigANNOY")

Example single-run command:

Rscript "$(Rscript -e 'cat(system.file("benchmarks", "benchmark_annoy.R", package = "bigANNOY"))')" \
  --mode=single \
  --n_ref=5000 \
  --n_query=500 \
  --n_dim=50 \
  --k=20 \
  --n_trees=100 \
  --search_k=5000 \
  --load_mode=eager

Vignettes

The package now ships with focused vignettes for the main workflows.

Together they cover the basic ANN workflow, loaded-index lifecycle, file-backed bigmemory usage, benchmarking and recall evaluation, tuning, validation and sharing of persisted indexes, and the relationship between approximate bigANNOY search and exact bigKNN search.




bigANNOY documentation built on April 1, 2026, 9:07 a.m.