File-Backed bigmemory Workflows
In bigANNOY: Approximate k-Nearest Neighbour Search for 'bigmemory' Matrices with Annoy

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

options(bigANNOY.progress = FALSE)
set.seed(20260326)

One of the main goals of bigANNOY is to work comfortably with bigmemory data that already lives on disk. Instead of forcing a large reference matrix through dense in-memory copies, the package can build and query Annoy indexes directly from file-backed big.matrix objects and their descriptors.

This vignette focuses on the most common disk-oriented workflows:

building from a file-backed reference matrix
querying with descriptor objects and descriptor file paths
streaming neighbour results into file-backed destination matrices
working with separated-column big.matrix query layouts

Load the Packages

library(bigANNOY)
library(bigmemory)

Create a Small File-Backed Workspace

For reproducibility, we will create all backing files inside a temporary directory. In real work this would usually be a project directory or a shared data location.

workspace_dir <- tempfile("bigannoy-filebacked-")
dir.create(workspace_dir, recursive = TRUE, showWarnings = FALSE)

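# Create a file-backed big.matrix stored as <name>.bin with descriptor
# <name>.desc inside backingpath, then copy the dense values into it.
make_filebacked_matrix <- function(values, type, backingpath, name) {
  bm <- filebacked.big.matrix(
    nrow = nrow(values),
    ncol = ncol(values),
    type = type,
    backingfile = sprintf("%s.bin", name),
    descriptorfile = sprintf("%s.desc", name),
    backingpath = backingpath
  )
  # Assigning into bm[, ] writes the values to the memory-mapped backing file.
  bm[, ] <- values
  bm
}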

Build a File-Backed Reference Matrix

We will create a reference dataset and store it in a file-backed big.matrix. The corresponding descriptor file is what lets later R sessions reattach to the same on-disk data.

ref_dense <- matrix(
  c(
    0.0, 0.0,
    5.0, 0.0,
    0.0, 5.0,
    5.0, 5.0,
    9.0, 9.0
  ),
  ncol = 2,
  byrow = TRUE
)

ref_fb <- make_filebacked_matrix(
  values = ref_dense,
  type = "double",
  backingpath = workspace_dir,
  name = "ref"
)

ref_desc <- describe(ref_fb)
ref_desc_path <- file.path(workspace_dir, "ref.desc")

file.exists(ref_desc_path)
dim(ref_fb)

At this point we have:

a file-backed data file at ref.bin
a descriptor file at ref.desc
a big.matrix object currently attached in this R session
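As a quick check of that state, the following sketch builds a throwaway file-backed matrix, confirms that both files exist on disk, and reattaches through the descriptor path alone. It uses only bigmemory; the scratch directory and the demo.* file names are hypothetical and chosen purely for illustration.

```r
library(bigmemory)

# Hypothetical scratch directory and file names, used only in this sketch.
demo_dir <- tempfile("reattach-demo-")
dir.create(demo_dir)

demo_values <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)

demo_fb <- filebacked.big.matrix(
  nrow = nrow(demo_values),
  ncol = ncol(demo_values),
  type = "double",
  backingfile = "demo.bin",
  descriptorfile = "demo.desc",
  backingpath = demo_dir
)
demo_fb[, ] <- demo_values

# Both the data file and the descriptor file now live in demo_dir.
sort(list.files(demo_dir))

# Reattach using nothing but the descriptor path, as a later session would.
demo_again <- attach.big.matrix(file.path(demo_dir, "demo.desc"))
is.filebacked(demo_again)
all.equal(demo_again[, ], demo_values)
```

The reattached handle reads the same on-disk values, which is the property the descriptor-path workflows in the rest of this vignette rely on.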

Build an Annoy Index from a Descriptor Path

The simplest persisted workflow is to build directly from the descriptor file path instead of from the live big.matrix object. That mirrors how later sessions typically work.

index_path <- file.path(workspace_dir, "ref.ann")

index <- annoy_build_bigmatrix(
  x = ref_desc_path,
  path = index_path,
  n_trees = 25L,
  metric = "euclidean",
  seed = 99L,
  load_mode = "lazy"
)

index

This pattern is useful because the build call no longer depends on a particular in-memory object being alive. As long as the descriptor can be reattached, the reference matrix can be used.

Accepted File-Oriented Input Forms

For x, query, xpIndex, and xpDistance, bigANNOY accepts several bigmemory-oriented forms:

a live big.matrix
an external pointer to a big.matrix
a big.matrix.descriptor object
a descriptor file path

For queries only, a dense numeric matrix is also accepted.

That flexibility matters most in persisted workflows where one part of the pipeline writes descriptors and another part reattaches them later.
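To make the four bigmemory-oriented forms concrete, the sketch below derives each of them from one small file-backed matrix and prints its R class. It uses only bigmemory and does not call bigANNOY; the directory and the forms.* file names are hypothetical.

```r
library(bigmemory)

# Hypothetical scratch directory and file names, used only in this sketch.
demo_dir <- tempfile("input-forms-demo-")
dir.create(demo_dir)

live <- filebacked.big.matrix(
  nrow = 2,
  ncol = 2,
  type = "double",
  backingfile = "forms.bin",
  descriptorfile = "forms.desc",
  backingpath = demo_dir
)
live[, ] <- matrix(c(1, 2, 3, 4), nrow = 2)

# The same on-disk data, expressed in the four forms listed above.
forms <- list(
  live_matrix      = live,
  external_pointer = live@address,
  descriptor       = describe(live),
  descriptor_path  = file.path(demo_dir, "forms.desc")
)

form_classes <- vapply(forms, function(f) class(f)[1], character(1))
form_classes
```

Only the descriptor object and the descriptor path are meaningful outside the current R process; the live object and its external pointer are tied to this session.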

Query with a File-Backed big.matrix

Now we will create a file-backed query matrix and search the persisted Annoy index against it.

query_dense <- matrix(
  c(
    0.2, 0.1,
    4.7, 5.1
  ),
  ncol = 2,
  byrow = TRUE
)

query_fb <- make_filebacked_matrix(
  values = query_dense,
  type = "double",
  backingpath = workspace_dir,
  name = "query"
)

query_result_big <- annoy_search_bigmatrix(
  index,
  query = query_fb,
  k = 2L,
  search_k = 100L
)

query_result_big$index
round(query_result_big$distance, 3)

The query matrix itself is file-backed, but the search call looks the same as it would for an in-memory big.matrix.

Query with a Descriptor Object and a Descriptor Path

The same persisted query data can be supplied through its descriptor object or through the descriptor file path. This is often the most convenient way to reattach query data across sessions.

query_desc <- describe(query_fb)
query_desc_path <- file.path(workspace_dir, "query.desc")

query_result_desc <- annoy_search_bigmatrix(
  index,
  query = query_desc,
  k = 2L,
  search_k = 100L
)

query_result_path <- annoy_search_bigmatrix(
  index,
  query = query_desc_path,
  k = 2L,
  search_k = 100L
)

query_result_desc$index
query_result_path$index

These should match the result obtained from the live big.matrix query.

identical(query_result_big$index, query_result_desc$index)
identical(query_result_big$index, query_result_path$index)
all.equal(query_result_big$distance, query_result_desc$distance)
all.equal(query_result_big$distance, query_result_path$distance)

Stream Results into File-Backed Destination Matrices

Large search results can be expensive to keep in ordinary R memory. To avoid that, bigANNOY can stream neighbour ids and distances directly into destination big.matrix objects.

For file-backed workflows, this means you can keep both the inputs and the outputs on disk.

index_store <- filebacked.big.matrix(
  nrow = nrow(query_dense),
  ncol = 2L,
  type = "integer",
  backingfile = "nn_index.bin",
  descriptorfile = "nn_index.desc",
  backingpath = workspace_dir
)

distance_store <- filebacked.big.matrix(
  nrow = nrow(query_dense),
  ncol = 2L,
  type = "double",
  backingfile = "nn_distance.bin",
  descriptorfile = "nn_distance.desc",
  backingpath = workspace_dir
)

streamed_result <- annoy_search_bigmatrix(
  index,
  query = query_desc,
  k = 2L,
  xpIndex = describe(index_store),
  xpDistance = file.path(workspace_dir, "nn_distance.desc")
)

bigmemory::as.matrix(index_store)
round(bigmemory::as.matrix(distance_store), 3)

The important practical details are:

xpIndex must be integer-compatible
xpDistance must be double-compatible
both destination matrices must have shape n_query x k
xpDistance can only be supplied when xpIndex is also supplied
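These constraints can be checked before a long search starts. The helper below is a minimal sketch and is not part of bigANNOY: check_destinations() and bm_type() are hypothetical names, and the sketch assumes the strictest reading of "compatible", namely an exact "integer" or "double" storage type. It uses only bigmemory calls.

```r
library(bigmemory)

# Hypothetical helper: read the storage type recorded in the descriptor.
bm_type <- function(bm) describe(bm)@description$type

# Hypothetical pre-flight check, not a bigANNOY function.
# index_store is mandatory, so a distance store can never be checked alone.
check_destinations <- function(n_query, k, index_store, distance_store = NULL) {
  expected <- c(n_query, k)
  stopifnot(
    is.big.matrix(index_store),
    all(dim(index_store) == expected),
    bm_type(index_store) == "integer"   # assumption: exact type match
  )
  if (!is.null(distance_store)) {
    stopifnot(
      is.big.matrix(distance_store),
      all(dim(distance_store) == expected),
      bm_type(distance_store) == "double"  # assumption: exact type match
    )
  }
  invisible(TRUE)
}

idx <- big.matrix(nrow = 2, ncol = 3, type = "integer")
dst <- big.matrix(nrow = 2, ncol = 3, type = "double")

ok <- check_destinations(n_query = 2, k = 3, idx, dst)

# A mismatched k is rejected before any search would run.
bad <- tryCatch(
  check_destinations(n_query = 2, k = 4, idx, dst),
  error = function(e) "rejected"
)
```

Failing early this way is cheaper than discovering a shape mismatch after the index has been loaded and the query matrix attached.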

Reattach the Output Files Later

Because the result matrices are file-backed, they can be reattached later in the same way as any other bigmemory artifact.

index_store_again <- attach.big.matrix(file.path(workspace_dir, "nn_index.desc"))
distance_store_again <- attach.big.matrix(file.path(workspace_dir, "nn_distance.desc"))

bigmemory::as.matrix(index_store_again)
round(bigmemory::as.matrix(distance_store_again), 3)

That is useful in longer pipelines where one step performs ANN search and a later step consumes the neighbour graph or distance matrix.

Separated-Column Query Matrices

bigANNOY also supports separated-column big.matrix layouts. These are not necessarily file-backed, but they are common in bigmemory workflows and are worth knowing about because they use a different memory layout from the usual contiguous matrix case.

query_sep <- big.matrix(
  nrow = nrow(query_dense),
  ncol = ncol(query_dense),
  type = "double",
  separated = TRUE
)
query_sep[,] <- query_dense

sep_result <- annoy_search_bigmatrix(
  index,
  query = describe(query_sep),
  k = 2L,
  search_k = 100L
)

sep_result$index
round(sep_result$distance, 3)

For the same query values, the separated-column result should match the result from the contiguous file-backed query (query_result_big).

identical(sep_result$index, query_result_big$index)
all.equal(sep_result$distance, query_result_big$distance)
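The layout difference itself can be inspected without running a search. The sketch below uses only bigmemory: it creates one contiguous and one separated-column matrix holding the same values and asks each which layout it uses.

```r
library(bigmemory)

values <- matrix(c(0.2, 0.1, 4.7, 5.1), ncol = 2, byrow = TRUE)

# Default layout: one contiguous block of memory.
contiguous <- big.matrix(nrow = 2, ncol = 2, type = "double")
# Separated layout: each column is allocated on its own.
separated <- big.matrix(nrow = 2, ncol = 2, type = "double", separated = TRUE)

contiguous[, ] <- values
separated[, ] <- values

is.separated(contiguous)
is.separated(separated)

# The layouts differ, but the values read back are the same.
all.equal(contiguous[, ], separated[, ])
```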

Persisted Reference, Persisted Index, Persisted Outputs

Taken together, the main file-backed pattern looks like this:

store the reference data in a file-backed big.matrix
keep the descriptor alongside the backing file
build the Annoy index from the descriptor path
query using either a live big.matrix, a descriptor object, or a descriptor path
write neighbour results into file-backed destination matrices when result size matters

This is often the most practical way to use bigANNOY in large-data settings, because every major artifact in the workflow can be reopened later.

Practical Tips

Keep descriptor files with their corresponding backing files.
Keep the .ann file with its .meta sidecar file.
Use descriptor paths when you want to decouple one R session from another.
Use streamed outputs when n_query x k is too large to hold comfortably in ordinary R matrices.
Use the lifecycle helpers from the persistence vignette when you want to reopen and validate the Annoy index itself across sessions.
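To judge whether n_query x k is "too large", a back-of-the-envelope estimate is usually enough. The sketch below assumes dense R matrices with 4 bytes per integer neighbour id and 8 bytes per double distance; the query count and k are hypothetical numbers picked for illustration.

```r
# Hypothetical workload size, for illustration only.
n_query <- 1e7
k <- 50

cells <- n_query * k

# R stores integers in 4 bytes and doubles in 8 bytes per element.
estimate_gib <- c(
  index    = cells * 4,
  distance = cells * 8
) / 1024^3

round(estimate_gib, 2)
```

With these numbers the two dense results would need roughly 1.9 GiB and 3.7 GiB of ordinary R memory, which is the kind of case where file-backed destination matrices are worth the extra setup.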

Recap

This vignette covered the main bigmemory persistence features in bigANNOY:

file-backed reference matrices
descriptor-object and descriptor-path queries
streamed file-backed outputs
reattachment of persisted outputs
separated-column query support

The natural next vignette after this one is Benchmarking Recall and Latency, which shows how to evaluate these workflows against runtime and quality targets.


bigANNOY documentation built on April 1, 2026, 9:07 a.m.
