knitr::opts_chunk$set(error=FALSE, warning=FALSE, message=FALSE, eval=FALSE)

# Setting up your package

The \pkg{Annoy} \proglang{C++} library \citep{Github:annoy} implements a quick and simple method for approximate nearest neighbor (oh yeah) searching. The \pkg{RcppAnnoy} package \citep{CRAN:RcppAnnoy} provides a centralized resource for developers to use this code in their own \proglang{R} packages by relying on \pkg{Rcpp} \citep{TAS:Rcpp,CRAN:Rcpp}. To use \pkg{Annoy} in your package's \proglang{C++} code, add the following line to your `DESCRIPTION` file

LinkingTo: RcppAnnoy

and the header files will be available for inclusion into your package's source files. Note that \pkg{Annoy} is a header-only library so no additional commands are necessary for the linker.

# Including the header files

The header files need to be included in any \proglang{C++} source file that uses \pkg{Annoy}. A few macros are also needed to handle Windows-specific behaviour and to ensure that error messages are printed through \proglang{R}. Version number comparison macros help in making changes conditional on a particular version. Since release 0.0.17, all of this is handled centrally by a single header shipped with the package, so users need only this one line:

```{Rcpp, eval=FALSE}
#include "RcppAnnoy.h"
```

# Defining the search type 

The `AnnoyIndex` template class can accommodate different data types,
distance metrics, random number generators, and threading policies (where the
latter are a choice between sequential or multithreaded). 
Here, we will consider the most common application of a nearest-neighbor search on floating-point data with Euclidean distance.
We `typedef` the type and realized template for convenience:

```{Rcpp, eval=FALSE}
typedef float ANNOYTYPE;

typedef Annoy::AnnoyIndex<int, ANNOYTYPE, Annoy::Euclidean, Kiss64Random,
                          RcppAnnoyIndexThreadPolicy>
    MyAnnoyIndex;
```

Note that we use `float` by default, rather than the more conventional `double`. This choice is made for speed and for consistency with the original Python implementation.
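Because \proglang{R} stores numeric vectors in double precision, data coming from \proglang{R} has to be narrowed to `ANNOYTYPE` before it reaches a `float`-based index. The following is a minimal sketch of such a conversion using only the standard library; the helper name `to_annoy_type` is hypothetical and is not part of \pkg{Annoy} or \pkg{RcppAnnoy}.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef float ANNOYTYPE;

// Hypothetical helper (not part of Annoy or RcppAnnoy): copy one observation
// of 'ndim' double-precision values into a buffer of the index's value type.
std::vector<ANNOYTYPE> to_annoy_type(const double* values, std::size_t ndim) {
    // A null pointer is only acceptable for an empty observation.
    assert(values != nullptr || ndim == 0);

    std::vector<ANNOYTYPE> out(ndim);
    for (std::size_t i = 0; i < ndim; ++i) {
        // Explicit narrowing: values not exactly representable as float
        // are rounded to the nearest float.
        out[i] = static_cast<ANNOYTYPE>(values[i]);
    }
    return out;
}
```

Values such as 1.0 and 0.5 survive the conversion exactly, whereas a value such as 0.1 is rounded, so distances computed on the `float` data can differ slightly from their double-precision counterparts.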

The \pkg{Annoy} library uses random number generation during index creation (via the `Kiss64Random` class), with a seed that is separate from \proglang{R}'s RNG seed. By default, the seed is fixed, so results are deterministic in the sense that repeated runs on the same data yield the same result. They are also unaffected by the state of \proglang{R}'s RNG seed. The seed used by `AnnoyIndex` can be set with the `set_seed` method, which should be called before adding items to the index.
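The seeding behaviour described here can be illustrated with a toy generator that owns its state. This is a sketch only: `ToyIndexRng`, `kToyDefaultSeed`, and `same_sequence` are hypothetical names, the default seed value is arbitrary, and the class wraps `std::mt19937_64` rather than reproducing `Kiss64Random`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>

// Arbitrary illustrative default; not the seed used by Annoy.
const std::uint64_t kToyDefaultSeed = 42;

// Toy stand-in for a generator owned by an index. It is NOT Kiss64Random;
// it only mimics the idea of a fixed default seed plus a set_seed() method.
class ToyIndexRng {
public:
    ToyIndexRng() : engine_(kToyDefaultSeed) {}
    void set_seed(std::uint64_t seed) { engine_.seed(seed); }
    std::uint64_t next() { return engine_(); }

private:
    std::mt19937_64 engine_;  // state is private to this object
};

// Returns true if both generators produce the same first n draws.
// Arguments are taken by value so the callers' generators are not advanced.
bool same_sequence(ToyIndexRng a, ToyIndexRng b, std::size_t n) {
    assert(n > 0);
    for (std::size_t i = 0; i < n; ++i) {
        if (a.next() != b.next()) return false;
    }
    return true;
}
```

Two default-constructed generators yield identical sequences, and they only diverge once `set_seed` is called with different values. Since the state lives inside the object, seeding any other generator in the program has no effect on it, which mirrors why \proglang{R}'s RNG seed does not influence the index.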

# Building an index



eddelbuettel/rcppannoy documentation built on Feb. 27, 2024, 4:34 a.m.