approx_kfn: Approximate furthest neighbor search

View source: R/approx_kfn.R

approx_kfnR Documentation

Approximate furthest neighbor search


An implementation of two strategies for furthest neighbor search. This can be used to compute the furthest neighbor of query point(s) from a set of points; furthest neighbor models can be saved and reused with future query point(s).


  algorithm = NA,
  calculate_error = FALSE,
  exact_distances = NA,
  input_model = NA,
  k = NA,
  num_projections = NA,
  num_tables = NA,
  query = NA,
  reference = NA,
  verbose = FALSE



Algorithm to use: 'ds' or 'qdafn'. Default value "ds" (character).


If set, calculate the average distance error for the first furthest neighbor only. Default value "FALSE" (logical).


Matrix containing exact distances to furthest neighbors; this can be used to avoid explicit calculation when –calculate_error is set (numeric matrix).


File containing input model (ApproxKFNModel).


Number of furthest neighbors to search for. Default value "0" (integer).


Number of projections to use in each hash table. Default value "5" (integer).


Number of hash tables to use. Default value "5" (integer).


Matrix containing query points (numeric matrix).


Matrix containing the reference dataset (numeric matrix).


Display informational messages and the full list of parameters and timers at the end of execution. Default value "FALSE" (logical).


This program implements two strategies for furthest neighbor search. These strategies are:

- The 'qdafn' algorithm from "Approximate Furthest Neighbor in High Dimensions" by R. Pagh, F. Silvestri, J. Sivertsen, and M. Skala, in Similarity Search and Applications 2015 (SISAP). - The 'DrusillaSelect' algorithm from "Fast approximate furthest neighbors with data-dependent candidate selection", by R.R. Curtin and A.B. Gardner, in Similarity Search and Applications 2016 (SISAP).

These two strategies give approximate results for the furthest neighbor search problem and can be used as fast replacements for other furthest neighbor techniques such as those found in the mlpack_kfn program. Note that typically, the 'ds' algorithm requires far fewer tables and projections than the 'qdafn' algorithm.

Specify a reference set (set to search in) with "reference", specify a query set with "query", and specify algorithm parameters with "num_tables" and "num_projections" (or don't and defaults will be used). The algorithm to be used (either 'ds'—the default—or 'qdafn') may be specified with "algorithm". Also specify the number of neighbors to search for with "k".

Note that for 'qdafn' in lower dimensions, "num_projections" may need to be set to a high value in order to return results for each query point.

If no query set is specified, the reference set will be used as the query set. The "output_model" output parameter may be used to store the built model, and an input model may be loaded instead of specifying a reference set with the "input_model" option.

Results for each query point can be stored with the "neighbors" and "distances" output parameters. Each row of these output matrices holds the k distances or neighbor indices for each query point.


A list with several components:


Matrix to save furthest neighbor distances to (numeric matrix).


Matrix to save neighbor indices to (integer matrix).


File to save output model to (ApproxKFNModel).


mlpack developers


# For example, to find the 5 approximate furthest neighbors with
# "reference_set" as the reference set and "query_set" as the query set using
# DrusillaSelect, storing the furthest neighbor indices to "neighbors" and
# the furthest neighbor distances to "distances", one could call

## Not run: 
output <- approx_kfn(query=query_set, reference=reference_set, k=5,
neighbors <- output$neighbors
distances <- output$distances

## End(Not run)

# and to perform approximate all-furthest-neighbors search with k=1 on the
# set "data" storing only the furthest neighbor distances to "distances", one
# could call

## Not run: 
output <- approx_kfn(reference=reference_set, k=1)
distances <- output$distances

## End(Not run)

# A trained model can be re-used.  If a model has been previously saved to
# "model", then we may find 3 approximate furthest neighbors on a query set
# "new_query_set" using that model and store the furthest neighbor indices
# into "neighbors" by calling

## Not run: 
output <- approx_kfn(input_model=model, query=new_query_set, k=3)
neighbors <- output$neighbors

## End(Not run)

mlpack documentation built on Oct. 29, 2022, 1:06 a.m.

Related to approx_kfn in mlpack...