lsh: K-Approximate-Nearest-Neighbor Search with LSH
In mlpack: 'Rcpp' Integration for the 'mlpack' Library

View source: R/lsh.R

lsh	R Documentation

K-Approximate-Nearest-Neighbor Search with LSH

Description

An implementation of approximate k-nearest-neighbor search with locality-sensitive hashing (LSH). Given a set of reference points and a set of query points, this will compute the k approximate nearest neighbors of each query point in the reference set; models can be saved for future use.

Usage

lsh(
  bucket_size = NA,
  hash_width = NA,
  input_model = NA,
  k = NA,
  num_probes = NA,
  projections = NA,
  query = NA,
  reference = NA,
  second_hash_size = NA,
  seed = NA,
  tables = NA,
  true_neighbors = NA,
  verbose = getOption("mlpack.verbose", FALSE)
)

Arguments

`bucket_size`	The size of a bucket in the second level hash. Default value "500" (integer).
`hash_width`	The hash width for the first-level hashing in the LSH preprocessing. By default, the LSH class automatically estimates a hash width for its use. Default value "0" (numeric).
`input_model`	Input LSH model (LSHSearch).
`k`	Number of nearest neighbors to find. Default value "0" (integer).
`num_probes`	Number of additional probes for multiprobe LSH; if 0, traditional LSH is used. Default value "0" (integer).
`projections`	The number of hash functions for each tabl. Default value "10" (integer).
`query`	Matrix containing query points (optional) (numeric matrix).
`reference`	Matrix containing the reference dataset (numeric matrix).
`second_hash_size`	The size of the second level hash table. Default value "99901" (integer).
`seed`	Random seed. If 0, 'std::time(NULL)' is used. Default value "0" (integer).
`tables`	The number of hash tables to be used. Default value "30" (integer).
`true_neighbors`	Matrix of true neighbors to compute recall with (the recall is printed when -v is specified) (integer matrix).
`verbose`	Display informational messages and the full list of parameters and timers at the end of execution. Default value "getOption("mlpack.verbose", FALSE)" (logical).

Details

This program will calculate the k approximate-nearest-neighbors of a set of points using locality-sensitive hashing. You may specify a separate set of reference points and query points, or just a reference set which will be used as both the reference and query set.

Value

A list with several components:

`distances`	Matrix to output distances into (numeric matrix).
`neighbors`	Matrix to output neighbors into (integer matrix).
`output_model`	Output for trained LSH model (LSHSearch).

Author(s)

mlpack developers

Examples

# For example, the following will return 5 neighbors from the data for each
# point in "input" and store the distances in "distances" and the neighbors
# in "neighbors":

## Not run: 
output <- lsh(k=5, reference=input)
distances <- output$distances
neighbors <- output$neighbors

## End(Not run)

# The output is organized such that row i and column j in the neighbors
# output corresponds to the index of the point in the reference set which is
# the j'th nearest neighbor from the point in the query set with index i. 
# Row j and column i in the distances output file corresponds to the distance
# between those two points.
# 
# Because this is approximate-nearest-neighbors search, results may be
# different from run to run.  Thus, the "seed" parameter can be specified to
# set the random seed.
# 
# This program also has many other parameters to control its functionality;
# see the parameter-specific documentation for more information.

mlpack documentation built on Sept. 2, 2025, 1:09 a.m.