lsh: K-Approximate-Nearest-Neighbor Search with LSH

View source: R/lsh.R

lshR Documentation

K-Approximate-Nearest-Neighbor Search with LSH

Description

An implementation of approximate k-nearest-neighbor search with locality-sensitive hashing (LSH). Given a set of reference points and a set of query points, this will compute the k approximate nearest neighbors of each query point in the reference set; models can be saved for future use.

Usage

lsh(
  bucket_size = NA,
  hash_width = NA,
  input_model = NA,
  k = NA,
  num_probes = NA,
  projections = NA,
  query = NA,
  reference = NA,
  second_hash_size = NA,
  seed = NA,
  tables = NA,
  true_neighbors = NA,
  verbose = FALSE
)

Arguments

bucket_size

The size of a bucket in the second level hash. Default value "500" (integer).

hash_width

The hash width for the first-level hashing in the LSH preprocessing. By default, the LSH class automatically estimates a hash width for its use. Default value "0" (numeric).

input_model

Input LSH model (LSHSearch).

k

Number of nearest neighbors to find. Default value "0" (integer).

num_probes

Number of additional probes for multiprobe LSH; if 0, traditional LSH is used. Default value "0" (integer).

projections

The number of hash functions for each tabl. Default value "10" (integer).

query

Matrix containing query points (optional) (numeric matrix).

reference

Matrix containing the reference dataset (numeric matrix).

second_hash_size

The size of the second level hash table. Default value "99901" (integer).

seed

Random seed. If 0, 'std::time(NULL)' is used. Default value "0" (integer).

tables

The number of hash tables to be used. Default value "30" (integer).

true_neighbors

Matrix of true neighbors to compute recall with (the recall is printed when -v is specified) (integer matrix).

verbose

Display informational messages and the full list of parameters and timers at the end of execution. Default value "FALSE" (logical).

Details

This program will calculate the k approximate-nearest-neighbors of a set of points using locality-sensitive hashing. You may specify a separate set of reference points and query points, or just a reference set which will be used as both the reference and query set.

Value

A list with several components:

distances

Matrix to output distances into (numeric matrix).

neighbors

Matrix to output neighbors into (integer matrix).

output_model

Output for trained LSH model (LSHSearch).

Author(s)

mlpack developers

Examples

# For example, the following will return 5 neighbors from the data for each
# point in "input" and store the distances in "distances" and the neighbors
# in "neighbors":

## Not run: 
output <- lsh(k=5, reference=input)
distances <- output$distances
neighbors <- output$neighbors

## End(Not run)

# The output is organized such that row i and column j in the neighbors
# output corresponds to the index of the point in the reference set which is
# the j'th nearest neighbor from the point in the query set with index i. 
# Row j and column i in the distances output file corresponds to the distance
# between those two points.
# 
# Because this is approximate-nearest-neighbors search, results may be
# different from run to run.  Thus, the "seed" parameter can be specified to
# set the random seed.
# 
# This program also has many other parameters to control its functionality;
# see the parameter-specific documentation for more information.

mlpack documentation built on Sept. 27, 2023, 1:07 a.m.

Related to lsh in mlpack...