krann: K-Rank-Approximate-Nearest-Neighbors (kRANN)
In mlpack: 'Rcpp' Integration for the 'mlpack' Library

krann

R Documentation

K-Rank-Approximate-Nearest-Neighbors (kRANN)

Description

An implementation of rank-approximate k-nearest-neighbor search (kRANN) using single-tree and dual-tree algorithms. Given a set of reference points and query points, this can find the k nearest neighbors in the reference set of each query point using trees; trees that are built can be saved for future use.

Usage

krann(
  alpha = NA,
  first_leaf_exact = FALSE,
  input_model = NA,
  k = NA,
  leaf_size = NA,
  naive = FALSE,
  query = NA,
  random_basis = FALSE,
  reference = NA,
  sample_at_leaves = FALSE,
  seed = NA,
  single_mode = FALSE,
  single_sample_limit = NA,
  tau = NA,
  tree_type = NA,
  verbose = getOption("mlpack.verbose", FALSE)
)

Arguments

`alpha`	The desired success probability. Default value "0.95" (numeric).
`first_leaf_exact`	The flag to trigger sampling only after exactly exploring the first leaf. Default value "FALSE" (logical).
`input_model`	Pre-trained kNN model (RAModel).
`k`	Number of nearest neighbors to find. Default value "0" (integer).
`leaf_size`	Leaf size for tree building (used for kd-trees, UB trees, R trees, R* trees, X trees, Hilbert R trees, R+ trees, R++ trees, and octrees). Default value "20" (integer).
`naive`	If true, sampling will be done without using a tree. Default value "FALSE" (logical).
`query`	Matrix containing query points (optional) (numeric matrix).
`random_basis`	Before tree-building, project the data onto a random orthogonal basis. Default value "FALSE" (logical).
`reference`	Matrix containing the reference dataset (numeric matrix).
`sample_at_leaves`	The flag to trigger sampling at leaves. Default value "FALSE" (logical).
`seed`	Random seed (if 0, std::time(NULL) is used). Default value "0" (integer).
`single_mode`	If true, single-tree search is used (as opposed to dual-tree search. Default value "FALSE" (logical).
`single_sample_limit`	The limit on the maximum number of samples (and hence the largest node you can approximate). Default value "20" (integer).
`tau`	The allowed rank-error in terms of the percentile of the data. Default value "5" (numeric).
`tree_type`	Type of tree to use: 'kd', 'ub', 'cover', 'r', 'x', 'r-star', 'hilbert-r', 'r-plus', 'r-plus-plus', 'oct'. Default value "kd" (character).
`verbose`	Display informational messages and the full list of parameters and timers at the end of execution. Default value "getOption("mlpack.verbose", FALSE)" (logical).

Details

This program will calculate the k rank-approximate-nearest-neighbors of a set of points. You may specify a separate set of reference points and query points, or just a reference set which will be used as both the reference and query set. You must specify the rank approximation (in success probability).

Value

A list with several components:

`distances`	Matrix to output distances into (numeric matrix).
`neighbors`	Matrix to output neighbors into (integer matrix).
`output_model`	If specified, the kNN model will be output here (RAModel).

Author(s)

mlpack developers

Examples

# For example, the following will return 5 neighbors from the top 0.1% of the
# data (with probability 0.95) for each point in "input" and store the
# distances in "distances" and the neighbors in "neighbors.csv":

## Not run: 
output <- krann(reference=input, k=5, tau=0.1)
distances <- output$distances
neighbors <- output$neighbors

## End(Not run)

# Note that tau must be set such that the number of points in the
# corresponding percentile of the data is greater than k.  Thus, if we choose
# tau = 0.1 with a dataset of 1000 points and k = 5, then we are attempting
# to choose 5 nearest neighbors out of the closest 1 point -- this is invalid
# and the program will terminate with an error message.
# 
# The output matrices are organized such that row i and column j in the
# neighbors output file corresponds to the index of the point in the
# reference set which is the i'th nearest neighbor from the point in the
# query set with index j.  Row i and column j in the distances output file
# corresponds to the distance between those two points.

mlpack documentation built on June 8, 2025, 10:47 a.m.