krann: K-Rank-Approximate-Nearest-Neighbors (kRANN)

View source: R/krann.R

krannR Documentation

K-Rank-Approximate-Nearest-Neighbors (kRANN)

Description

An implementation of rank-approximate k-nearest-neighbor search (kRANN) using single-tree and dual-tree algorithms. Given a set of reference points and query points, this can find the k nearest neighbors in the reference set of each query point using trees; trees that are built can be saved for future use.

Usage

krann(
  alpha = NA,
  first_leaf_exact = FALSE,
  input_model = NA,
  k = NA,
  leaf_size = NA,
  naive = FALSE,
  query = NA,
  random_basis = FALSE,
  reference = NA,
  sample_at_leaves = FALSE,
  seed = NA,
  single_mode = FALSE,
  single_sample_limit = NA,
  tau = NA,
  tree_type = NA,
  verbose = getOption("mlpack.verbose", FALSE)
)

Arguments

alpha

The desired success probability. Default value "0.95" (numeric).

first_leaf_exact

The flag to trigger sampling only after exactly exploring the first leaf. Default value "FALSE" (logical).

input_model

Pre-trained kNN model (RAModel).

k

Number of nearest neighbors to find. Default value "0" (integer).

leaf_size

Leaf size for tree building (used for kd-trees, UB trees, R trees, R* trees, X trees, Hilbert R trees, R+ trees, R++ trees, and octrees). Default value "20" (integer).

naive

If true, sampling will be done without using a tree. Default value "FALSE" (logical).

query

Matrix containing query points (optional) (numeric matrix).

random_basis

Before tree-building, project the data onto a random orthogonal basis. Default value "FALSE" (logical).

reference

Matrix containing the reference dataset (numeric matrix).

sample_at_leaves

The flag to trigger sampling at leaves. Default value "FALSE" (logical).

seed

Random seed (if 0, std::time(NULL) is used). Default value "0" (integer).

single_mode

If true, single-tree search is used (as opposed to dual-tree search. Default value "FALSE" (logical).

single_sample_limit

The limit on the maximum number of samples (and hence the largest node you can approximate). Default value "20" (integer).

tau

The allowed rank-error in terms of the percentile of the data. Default value "5" (numeric).

tree_type

Type of tree to use: 'kd', 'ub', 'cover', 'r', 'x', 'r-star', 'hilbert-r', 'r-plus', 'r-plus-plus', 'oct'. Default value "kd" (character).

verbose

Display informational messages and the full list of parameters and timers at the end of execution. Default value "getOption("mlpack.verbose", FALSE)" (logical).

Details

This program will calculate the k rank-approximate-nearest-neighbors of a set of points. You may specify a separate set of reference points and query points, or just a reference set which will be used as both the reference and query set. You must specify the rank approximation (in success probability).

Value

A list with several components:

distances

Matrix to output distances into (numeric matrix).

neighbors

Matrix to output neighbors into (integer matrix).

output_model

If specified, the kNN model will be output here (RAModel).

Author(s)

mlpack developers

Examples

# For example, the following will return 5 neighbors from the top 0.1% of the
# data (with probability 0.95) for each point in "input" and store the
# distances in "distances" and the neighbors in "neighbors.csv":

## Not run: 
output <- krann(reference=input, k=5, tau=0.1)
distances <- output$distances
neighbors <- output$neighbors

## End(Not run)

# Note that tau must be set such that the number of points in the
# corresponding percentile of the data is greater than k.  Thus, if we choose
# tau = 0.1 with a dataset of 1000 points and k = 5, then we are attempting
# to choose 5 nearest neighbors out of the closest 1 point -- this is invalid
# and the program will terminate with an error message.
# 
# The output matrices are organized such that row i and column j in the
# neighbors output file corresponds to the index of the point in the
# reference set which is the i'th nearest neighbor from the point in the
# query set with index j.  Row i and column j in the distances output file
# corresponds to the distance between those two points.

mlpack documentation built on Oct. 5, 2024, 9:08 a.m.

Related to krann in mlpack...