nnd_knn  R Documentation 
Uses the Nearest Neighbor Descent method due to Dong and coworkers (2011) to optimize an approximate nearest neighbor graph.
nnd_knn(
  data,
  k = NULL,
  metric = "euclidean",
  init = "rand",
  init_args = NULL,
  n_iters = NULL,
  max_candidates = NULL,
  delta = 0.001,
  low_memory = TRUE,
  weight_by_degree = FALSE,
  use_alt_metric = TRUE,
  n_threads = 0,
  verbose = FALSE,
  progress = "bar",
  obs = "R",
  ret_forest = FALSE
)
data 
Matrix of items to generate neighbors for, with observations in the rows and features in the columns. Data frames are also accepted; non-numeric columns are removed.
k 
Number of nearest neighbors to return. Optional if init is specified as a neighbor graph, in which case k is determined from the initialization.
metric 
Type of distance calculation to use, for example "euclidean" (the default) or "manhattan".
For non-sparse data, preprocessed variants of some metrics are also available: these trade memory for a potential speed-up during the distance calculation. Some minor numerical differences should be expected compared to the non-preprocessed versions.
For non-sparse binary data passed as a logical matrix, specialized binary variants of some metrics are used.
init 
Name of the initialization strategy, or an initial neighbor graph to optimize. One of:
"rand": random initialization (the default).
"tree": initialization from a forest of random projection trees, using the method of Dasgupta and Freund (2008).
a pre-calculated neighbor graph, passed as a list containing idx and dist matrices (the same format as the value returned by this function).
If init is a neighbor graph, k may be omitted and is determined from the graph. The metric used to calculate the initial graph should be compatible with the metric passed to nnd_knn (see the examples).
init_args 
a list containing arguments to pass to the random partition forest initialization used when init = "tree" (for example, n_trees, as in the examples).
n_iters 
Number of iterations of nearest neighbor descent to carry out. By default, this is chosen based on the number of observations in data.
max_candidates 
Maximum number of candidate neighbors to try for each item in each iteration. Use relative to k to emulate the "rho" sampling parameter in the nearest neighbor descent paper.
delta 
The minimum relative change in the neighbor graph allowed before early stopping. Should be a value between 0 and 1. The smaller the value, the less progress between iterations is needed for the optimization to continue. The default value of 0.001 means that at least 0.1% of the neighbor graph must be updated at each iteration, or the optimization stops early.
low_memory 
If TRUE (the default), use a lower-memory, but more computationally expensive, approach to the optimization. Setting this to FALSE avoids repeated distance calculations at the cost of more RAM (see the examples).
weight_by_degree 
If TRUE, weight the candidate selection during local joins by each item's in-degree, so that lower-degree items are favored for retention. Default is FALSE.
use_alt_metric 
If TRUE, use faster metrics that maintain the ordering of distances internally (e.g. squared Euclidean distances if using metric = "euclidean"), then apply a correction at the end.
n_threads 
Number of threads to use. The default of 0 means no multi-threading.
verbose 
If TRUE, log information about progress to the console.
progress 
Determines the type of progress information logged if verbose = TRUE. The default, "bar", displays a progress bar.
obs 
Set to "C" to indicate that the input data is oriented with observations in the columns; the default "R" means that observations are stored in the rows (a short sketch follows this argument list).
ret_forest 
If TRUE and init = "tree", also return the RP forest used to initialize the neighbor data as part of the result (see Value).
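As a quick sketch of the obs argument (a hypothetical illustration, assuming only the interface shown in Usage): the same data can be supplied row-wise or, after transposing, column-wise.
# Hypothetical sketch: these two calls describe the same observations
x <- as.matrix(iris[, 1:4])
nn_rows <- nnd_knn(x, k = 4)                # observations in rows (default)
nn_cols <- nnd_knn(t(x), k = 4, obs = "C")  # transposed: observations in columns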
If no initial graph is provided, a random graph is generated; alternatively, you may specify initialization from a graph generated from a forest of random projection trees, using the method of Dasgupta and Freund (2008), as sketched below.
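A minimal sketch of tree-based initialization (assuming only the arguments documented above; the n_trees value mirrors the examples), keeping the forest via ret_forest:
# Sketch: initialize from a random projection tree forest and keep the forest
res <- nnd_knn(iris, k = 4, init = "tree",
               init_args = list(n_trees = 5), ret_forest = TRUE)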
the approximate nearest neighbor graph as a list containing:
idx
an n by k matrix containing the nearest neighbor indices.
dist
an n by k matrix containing the nearest neighbor distances.
forest
(if init = "tree" and ret_forest = TRUE only): the RP forest used to initialize the neighbor data.
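A hedged sketch of inspecting the returned graph (component names as documented above):
# idx and dist are both n by k; row i gives the approximate neighbors of item i
iris_nn <- nnd_knn(iris, k = 4, metric = "euclidean")
head(iris_nn$idx)   # nearest neighbor indices
head(iris_nn$dist)  # corresponding Euclidean distances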
Dasgupta, S., & Freund, Y. (2008, May). Random projection trees and low dimensional manifolds. In Proceedings of the fortieth annual ACM symposium on Theory of computing (pp. 537546). \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1145/1374376.1374452")}.
Dong, W., Moses, C., & Li, K. (2011, March). Efficient knearest neighbor graph construction for generic similarity measures. In Proceedings of the 20th international conference on World Wide Web (pp. 577586). ACM. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1145/1963405.1963487")}.
# Find 4 (approximate) nearest neighbors using Euclidean distance
# If you pass a data frame, non-numeric columns are removed
iris_nn <- nnd_knn(iris, k = 4, metric = "euclidean")
# Manhattan (l1) distance
iris_nn <- nnd_knn(iris, k = 4, metric = "manhattan")
# Multi-threading: you can choose the number of threads to use; in real
# usage, you will want to set n_threads to at least 2
iris_nn <- nnd_knn(iris, k = 4, metric = "manhattan", n_threads = 1)
# Use the verbose flag to see information about progress
iris_nn <- nnd_knn(iris, k = 4, metric = "euclidean", verbose = TRUE)
# Nearest neighbor descent uses random initialization, but you can pass any
# approximation using the init argument (as long as the metrics used to
# calculate the initialization are compatible with the metric options used
# by nnd_knn).
iris_nn <- random_knn(iris, k = 4, metric = "euclidean")
iris_nn <- nnd_knn(iris, init = iris_nn, metric = "euclidean", verbose = TRUE)
# Number of iterations controls how much optimization is attempted. A smaller
# value will run faster but give poorer results
iris_nn <- nnd_knn(iris, k = 4, metric = "euclidean", n_iters = 2)
# You can also control the amount of work done within an iteration by
# setting max_candidates
iris_nn <- nnd_knn(iris, k = 4, metric = "euclidean", max_candidates = 50)
# Optimization may also stop early if not much progress is being made. This
# convergence criterion can be controlled via delta. A larger value will
# stop progress earlier. The verbose flag will provide some information if
# convergence is occurring before all iterations are carried out.
set.seed(1337)
iris_nn <- nnd_knn(iris, k = 4, metric = "euclidean", n_iters = 5, delta = 0.5)
# To ensure that descent only stops if no improvements are made, set delta = 0
set.seed(1337)
iris_nn <- nnd_knn(iris, k = 4, metric = "euclidean", n_iters = 5, delta = 0)
# A faster version of the algorithm is available that avoids repeated
# distance calculations at the cost of using more RAM. Set low_memory to
# FALSE to try it.
set.seed(1337)
iris_nn <- nnd_knn(iris, k = 4, metric = "euclidean", low_memory = FALSE)
# Using init = "tree" is usually more efficient than random initialization.
# Arguments to the tree initialization method can be passed via the init_args
# list
set.seed(1337)
iris_nn <- nnd_knn(iris, k = 4, init = "tree", init_args = list(n_trees = 5))