rnnd_build | R Documentation |
This function builds an approximate nearest neighbors graph with convenient
defaults, then prepares the index for querying new data, for later use with
rnnd_query()
. For more control over the process, please see the other
functions in the package.
rnnd_build(
data,
k = 30,
metric = "euclidean",
use_alt_metric = TRUE,
init = "tree",
n_trees = NULL,
leaf_size = NULL,
max_tree_depth = 200,
margin = "auto",
n_iters = NULL,
delta = 0.001,
max_candidates = NULL,
low_memory = TRUE,
weight_by_degree = FALSE,
n_search_trees = 1,
pruning_degree_multiplier = 1.5,
diversify_prob = 1,
prune_reverse = FALSE,
n_threads = 0,
verbose = FALSE,
progress = "bar",
obs = "R"
)
data |
Matrix of |
k |
Number of nearest neighbors to build the index for. You can specify
a different number when running |
metric |
Type of distance calculation to use. One of:
For non-sparse data, the following variants are available with preprocessing: this trades memory for a potential speed up during the distance calculation. Some minor numerical differences should be expected compared to the non-preprocessed versions:
For non-sparse binary data passed as a
|
use_alt_metric |
If |
init |
Name of the initialization strategy or initial
If
|
n_trees |
The number of trees to use in the RP forest. A larger number
will give more accurate results at the cost of a longer computation time.
The default of |
leaf_size |
The maximum number of items that can appear in a leaf. This
value should be chosen to match the expected number of neighbors you will
want to retrieve when running queries (e.g. if you want find 50 nearest
neighbors set |
max_tree_depth |
The maximum depth of the tree to build (default = 200).
If the maximum tree depth is exceeded then the leaf size of a tree may
exceed |
margin |
A character string specifying the method used to assign points to one side of the hyperplane or the other. Possible values are:
Only used if |
n_iters |
Number of iterations of nearest neighbor descent to carry out.
By default, this will be chosen based on the number of observations in
|
delta |
The minimum relative change in the neighbor graph allowed before
early stopping. Should be a value between 0 and 1. The smaller the value,
the smaller the amount of progress between iterations is allowed. Default
value of |
max_candidates |
Maximum number of candidate neighbors to try for each
item in each iteration. Use relative to |
low_memory |
If |
weight_by_degree |
If |
n_search_trees |
the number of trees to keep in the search forest as
part of index preparation. The default is |
pruning_degree_multiplier |
How strongly to truncate the final neighbor
list for each item. The neighbor list of each item will be truncated to
retain only the closest |
diversify_prob |
the degree of diversification of the search graph
by removing unnecessary edges through occlusion pruning. This should take a
value between |
prune_reverse |
If |
n_threads |
Number of threads to use. |
verbose |
If |
progress |
Determines the type of progress information logged during the
nearest neighbor descent stage when
|
obs |
set to |
The process of k-nearest neighbor graph construction using Random Projection Forests (Dasgupta and Freund, 2008) for initialization and Nearest Neighbor Descent (Dong and co-workers, 2011) for refinement. Index preparation, uses the graph diversification method of Harwood and Drummond (2016).
the approximate nearest neighbor index, a list containing:
graph
the k-nearest neighbor graph, a list containing:
idx
an n by k matrix containing the nearest neighbor indices.
dist
an n by k matrix containing the nearest neighbor distances.
Other list items are intended only for internal use by other functions
such as rnnd_query()
.
Dasgupta, S., & Freund, Y. (2008, May). Random projection trees and low dimensional manifolds. In Proceedings of the fortieth annual ACM symposium on Theory of computing (pp. 537-546). \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1145/1374376.1374452")}.
Dong, W., Moses, C., & Li, K. (2011, March). Efficient k-nearest neighbor graph construction for generic similarity measures. In Proceedings of the 20th international conference on World Wide Web (pp. 577-586). ACM. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1145/1963405.1963487")}.
Harwood, B., & Drummond, T. (2016). Fanng: Fast approximate nearest neighbour graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 5713-5722).
rnnd_query()
iris_even <- iris[seq_len(nrow(iris)) %% 2 == 0, ]
iris_odd <- iris[seq_len(nrow(iris)) %% 2 == 1, ]
# Find 4 (approximate) nearest neighbors using Euclidean distance
iris_even_index <- rnnd_build(iris_even, k = 4)
iris_odd_nbrs <- rnnd_query(index = iris_even_index, query = iris_odd, k = 4)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.