rnnd_knn | R Documentation |
This function builds an approximate nearest neighbors graph of the provided data using convenient defaults. It does not return an index for later querying, to speed the graph construction and reduce the size and complexity of the return value.
rnnd_knn(
data,
k = 30,
metric = "euclidean",
use_alt_metric = TRUE,
init = "tree",
n_trees = NULL,
leaf_size = NULL,
max_tree_depth = 200,
margin = "auto",
n_iters = NULL,
delta = 0.001,
max_candidates = NULL,
weight_by_degree = FALSE,
low_memory = TRUE,
n_threads = 0,
verbose = FALSE,
progress = "bar",
obs = "R"
)
data |
Matrix of |
k |
Number of nearest neighbors to return. Optional if |
metric |
Type of distance calculation to use. One of:
For non-sparse data, the following variants are available with preprocessing: this trades memory for a potential speed up during the distance calculation. Some minor numerical differences should be expected compared to the non-preprocessed versions:
For non-sparse binary data passed as a
|
use_alt_metric |
If |
init |
Name of the initialization strategy or initial
If
|
n_trees |
The number of trees to use in the RP forest. A larger number
will give more accurate results at the cost of a longer computation time.
The default of |
leaf_size |
The maximum number of items that can appear in a leaf. This
value should be chosen to match the expected number of neighbors you will
want to retrieve when running queries (e.g. if you want find 50 nearest
neighbors set |
max_tree_depth |
The maximum depth of the tree to build (default = 200).
If the maximum tree depth is exceeded then the leaf size of a tree may
exceed |
margin |
A character string specifying the method used to assign points to one side of the hyperplane or the other. Possible values are:
Only used if |
n_iters |
Number of iterations of nearest neighbor descent to carry out.
By default, this will be chosen based on the number of observations in
|
delta |
The minimum relative change in the neighbor graph allowed before
early stopping. Should be a value between 0 and 1. The smaller the value,
the smaller the amount of progress between iterations is allowed. Default
value of |
max_candidates |
Maximum number of candidate neighbors to try for each
item in each iteration. Use relative to |
weight_by_degree |
If |
low_memory |
If |
n_threads |
Number of threads to use. |
verbose |
If |
progress |
Determines the type of progress information logged during the
nearest neighbor descent stage when
|
obs |
set to |
The process of k-nearest neighbor graph construction using Random Projection
Forests (Dasgupta and Freund, 2008) for initialization and Nearest Neighbor
Descent (Dong and co-workers, 2011) for refinement. If you are sure you will
not want to query new data then compared to rnnd_build()
this function has
the advantage of not storing the index, which can be very large.
the approximate nearest neighbor index, a list containing:
idx
an n by k matrix containing the nearest neighbor indices.
dist
an n by k matrix containing the nearest neighbor distances.
Dasgupta, S., & Freund, Y. (2008, May). Random projection trees and low dimensional manifolds. In Proceedings of the fortieth annual ACM symposium on Theory of computing (pp. 537-546). \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1145/1374376.1374452")}.
Dong, W., Moses, C., & Li, K. (2011, March). Efficient k-nearest neighbor graph construction for generic similarity measures. In Proceedings of the 20th international conference on World Wide Web (pp. 577-586). ACM. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1145/1963405.1963487")}.
rnnd_build()
, rnnd_query()
# Find 4 (approximate) nearest neighbors using Euclidean distance
iris_knn <- rnnd_knn(iris, k = 4)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.