rpf_knn (R Documentation)
Returns the approximate k-nearest neighbor graph of a dataset by searching multiple random projection trees, a variant of k-d trees introduced by Dasgupta and Freund (2008).
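To judge how approximate the returned graph is, it helps to have an exact reference. The base R sketch below computes exact Euclidean neighbors by brute force (quadratic in the number of observations, so only practical for small data such as iris) together with a simple recall measure. Both function names are hypothetical helpers, not part of the package.

```r
# Hypothetical helper (not part of rnndescent): exact Euclidean k-nearest
# neighbors by brute force, returned as n x k `idx` and `dist` matrices.
# Each point is compared with itself too (distance 0), so self is included,
# as with include_self = TRUE in the examples.
brute_force_knn_sketch <- function(data, k) {
  x <- as.matrix(data)
  d <- as.matrix(dist(x))               # full n x n Euclidean distance matrix
  n <- nrow(x)
  idx <- matrix(0L, nrow = n, ncol = k)
  dst <- matrix(NA_real_, nrow = n, ncol = k)
  for (i in seq_len(n)) {
    ord <- order(d[i, ])[seq_len(k)]    # indices of the k smallest distances
    idx[i, ] <- ord
    dst[i, ] <- d[i, ord]
  }
  list(idx = idx, dist = dst)
}

# Hypothetical helper: fraction of the exact neighbors that an approximate
# idx matrix recovered (1 means every exact neighbor was found).
recall_sketch <- function(approx_idx, exact_idx) {
  hits <- vapply(seq_len(nrow(exact_idx)), function(i) {
    length(intersect(approx_idx[i, ], exact_idx[i, ]))
  }, integer(1))
  sum(hits) / length(exact_idx)
}
```

Passing the idx matrix of an rpf_knn result and the exact idx matrix for the same data and k to recall_sketch gives one number summarizing the accuracy of the forest search.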
rpf_knn(
data,
k,
metric = "euclidean",
use_alt_metric = TRUE,
n_trees = NULL,
leaf_size = NULL,
max_tree_depth = 200,
include_self = TRUE,
ret_forest = FALSE,
margin = "auto",
n_threads = 0,
verbose = FALSE,
obs = "R"
)
data: Matrix of …
k: Number of nearest neighbors to return. Optional if …
metric: Type of distance calculation to use. One of: …
For non-sparse data, the following variants are available with preprocessing, which trades memory for a potential speed-up during the distance calculation. Some minor numerical differences should be expected compared to the non-preprocessed versions: …
For non-sparse binary data passed as a …
Note that if …
use_alt_metric: If …
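The description of use_alt_metric is cut off above. Assuming it follows the common trick of ranking candidates with a cheaper distance that preserves their order (for Euclidean distance, the squared distance, which skips the square root) and correcting only the k retained distances at the end, the base R sketch below shows why the neighbor ranking is unchanged. The function name is hypothetical.

```r
# Hypothetical sketch of an "alternative metric" search for one query point:
# rank candidates by squared Euclidean distance (monotone in the Euclidean
# distance, so the order is identical), then take the square root only for
# the k distances that are kept.
knn_row_alt_sketch <- function(query, candidates, k) {
  diffs <- sweep(candidates, 2, query)     # subtract the query from each row
  sq <- rowSums(diffs^2)                   # squared Euclidean distances
  keep <- order(sq)[seq_len(k)]            # same ranking as Euclidean
  list(idx = keep, dist = sqrt(sq[keep]))  # correction applied to k values only
}
```

The saving per distance is small (one square root), but it is paid for every candidate examined, while the correction is paid only k times per observation.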
n_trees: The number of trees to use in the RP forest. A larger number will give more accurate results at the cost of a longer computation time. The default of …
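Each extra tree gives every point another leaf's worth of candidate neighbors, which is why more trees tend to improve accuracy. The sketch below is a hypothetical illustration, not rnndescent's implementation: each tree is represented as a list of leaves (integer index vectors), the candidates for a point are all points that share a leaf with it in any tree, and the k closest candidates are kept, padding with index 0 and distance NA when there are too few (the convention described under the return value).

```r
# Hypothetical helper: merge leaf memberships from several trees into an
# approximate kNN graph. `forest` is a list of trees; each tree is a list of
# leaves, and each leaf is an integer vector of row indices of `x`.
forest_knn_sketch <- function(x, forest, k) {
  n <- nrow(x)
  idx <- matrix(0L, n, k)          # 0 marks a missing neighbor
  dst <- matrix(NA_real_, n, k)    # NA marks a missing distance
  for (i in seq_len(n)) {
    # Candidates: every point sharing a leaf with point i in any tree
    cand <- unique(unlist(lapply(forest, function(tree) {
      Filter(function(leaf) i %in% leaf, tree)
    })))
    if (length(cand) == 0) next
    # Euclidean distances from point i to each candidate
    d <- sqrt(colSums((t(x[cand, , drop = FALSE]) - x[i, ])^2))
    m <- min(k, length(cand))      # fewer than k candidates leaves gaps
    ord <- order(d)[seq_len(m)]
    idx[i, seq_len(m)] <- cand[ord]
    dst[i, seq_len(m)] <- d[ord]
  }
  list(idx = idx, dist = dst)
}
```

With a single tree whose leaves hold two points each, asking for k = 3 necessarily leaves a gap in every row; adding a second tree with different leaves supplies the extra candidates.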
leaf_size: The maximum number of items that can appear in a leaf. The default of …
max_tree_depth: The maximum depth of the tree to build (default = 200). If the maximum tree depth is exceeded, then the leaf size of a tree may exceed …
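The interaction between leaf_size and max_tree_depth can be seen in a small recursive builder. This sketch assumes a split rule in the style of Annoy and PyNNDescent (the hyperplane that bisects two randomly chosen points), which may differ in detail from what rpf_knn does; the function name is hypothetical. A node becomes a leaf once it holds at most leaf_size points or the depth limit is reached, which is how a leaf can end up larger than leaf_size.

```r
# Hypothetical sketch: recursively split row indices of `x` with random
# hyperplanes. Returns a list of leaves (integer index vectors).
build_rp_tree_sketch <- function(x, leaf_size = 10, max_depth = 200,
                                 indices = seq_len(nrow(x)), depth = 0) {
  if (length(indices) <= leaf_size || depth >= max_depth) {
    return(list(indices))  # leaf; may exceed leaf_size if max_depth was hit
  }
  pair <- indices[sample.int(length(indices), 2)]
  a <- x[pair[1], ]
  b <- x[pair[2], ]
  normal <- a - b                          # hyperplane normal
  offset <- sum(normal * (a + b) / 2)      # plane passes through the midpoint
  side <- as.vector(x[indices, , drop = FALSE] %*% normal) - offset
  left <- indices[side > 0]                # a's side of the hyperplane
  right <- indices[side <= 0]
  if (length(left) == 0 || length(right) == 0) {
    # Degenerate split (e.g. duplicate points): fall back to a random halving
    shuffled <- indices[sample.int(length(indices))]
    half <- length(indices) %/% 2
    left <- shuffled[seq_len(half)]
    right <- shuffled[-seq_len(half)]
  }
  c(build_rp_tree_sketch(x, leaf_size, max_depth, left, depth + 1),
    build_rp_tree_sketch(x, leaf_size, max_depth, right, depth + 1))
}
```

Setting max_depth = 1 stops after a single split, so the two resulting leaves hold far more than leaf_size points, mirroring the caveat in the max_tree_depth description.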
include_self: If …
ret_forest: If …
margin: A character string specifying the method used to assign points to one side of the hyperplane or the other. Possible values are: …
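The list of margin values is cut off above. As an assumption, the sketch below contrasts two ways to decide which side of the hyperplane defined by two points a and b a point x falls on: an explicit one that builds the bisecting hyperplane and takes a dot product, and an implicit one that only compares the distances from x to a and to b. For Euclidean distance the two agree, because ||x - a||^2 < ||x - b||^2 is equivalent to x . (a - b) > (a - b) . (a + b) / 2. Both function names are hypothetical.

```r
# Hypothetical explicit margin: form the hyperplane that bisects a and b
# (normal = a - b, passing through their midpoint) and test the sign.
side_explicit <- function(x, a, b) {
  normal <- a - b
  sum(x * normal) - sum(normal * (a + b) / 2) > 0   # TRUE = a's side
}

# Hypothetical implicit margin: no hyperplane is formed; the point goes to
# the side of whichever defining point is closer under `dist_fn`.
side_implicit <- function(x, a, b,
                          dist_fn = function(u, v) sqrt(sum((u - v)^2))) {
  dist_fn(x, a) < dist_fn(x, b)                     # TRUE = a's side
}
```

The implicit form needs only a distance function, so it carries over to metrics for which a Euclidean hyperplane is not a natural fit, at the cost of two distance evaluations per point instead of one dot product.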
n_threads: Number of threads to use. …
verbose: If …
obs: Set to …
The approximate nearest neighbor graph as a list containing:
idx: an n by k matrix containing the nearest neighbor indices.
dist: an n by k matrix containing the nearest neighbor distances.
forest: (if ret_forest = TRUE) the RP forest that generated the neighbor graph, which can be used to query new data.
Note that the full k neighbors per observation are not guaranteed to be found. Missing data is represented with an index of 0 and a distance of NA.
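Because missing neighbors are encoded as index 0 with distance NA, a quick check after a call is to count them per observation. The helper below is a hypothetical sketch that works on any list with idx and dist matrices, such as the value described above.

```r
# Hypothetical helper: number of missing neighbors per observation in a
# result list with n x k `idx` and `dist` matrices (missing = index 0, NA).
missing_neighbors_sketch <- function(nn) {
  missing <- nn$idx == 0
  # The two encodings of "missing" should always coincide
  stopifnot(all(missing == is.na(nn$dist)))
  rowSums(missing)
}
```

Rows with a nonzero count are the observations for which more trees, or a follow-up refinement such as nnd_knn(), may be worthwhile.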
Dasgupta, S., & Freund, Y. (2008, May). Random projection trees and low dimensional manifolds. In Proceedings of the fortieth annual ACM symposium on Theory of computing (pp. 537-546). doi:10.1145/1374376.1374452.
See also: rpf_filter(), nnd_knn()
# Find 4 (approximate) nearest neighbors using Euclidean distance
# If you pass a data frame, non-numeric columns are removed
iris_nn <- rpf_knn(iris, k = 4, metric = "euclidean", leaf_size = 3)
# If you want to initialize another method (e.g. nearest neighbor descent)
# with the result of the RP forest, then it's more efficient to skip
# evaluating whether an item is a neighbor of itself by setting
# `include_self = FALSE`:
iris_rp <- rpf_knn(iris, k = 4, n_trees = 3, include_self = FALSE)
# For future querying, you may also want to return the RP forest:
iris_rpf <- rpf_knn(iris,
k = 4, n_trees = 3, include_self = FALSE,
ret_forest = TRUE
)