rpf_build | R Documentation |
Builds a "forest" of Random Projection Trees (Dasgupta and Freund, 2008), which can later be searched to find approximate nearest neighbors.
rpf_build(
  data,
  metric = "euclidean",
  use_alt_metric = TRUE,
  n_trees = NULL,
  leaf_size = 10,
  max_tree_depth = 200,
  margin = "auto",
  n_threads = 0,
  verbose = FALSE,
  obs = "R"
)
data |
Matrix of |
metric |
Type of distance calculation to use. One of:
For non-sparse data, the following variants are available with preprocessing: this trades memory for a potential speed up during the distance calculation. Some minor numerical differences should be expected compared to the non-preprocessed versions:
For non-sparse binary data passed as a
Note that if |
use_alt_metric |
If |
n_trees |
The number of trees to use in the RP forest. A larger number
will give more accurate results at the cost of a longer computation time.
The default of |
leaf_size |
The maximum number of items that can appear in a leaf. This
value should be chosen to match the expected number of neighbors you will
want to retrieve when running queries (e.g. if you want to find 50 nearest
neighbors set |
max_tree_depth |
The maximum depth of the tree to build (default = 200).
If the maximum tree depth is exceeded then the leaf size of a tree may
exceed |
margin |
A character string specifying the method used to assign points to one side of the hyperplane or the other. Possible values are:
|
n_threads |
Number of threads to use. |
verbose |
If |
obs |
set to |
A forest of random projection trees as a list. Each tree in the forest is a further list, but is not intended to be examined or manipulated by the user. As a normal R data type, it can be safely serialized and deserialized with base::saveRDS() and base::readRDS(). To use it for querying, pass it as the forest parameter of rpf_knn_query(). The forest does not store any of the data passed in to build the trees, so if you are going to search the forest, you will also need to store the data used to build it and provide it during the search.
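Because the forest does not carry the data, a save-and-restore workflow has to persist both objects. The following is a minimal sketch of that idea, assuming the rnndescent package is attached; the variable names, the temporary file paths, and the choice of the four numeric iris columns are illustrative only.

```r
library(rnndescent)

# Reference data: the numeric columns of iris (illustrative choice)
ref_data <- as.matrix(iris[, 1:4])
forest <- rpf_build(ref_data, n_trees = 10)

# The forest does not contain the data, so save both objects
forest_file <- tempfile(fileext = ".rds")
data_file <- tempfile(fileext = ".rds")
saveRDS(forest, forest_file)
saveRDS(ref_data, data_file)

# Later, for example in a new session: restore both, then query
forest2 <- readRDS(forest_file)
ref_data2 <- readRDS(data_file)
nn <- rpf_knn_query(
  query = ref_data2[1:5, ], reference = ref_data2,
  forest = forest2, k = 3
)
```

The restored forest is passed as the forest argument and the restored data as the reference argument, mirroring the roles they had at build time.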
Dasgupta, S., & Freund, Y. (2008, May). Random projection trees and low dimensional manifolds. In Proceedings of the fortieth annual ACM symposium on Theory of computing (pp. 537-546). doi:10.1145/1374376.1374452.
rpf_knn_query()
# Build a forest of 10 trees from the odd rows
iris_odd <- iris[seq_len(nrow(iris)) %% 2 == 1, ]
iris_odd_forest <- rpf_build(iris_odd, n_trees = 10)

# Query the forest with the even rows, using the odd rows as the reference data
iris_even <- iris[seq_len(nrow(iris)) %% 2 == 0, ]
iris_even_nn <- rpf_knn_query(
  query = iris_even, reference = iris_odd,
  forest = iris_odd_forest, k = 15
)
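The leaf_size argument is described above as something to match to the number of neighbors you expect to retrieve. A hedged sketch of that advice follows, assuming the rnndescent package is attached; the value of k, the variable names, and the use of the numeric iris columns are illustrative choices, not recommendations from the package.

```r
library(rnndescent)

# Number of neighbors we plan to ask for at query time (illustrative value)
k <- 50

ref_data <- as.matrix(iris[, 1:4])

# Build the forest with the leaf size matched to the planned k
forest_k50 <- rpf_build(ref_data, n_trees = 10, leaf_size = k)

# Query with the same k that the leaf size was chosen for
nn_k50 <- rpf_knn_query(
  query = ref_data, reference = ref_data,
  forest = forest_k50, k = k
)
```

Keeping k in a single variable makes it harder for the build-time leaf size and the query-time neighbor count to drift apart.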