graph_knn_query | R Documentation |
Run queries against a search graph, to return nearest neighbors taken from the reference data used to build that graph.
graph_knn_query(
query,
reference,
reference_graph,
k = NULL,
metric = "euclidean",
init = NULL,
epsilon = 0.1,
max_search_fraction = 1,
use_alt_metric = TRUE,
n_threads = 0,
verbose = FALSE,
obs = "R"
)
query |
Matrix of |
reference |
Matrix of |
reference_graph |
Search graph of the |
k |
Number of nearest neighbors to return. Optional if |
metric |
Type of distance calculation to use. One of:
For non-sparse data, the following variants are available with preprocessing: this trades memory for a potential speed up during the distance calculation. Some minor numerical differences should be expected compared to the non-preprocessed versions:
For non-sparse binary data passed as a
|
init |
Initial
|
epsilon |
Controls trade-off between accuracy and search cost, as
described by Iwasaki and Miyazaki (2018), by specifying a distance
tolerance on whether to explore the neighbors of candidate points. The
larger the value, the more neighbors will be searched. A value of 0.1
allows query-candidate distances to be 10% larger than the current
most-distant neighbor of the query point, 0.2 means 20%, and so on.
Suggested values are between 0-0.5, although this value is highly dependent
on the distribution of distances in the dataset (higher dimensional data
should choose a smaller cutoff). Too large a value of |
max_search_fraction |
Maximum fraction of the reference data to search.
This is a value between 0 (search none of the reference data) and 1 (search
all of the data if necessary). This works in conjunction with |
use_alt_metric |
If |
n_threads |
Number of threads to use. |
verbose |
If |
obs |
set to |
A greedy beam search is used to query the graph, combining two search pruning
strategies. The first, due to Iwasaki and Miyazaki (2018), only considers
new candidates within a relative distance of the current furthest neighbor
in the query's graph. The second, due to Harwood and Drummond (2016), puts a
limit on the absolute number of distance calculations to carry out. See the
epsilon
and max_search_fraction
parameters respectively.
the approximate nearest neighbor graph as a list containing:
idx
a n
by k
matrix containing the nearest neighbor indices
specifying the row of the neighbor in reference
.
dist
a n
by k
matrix containing the nearest neighbor distances.
Harwood, B., & Drummond, T. (2016). Fanng: Fast approximate nearest neighbour graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 5713-5722).
Iwasaki, M., & Miyazaki, D. (2018). Optimization of indexing based on k-nearest neighbor graph for proximity search in high-dimensional data. arXiv preprint arXiv:1810.07355.
# 100 reference iris items
iris_ref <- iris[iris$Species %in% c("setosa", "versicolor"), ]
# 50 query items
iris_query <- iris[iris$Species == "versicolor", ]
# First, find the approximate 4-nearest neighbor graph for the references:
iris_ref_graph <- nnd_knn(iris_ref, k = 4)
# For each item in iris_query find the 4 nearest neighbors in iris_ref.
# You need to pass both the reference data and the reference graph.
# If you pass a data frame, non-numeric columns are removed.
# set verbose = TRUE to get details on the progress being made
iris_query_nn <- graph_knn_query(iris_query, iris_ref, iris_ref_graph,
k = 4, metric = "euclidean", verbose = TRUE
)
# A more complete example, converting the initial knn into a search graph
# and using a filtered random projection forest to initialize the search
# create initial knn and forest
iris_ref_graph <- nnd_knn(iris_ref, k = 4, init = "tree", ret_forest = TRUE)
# keep the best tree in the forest
forest <- rpf_filter(iris_ref_graph, n_trees = 1)
# expand the knn into a search graph
iris_ref_search_graph <- prepare_search_graph(iris_ref, iris_ref_graph)
# run the query with the improved graph and initialization
iris_query_nn <- graph_knn_query(iris_query, iris_ref, iris_ref_search_graph,
init = forest, k = 4
)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.