prepare_search_graph | R Documentation |
Create a graph using existing nearest neighbor data to balance search speed and accuracy using the occlusion pruning and truncation strategies of Harwood and Drummond (2016). The resulting search graph should be more efficient for querying new data than the original nearest neighbor graph.
prepare_search_graph(
data,
graph,
metric = "euclidean",
use_alt_metric = TRUE,
diversify_prob = 1,
pruning_degree_multiplier = 1.5,
prune_reverse = FALSE,
n_threads = 0,
verbose = FALSE,
obs = "R"
)
data |
Matrix of |
graph |
neighbor graph for
|
metric |
Type of distance calculation to use. One of:
For non-sparse data, the following variants are available with preprocessing: this trades memory for a potential speed up during the distance calculation. Some minor numerical differences should be expected compared to the non-preprocessed versions:
For non-sparse binary data passed as a
|
use_alt_metric |
If |
diversify_prob |
the degree of diversification of the search graph
by removing unnecessary edges through occlusion pruning. This should take a
value between |
pruning_degree_multiplier |
How strongly to truncate the final neighbor
list for each item. The neighbor list of each item will be truncated to
retain only the closest |
prune_reverse |
If |
n_threads |
Number of threads to use. |
verbose |
If |
obs |
set to |
An approximate nearest neighbor graph is not very useful for querying via
graph_knn_query()
, especially if the query data is initialized randomly:
some items in the data set may not be in the nearest neighbor list of any
other item and can therefore never be returned as a neighbor, no matter how
close they are to the query. Even those which do appear in at least one
neighbor list may not be reachable by expanding an arbitrary starting list if
the neighbor graph contains disconnected components.
Converting the directed graph represented by the neighbor graph to an
undirected graph by adding an edge from item j
to i
if
an edge exists from i
to j
(i.e. creating the mutual neighbor
graph) solves the problems above, but can result in inefficient searches.
Although the out-degree of each item is restricted to the number of neighbors
the in-degree has no such restrictions: a given item could be very "popular"
and in a large number of neighbors lists. Therefore mutualizing the neighbor
graph can result in some items with a large number of neighbors to search.
These usually have very similar neighborhoods so there is nothing to be
gained from searching all of them.
To balance accuracy and search time, the following procedure is carried out:
The graph is "diversified" by occlusion pruning.
The reverse graph is formed by reversing the direction of all edges in the pruned graph.
The reverse graph is diversified by occlusion pruning.
The pruned forward and pruned reverse graph are merged.
The outdegree of each node in the merged graph is truncated.
The truncated merged graph is returned as the prepared search graph.
Explicit zero distances in the graph
will be converted to a small positive
number to avoid being dropped in the sparse representation. The one exception
is the "self" distance, i.e. any edge in the graph
which links a node to
itself (the diagonal of the sparse distance matrix). These trivial edges
aren't useful for search purposes and are always dropped.
a search graph for data
based on graph
, represented as a sparse
matrix, suitable for use with graph_knn_query()
.
Harwood, B., & Drummond, T. (2016). Fanng: Fast approximate nearest neighbour graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 5713-5722).
Jayaram Subramanya, S., Devvrit, F., Simhadri, H. V., Krishnawamy, R., & Kadekodi, R. (2019). Diskann: Fast accurate billion-point nearest neighbor search on a single node. Advances in Neural Information Processing Systems, 32.
graph_knn_query()
# 100 reference iris items
iris_ref <- iris[iris$Species %in% c("setosa", "versicolor"), ]
# 50 query items
iris_query <- iris[iris$Species == "versicolor", ]
# First, find the approximate 4-nearest neighbor graph for the references:
ref_ann_graph <- nnd_knn(iris_ref, k = 4)
# Create a graph for querying with
ref_search_graph <- prepare_search_graph(iris_ref, ref_ann_graph)
# Using the search graph rather than the ref_ann_graph directly may give
# more accurate or faster results
iris_query_nn <- graph_knn_query(
query = iris_query, reference = iris_ref,
reference_graph = ref_search_graph, k = 4, metric = "euclidean",
verbose = TRUE
)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.