merge_knn: Merge multiple approximate nearest neighbors graphs
In rnndescent: Nearest Neighbor Descent Method for Approximate Nearest Neighbors

merge_knn

R Documentation

Merge multiple approximate nearest neighbors graphs

Description

merge_knn takes a list of nearest neighbor graphs and merges them into a single graph, with the same number of neighbors as the first graph. This is useful to combine the results of multiple different nearest neighbor searches: the output will be at least as accurate as the most accurate of the two input graphs, and ideally will be more accurate than either.

Usage

merge_knn(graphs, is_query = FALSE, n_threads = 0, verbose = FALSE)

Arguments

`graphs`	A list of nearest neighbor graphs to merge. Each item in the list should consist of a sub-list containing: `idx` an n by k matrix containing the k nearest neighbor indices. `dist` an n by k matrix containing k nearest neighbor distances. The number of neighbors can differ between graphs, but the merged result will have the same number of neighbors as the first graph in the list.
`is_query`	If `TRUE` then the graphs are treated as the result of a knn query, not a knn building process. Or: is the graph bipartite? This should be set to `TRUE` if `nn_graphs` are the results of using e.g. `graph_knn_query()` or `random_knn_query()`, and set to `FALSE` if these are the results of `nnd_knn()` or `random_knn()`. The difference is that if `is_query = FALSE`, if an index `p` is found in `nn_graph1[i, ]`, i.e. `p` is a neighbor of `i` with distance `d`, then it is assumed that `i` is a neighbor of `p` with the same distance. If `is_query = TRUE`, then `i` and `p` are indexes into two different datasets and the symmetry does not hold. If you aren't sure what case applies to you, it's safe (but potentially inefficient) to set `is_query = TRUE`.
`n_threads`	Number of threads to use.
`verbose`	If `TRUE`, log information to the console.

Value

a list containing:

idx an n by k matrix containing the merged nearest neighbor indices.
dist an n by k matrix containing the merged nearest neighbor distances.

The size of k in the output graph is the same as that of the first item in nn_graphs.

Examples

set.seed(1337)
# Nearest neighbor descent with 15 neighbors for iris three times,
# starting from a different random initialization each time
iris_rnn1 <- nnd_knn(iris, k = 15, n_iters = 1)
iris_rnn2 <- nnd_knn(iris, k = 15, n_iters = 1)
iris_rnn3 <- nnd_knn(iris, k = 15, n_iters = 1)

# Merged results should be an improvement over individual results
iris_mnn <- merge_knn(list(iris_rnn1, iris_rnn2, iris_rnn3))
sum(iris_mnn$dist) < sum(iris_rnn1$dist)
sum(iris_mnn$dist) < sum(iris_rnn2$dist)
sum(iris_mnn$dist) < sum(iris_rnn3$dist)

rnndescent documentation built on May 29, 2024, 8:38 a.m.