rnndescent-package: rnndescent: Nearest Neighbor Descent Method for Approximate...

rnndescent-packageR Documentation

rnndescent: Nearest Neighbor Descent Method for Approximate Nearest Neighbors

Description

The Nearest Neighbor Descent method for finding approximate nearest neighbors by Dong and co-workers (2010) \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1145/1963405.1963487")}. Based on the 'Python' package 'PyNNDescent' https://github.com/lmcinnes/pynndescent.

Details

The rnndescent package provides functions to create approximate nearest neighbors using the Nearest Neighbor Descent (Dong and co-workers, 2010) and Random Partition Tree (Dasgupta and Freund, 2008) methods. In comparison to other packages, it offers more metrics and can be used with sparse matrices. For querying new data, it uses graph diversification methods (Harwood and Drummond, 2016) and back-tracking (Iwasakai and Miyazaki, 2018) to improve the search performance. The package also provides functions to diagnose hubness in nearest neighbor results (Radovanovic and co-workers, 2010).

This library is based heavily on the 'PyNNDescent' Python library.

General resources:

The following functions provide the main interface to the package, with useful defaults:

  • Find the approximate nearest neighbors: rnnd_knn()

  • Create a search index and query new neighbors: rnnd_build().

  • Query new neighbors (or refine an existing knn graph): rnnd_query().

Some diagnostic and helper functions to help explore the the structure of the graphs and how well the approximation is working:

  • Find exact nearest neighbors: brute_force_knn(), brute_force_knn_query().

  • Merging graphs: merge_knn().

  • Hubness: k_occur().

  • Overlap/accuracy of two neighbor graphs: neighbor_overlap().

Some lower-level functions are also available if you want more control than the ⁠rnnd_*⁠ functions provide:

  • Find approximate nearest neighbors: rpf_knn(), nnd_knn().

  • Generating random neighbors: random_knn(), random_knn_query().

  • Building an index: rpf_build(), rpf_filter().

  • Querying an index for new data: rpf_knn_query(), prepare_search_graph(), graph_knn_query().

Author(s)

Maintainer: James Melville jlmelville@gmail.com [copyright holder]

Other contributors:

  • Vitalie Spinu [contributor]

  • Ralf Stubner [contributor]

References

Dasgupta, S., & Freund, Y. (2008, May). Random projection trees and low dimensional manifolds. In Proceedings of the fortieth annual ACM symposium on Theory of computing (pp. 537-546). \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1145/1374376.1374452")}.

Dong, W., Moses, C., & Li, K. (2011, March). Efficient k-nearest neighbor graph construction for generic similarity measures. In Proceedings of the 20th international conference on World Wide Web (pp. 577-586). ACM. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1145/1963405.1963487")}.

Harwood, B., & Drummond, T. (2016). Fanng: Fast approximate nearest neighbour graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 5713-5722).

Radovanovic, M., Nanopoulos, A., & Ivanovic, M. (2010). Hubs in space: Popular nearest neighbors in high-dimensional data. Journal of Machine Learning Research, 11, 2487-2531. https://www.jmlr.org/papers/v11/radovanovic10a.html

Iwasaki, M., & Miyazaki, D. (2018). Optimization of indexing based on k-nearest neighbor graph for proximity search in high-dimensional data. arXiv preprint arXiv:1810.07355. https://arxiv.org/abs/1810.07355

See Also

Useful links:


jlmelville/rnndescent documentation built on April 19, 2024, 8:26 p.m.