rpf_knn_query: Query a random projection forest index for nearest neighbors
In jlmelville/rnndescent: Nearest Neighbor Descent Method for Approximate Nearest Neighbors

rpf_knn_query

R Documentation

Query a random projection forest index for nearest neighbors

Description

Run queries against a "forest" of Random Projection Trees (Dasgupta and Freund, 2008), to return nearest neighbors taken from the reference data used to build the forest.

Usage

rpf_knn_query(
  query,
  reference,
  forest,
  k,
  cache = TRUE,
  n_threads = 0,
  verbose = FALSE,
  obs = "R"
)

Arguments

`query`	Matrix of `n` query items, with observations in the rows and features in the columns. Optionally, the data may be passed with the observations in the columns, by setting `obs = "C"`, which should be more efficient. The `reference` data must be passed in the same orientation as `query`. Possible formats are `base::data.frame()`, `base::matrix()` or `Matrix::sparseMatrix()`. Sparse matrices should be in `dgCMatrix` format. Dataframes will be converted to `numerical` matrix format internally, so if your data columns are `logical` and intended to be used with the specialized binary `metric`s, you should convert it to a logical matrix first (otherwise you will get the slower dense numerical version).
`reference`	Matrix of `m` reference items, with observations in the rows and features in the columns. The nearest neighbors to the queries are calculated from this data and should be the same data used to build the `forest`. Optionally, the data may be passed with the observations in the columns, by setting `obs = "C"`, which should be more efficient. The `query` data must be passed in the same format and orientation as `reference`. Possible formats are `base::data.frame()`, `base::matrix()` or `Matrix::sparseMatrix()`. Sparse matrices should be in `dgCMatrix` format.
`forest`	A random partition forest, created by `rpf_build()`, representing partitions of the data in `reference`.
`k`	Number of nearest neighbors to return. You are unlikely to get good results if you choose a value substantially larger than the value of `leaf_size` used to build the `forest`.
`cache`	if `TRUE` (the default) then candidate indices found in the leaves of the forest are cached to avoid recalculating the same distance repeatedly. This incurs an extra memory cost which scales with `n_threads`. Set this to `FALSE` to disable distance caching.
`n_threads`	Number of threads to use. Note that the parallelism in the search is done over the observations in `query` not the trees in the `forest`. Thus a single observation will not see any speed-up from increasing `n_threads`.
`verbose`	If `TRUE`, log information to the console.
`obs`	set to `"C"` to indicate that the input `data` orientation stores each observation as a column. The default `"R"` means that observations are stored in each row. Storing the data by row is usually more convenient, but internally your data will be converted to column storage. Passing it already column-oriented will save some memory and (a small amount of) CPU usage.

Value

the approximate nearest neighbor graph as a list containing:

idx an n by k matrix containing the nearest neighbor indices.
dist an n by k matrix containing the nearest neighbor distances.

k neighbors per observation are not guaranteed to be found. Missing data is represented with an index of 0 and a distance of NA.

References

Dasgupta, S., & Freund, Y. (2008, May). Random projection trees and low dimensional manifolds. In Proceedings of the fortieth annual ACM symposium on Theory of computing (pp. 537-546). \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1145/1374376.1374452")}.

Examples

# Build a forest of 10 trees from the odd rows
iris_odd <- iris[seq_len(nrow(iris)) %% 2 == 1, ]
iris_odd_forest <- rpf_build(iris_odd, n_trees = 10)

iris_even <- iris[seq_len(nrow(iris)) %% 2 == 0, ]
iris_even_nn <- rpf_knn_query(
  query = iris_even, reference = iris_odd,
  forest = iris_odd_forest, k = 15
)

jlmelville/rnndescent documentation built on July 29, 2024, 9:31 p.m.

jlmelville/rnndescent index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

jlmelville/rnndescent
Nearest Neighbor Descent Method for Approximate Nearest Neighbors

rpf_knn_query: Query a random projection forest index for nearest neighbors
In jlmelville/rnndescent: Nearest Neighbor Descent Method for Approximate Nearest Neighbors

Query a random projection forest index for nearest neighbors

Description

Usage

Arguments

Value

References

See Also

Examples

Related to rpf_knn_query in jlmelville/rnndescent...

R Package Documentation

Browse R Packages

We want your feedback!

jlmelville/rnndescent Nearest Neighbor Descent Method for Approximate Nearest Neighbors

rpf_knn_query: Query a random projection forest index for nearest neighbors In jlmelville/rnndescent: Nearest Neighbor Descent Method for Approximate Nearest Neighbors

Query a random projection forest index for nearest neighbors

Description

Usage

Arguments

Value

References

See Also

Examples

Related to rpf_knn_query in jlmelville/rnndescent...

R Package Documentation

Browse R Packages

We want your feedback!

jlmelville/rnndescent
Nearest Neighbor Descent Method for Approximate Nearest Neighbors

rpf_knn_query: Query a random projection forest index for nearest neighbors
In jlmelville/rnndescent: Nearest Neighbor Descent Method for Approximate Nearest Neighbors