Description Usage Arguments Details Value Author(s) See Also Examples
Query a dataset for nearest neighbors of points in another dataset, using a variety of algorithms.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | queryExhaustive(X, query, k, get.index=TRUE, get.distance=TRUE, last=k,
BPPARAM=SerialParam(), precomputed=NULL, transposed=FALSE, subset=NULL,
raw.index=FALSE, warn.ties=TRUE, ...)
queryKmknn(X, query, k, get.index=TRUE, get.distance=TRUE, last=k,
BPPARAM=SerialParam(), precomputed=NULL, transposed=FALSE, subset=NULL,
raw.index=FALSE, warn.ties=TRUE, ...)
queryVptree(X, query, k, get.index=TRUE, get.distance=TRUE, last=k,
BPPARAM=SerialParam(), precomputed=NULL, transposed=FALSE, subset=NULL,
raw.index=FALSE, warn.ties=TRUE, ...)
queryAnnoy(X, query, k, get.index=TRUE, get.distance=TRUE, last=k,
BPPARAM=SerialParam(), precomputed=NULL, transposed=FALSE, subset=NULL,
raw.index=NA, warn.ties=NA, ...)
queryHnsw(X, query, k, get.index=TRUE, get.distance=TRUE, last=k,
BPPARAM=SerialParam(), precomputed=NULL, transposed=FALSE, subset=NULL,
raw.index=NA, warn.ties=NA, ...)
|
X |
A numeric matrix where rows correspond to data points and columns correspond to variables (i.e., dimensions). |
query |
A numeric matrix of query points, containing different data points in the rows but the same number and ordering of dimensions in the columns. |
k |
A positive integer scalar specifying the number of nearest neighbors to retrieve. |
get.index |
A logical scalar indicating whether the indices of the nearest neighbors should be recorded. |
get.distance |
A logical scalar indicating whether distances to the nearest neighbors should be recorded. |
last |
An integer scalar specifying the number of furthest neighbors for which statistics should be returned. |
BPPARAM |
A BiocParallelParam object indicating how the search should be parallelized. |
precomputed |
A BiocNeighborIndex object of the appropriate class, generated from |
transposed |
A logical scalar indicating whether the |
subset |
A vector indicating the rows of |
raw.index |
A logial scalar indicating whether raw column indices should be returned,
see |
warn.ties |
Logical scalar indicating whether a warning should be raised if any of the |
... |
Further arguments to pass to the respective |
All of these functions identify points in X
that are the k
nearest neighbors of each point in query
.
queryAnnoy
performs an approximate search, while queryExhaustive
, queryKmknn
and queryVptree
are exact.
This requires both X
and query
to have the same number of dimensions.
Moreover, the upper bound for k
is set at the number of points in X
.
By default, nearest neighbors are identified for all data points within query
.
If subset
is specified, nearest neighbors are only detected for the query points in the subset.
This yields the same result as (but is more efficient than) subsetting the output matrices after running queryKmknn
on the full query
.
If transposed=TRUE
, this function assumes that query
is already transposed, which saves a bit of time by avoiding an unnecessary transposition.
Turning off get.index
or get.distance
may also provide a slight speed boost when these returned values are not of interest.
Using BPPARAM
will also split the search by query points across multiple processes.
Setting last
will return indices and/or distances for the k - last + 1
-th closest neighbor to the k
-th neighbor.
This can be used to improve memory efficiency, e.g., by only returning statistics for the k
-th nearest neighbor by setting last=1
.
Note that this is entirely orthogonal to subset
.
If multiple queries are to be performed to the same X
, it may be beneficial to build the index from X
(e.g., with buildKmknn
).
The resulting BiocNeighborIndex object can be supplied as precomputed
to multiple function calls, avoiding the need to repeat index construction in each call.
Note that when precomputed
is supplied, the value of X
is ignored.
For exact methods, see comments in ?"BiocNeighbors-ties"
regarding the warnings when tied distances are observed.
For approximate methods, see comments in buildAnnoy
and buildHnsw
about the (lack of) randomness in the search results.
A list is returned containing:
index
, if get.index=TRUE
.
This is an integer matrix where each row corresponds to a point (denoted here as i) in query
.
The row for i contains the row indices of X
that are the nearest neighbors to point i, sorted by increasing distance from i.
distance
, if get.distance=TRUE
.
This is a numeric matrix where each row corresponds to a point (as above) and contains the sorted distances of the neighbors from i.
Each matrix contains last
columns.
If subset
is not NULL
, each row of the above matrices refers to a point in the subset, in the same order as supplied in subset
.
See ?"BiocNeighbors-raw-index"
for an explanation of the output when raw.index=TRUE
for the functions that support it.
Aaron Lun
buildExhaustive
,
buildKmknn
,
buildVptree
,
or buildAnnoy
to build an index ahead of time.
See ?"BiocNeighbors-algorithms"
for an overview of the available algorithms.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | Y <- matrix(rnorm(100000), ncol=20)
Z <- matrix(rnorm(20000), ncol=20)
out <- queryExhaustive(Y, query=Z, k=5)
head(out$index)
head(out$distance)
out1 <- queryKmknn(Y, query=Z, k=5)
head(out1$index)
head(out1$distance)
out2 <- queryVptree(Y, query=Z, k=5)
head(out2$index)
head(out2$distance)
out3 <- queryAnnoy(Y, query=Z, k=5)
head(out3$index)
head(out3$distance)
out4 <- queryHnsw(Y, query=Z, k=5)
head(out4$index)
head(out4$distance)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.