queryNeighbors | R Documentation
Find all points in a reference dataset that lie within a threshold distance of each point in a query dataset.
queryNeighbors(
X,
query,
threshold,
get.index = TRUE,
get.distance = TRUE,
num.threads = 1,
subset = NULL,
transposed = FALSE,
...,
BNPARAM = NULL
)
X
The reference dataset to be queried.
This should be a numeric matrix where rows correspond to reference points and columns correspond to variables (i.e., dimensions).
Alternatively, a prebuilt BiocNeighborIndex object from buildIndex.
query
A numeric matrix of query points, containing the same number of columns as X.
threshold
A positive numeric scalar specifying the maximum distance at which a point is considered a neighbor. Alternatively, a numeric vector containing a different distance threshold for each query point.
get.index
A logical scalar indicating whether the indices of the neighbors should be recorded.
get.distance
A logical scalar indicating whether the distances to the neighbors should be recorded.
num.threads
An integer scalar specifying the number of threads to use for the search.
subset
An integer, logical or character vector indicating the rows of query for which neighbors should be identified.
transposed
A logical scalar indicating whether …
...
Further arguments to pass to …
BNPARAM
A BiocNeighborParam object specifying how the index should be constructed.
If …
This function identifies all points in X that lie within threshold of each point in query.
For Euclidean distances, this is equivalent to identifying all points inside a hypersphere of radius threshold centered on each query point.
Not all implementations support this search mode, but KmknnParam and VptreeParam do.
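The following minimal brute-force sketch in base R makes the search semantics concrete. It assumes Euclidean distance. The helper brute_range_search() is hypothetical and is not part of BiocNeighbors. It compares each query point against every point in X, which is the exhaustive work that KmknnParam and VptreeParam are designed to avoid. It also assumes that a point lying exactly at threshold counts as a neighbor.

```r
# Hypothetical helper (not part of BiocNeighbors): brute-force Euclidean
# range search that mirrors the documented index/distance output.
brute_range_search <- function(X, query, threshold) {
  stopifnot(ncol(X) == ncol(query))
  index <- vector("list", nrow(query))
  distance <- vector("list", nrow(query))
  for (i in seq_len(nrow(query))) {
    # Euclidean distance from query point i to every row (point) of X.
    d <- sqrt(colSums((t(X) - query[i, ])^2))
    # Assumption: points at exactly 'threshold' are counted as neighbors.
    hits <- which(d <= threshold)
    # Sort neighbors by increasing distance from query point i.
    o <- order(d[hits])
    index[[i]] <- hits[o]
    distance[[i]] <- d[hits][o]
  }
  list(index = index, distance = distance)
}

# Usage: four reference points in 2D and two query points.
X <- rbind(c(0, 0), c(1, 0), c(0, 2), c(3, 3))
q <- rbind(c(0, 0), c(3, 2))
res <- brute_range_search(X, q, threshold = 1.5)
res$index[[1]]     # rows 1 and 2 of X lie within 1.5 of (0, 0)
res$distance[[1]]  # their distances, in increasing order
res$index[[2]]     # only row 4 lies within 1.5 of (3, 2)
```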
If threshold is a vector, each entry is assumed to specify a (possibly different) threshold for the corresponding point in query.
If subset is also specified, each entry is instead assumed to specify a threshold for the corresponding point in subset.
An error will be raised if threshold is a vector of incorrect length.
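The length rule can be sketched as follows. resolve_thresholds() is a hypothetical helper, not a BiocNeighbors function. For simplicity it assumes that subset is an integer vector of row indices. Logical or character subsets would first need converting to row indices.

```r
# Hypothetical helper: expand 'threshold' to one value per searched point,
# following the documented length rule.
resolve_thresholds <- function(threshold, n_query, subset = NULL) {
  # Thresholds must be positive numbers.
  stopifnot(is.numeric(threshold), all(threshold > 0))
  # Assumption: 'subset' is an integer vector of row indices of 'query'.
  n_searched <- if (is.null(subset)) n_query else length(subset)
  if (length(threshold) == 1L) {
    # A scalar applies to every searched point.
    return(rep(threshold, n_searched))
  }
  if (length(threshold) != n_searched) {
    stop("'threshold' should have length 1 or ", n_searched)
  }
  threshold
}

resolve_thresholds(0.5, n_query = 4)                          # one value per query point
resolve_thresholds(c(1, 2), n_query = 4, subset = c(2L, 4L))  # one value per subset entry
try(resolve_thresholds(c(1, 2, 3), n_query = 4))              # wrong length: error
```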
If multiple queries are to be performed against the same X, it may be beneficial to build the index from X once with buildIndex.
The resulting index object can then be supplied as X to multiple queryNeighbors calls, avoiding the need to repeat index construction in each call.
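The sketch below shows this reuse pattern. It assumes that BiocNeighbors is installed and skips the block otherwise. The choice of VptreeParam() and the particular thresholds are arbitrary. The point is that buildIndex is called only once, and the resulting index is then passed as X to each queryNeighbors call.

```r
# Assumes BiocNeighbors is installed; the block is skipped otherwise.
if (requireNamespace("BiocNeighbors", quietly = TRUE)) {
  library(BiocNeighbors)
  set.seed(42)
  Y <- matrix(rnorm(10000), ncol = 10)  # 1000 reference points in 10 dimensions

  # Build the index once...
  idx <- buildIndex(Y, BNPARAM = VptreeParam())

  # ...then reuse it for several thresholds without rebuilding.
  # With get.index=FALSE and get.distance=FALSE, each call returns an
  # integer vector of neighbor counts, one per query point.
  counts <- lapply(c(1, 2, 3), function(thr) {
    queryNeighbors(idx, query = Y[1:5, ], threshold = thr,
                   get.index = FALSE, get.distance = FALSE)
  })
  print(counts)
}
```

Because the query points here are rows of Y, each one finds itself at distance zero. Every count is therefore at least 1, and the counts can only grow as the threshold increases.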
A list is returned containing:
index, if get.index=TRUE.
This is a list of integer vectors where each entry corresponds to a point (denoted here as i) in query.
The vector for i contains the row indices of all points in X that lie within threshold of point i.
Neighbors for i are sorted by increasing distance from i.
distance, if get.distance=TRUE.
This is a list of numeric vectors where each entry corresponds to a point (as above) and contains the distances of the neighbors from i.
Elements of each vector in distance correspond to elements of the corresponding vector in index.
If both get.index=FALSE and get.distance=FALSE, an integer vector is returned instead, with length equal to the number of points in query.
The i-th entry contains the number of neighbors of i within threshold.
If subset is not NULL, each entry of the above vector or lists refers to a point in the subset, in the same order as supplied in subset.
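The sketch below places the three output forms side by side and checks that they agree with the description above. It assumes that BiocNeighbors is installed and skips the block otherwise. The data and the threshold are arbitrary.

```r
# Assumes BiocNeighbors is installed; the block is skipped otherwise.
if (requireNamespace("BiocNeighbors", quietly = TRUE)) {
  library(BiocNeighbors)
  set.seed(7)
  Y <- matrix(rnorm(2000), ncol = 4)  # 500 reference points in 4 dimensions
  Z <- matrix(rnorm(40), ncol = 4)    # 10 query points

  # Default: a list with both 'index' and 'distance'.
  full <- queryNeighbors(Y, query = Z, threshold = 1.5)

  # Counts only: an integer vector with one entry per query point.
  n <- queryNeighbors(Y, query = Z, threshold = 1.5,
                      get.index = FALSE, get.distance = FALSE)

  # Subset: entries follow the order given in 'subset' (query row 3, then row 1).
  sub <- queryNeighbors(Y, query = Z, threshold = 1.5, subset = c(3L, 1L))

  # The counts should match the lengths of the index vectors.
  print(all(lengths(full$index) == n))
}
```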
Aaron Lun
buildIndex, to build an index ahead of time.
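library(BiocNeighbors)
set.seed(1)                          # make the random data reproducible
Y <- matrix(rnorm(100000), ncol=20)  # 5000 reference points in 20 dimensions
Z <- matrix(rnorm(20000), ncol=20)   # 1000 query points in 20 dimensions
out <- queryNeighbors(Y, query=Z, threshold=3)
summary(lengths(out$index))          # distribution of neighbor counts per query point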