hnsw_knn | R Documentation |
A k-nearest neighbor algorithm using the hnswlib library (https://github.com/nmslib/hnswlib).
hnsw_knn(
X,
k = 10,
distance = "euclidean",
M = 16,
ef_construction = 200,
ef = 10,
verbose = FALSE,
progress = "bar",
n_threads = 0,
grain_size = 1,
byrow = TRUE
)
X |
A numeric matrix of |
k |
Number of neighbors to return. |
distance |
Type of distance to calculate. One of:
|
M |
Controls the number of bi-directional links created for each element
during index construction. Higher values lead to better results at the
expense of memory consumption. Typical values are |
ef_construction |
Size of the dynamic list used during construction. A larger value means a better quality index, but increases build time. Should be an integer value between 1 and the size of the dataset. |
ef |
Size of the dynamic list used during search. Higher values lead
to improved recall at the expense of longer search time. Can take values
between |
verbose |
If |
progress |
defunct and has no effect. |
n_threads |
Maximum number of threads to use. The exact number is
determined by |
grain_size |
Minimum amount of work to do (rows in |
byrow |
if |
a list containing:
idx
a matrix containing the nearest neighbor indices.
dist
a matrix containing the nearest neighbor distances.
The dimensions of the matrices respect the storage (row or column-based) of X as indicated by the byrow parameter. If byrow = TRUE (the default), each row of idx and dist contains the neighbor information for the item passed in the equivalent row of X, i.e. the dimensions are n x k, where n is the number of items in X. If byrow = FALSE, then each column of idx and dist contains the neighbor information for the item passed in the equivalent column of X, i.e. the dimensions are k x n.
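The effect of byrow on the output shape can be checked with a short sketch. It assumes that hnsw_knn is provided by the RcppHNSW package and that the installed version supports the byrow argument shown in the usage above; the names by_row and by_col are illustrative only.

```r
library(RcppHNSW)  # assumption: the package that exports hnsw_knn

X <- as.matrix(iris[, -5])  # 150 items stored in rows, 4 columns

# Row-based storage (the default): one row of output per item
by_row <- hnsw_knn(X, k = 10, byrow = TRUE)
dim(by_row$idx)   # n x k
dim(by_row$dist)  # n x k

# Column-based storage: items are the columns of the transposed matrix,
# so the output holds one column per item
by_col <- hnsw_knn(t(X), k = 10, byrow = FALSE)
dim(by_col$idx)   # k x n
dim(by_col$dist)  # k x n
```

For this data n is 150 and k is 10, so the two calls should give matrices that are transposed in shape relative to each other.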
Every item in the dataset is considered to be a neighbor of itself, so the first neighbor of item i should always be i itself. If that isn't the case, then any of M, ef_construction or ef may need increasing.
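One possible way to run this self-neighbor check is sketched below. It assumes hnsw_knn comes from the RcppHNSW package; the data is random so that no two rows are exact duplicates (which could tie with the self match), and the larger parameter values in the retry are arbitrary illustrations rather than recommendations.

```r
library(RcppHNSW)  # assumption: the package that exports hnsw_knn

set.seed(1)
X <- matrix(rnorm(1000 * 5), nrow = 1000)  # 1000 items, 5 dimensions

res <- hnsw_knn(X, k = 5)

# TRUE where the first neighbor of item i is i itself
self_first <- res$idx[, 1] == seq_len(nrow(X))
frac_self <- mean(self_first)
frac_self  # fraction of items that found themselves first

# If some items missed themselves, retry with larger (illustrative) settings
if (!all(self_first)) {
  res <- hnsw_knn(X, k = 5, M = 32, ef_construction = 400, ef = 100)
}
```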
Some details on the parameters used for index construction and search, based on https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md:
M
Controls the number of bi-directional links created for each element during index construction. Higher values lead to better results at the expense of memory consumption, which is around M * 8-10 bytes per stored element. High intrinsic dimensionalities will require higher values of M. A range of 2 - 100 is typical, but 12 - 48 is ok for most use cases.
ef_construction
Size of the dynamic list used during construction. A larger value means a better quality index, but increases build time. Should be an integer value between 1 and the size of the dataset. A typical range is 100 - 2000. Beyond a certain point, increasing ef_construction has no effect. A sufficient value of ef_construction can be determined by searching with ef = ef_construction, and ensuring that the recall is at least 0.9.
ef
Size of the dynamic list used during index search. Can differ from ef_construction and be any value between k (the number of neighbors sought) and the number of elements in the index being searched.
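The recall check described for ef_construction could be carried out along the following lines. This is a minimal sketch, not the package's own procedure: it assumes hnsw_knn comes from the RcppHNSW package, uses a small random dataset so that exact neighbors can be found by brute force with base R's dist(), and the names exact_idx, approx and recall are illustrative.

```r
library(RcppHNSW)  # assumption: the package that exports hnsw_knn

set.seed(42)
X <- matrix(rnorm(500 * 8), nrow = 500)  # 500 items, 8 dimensions
k <- 10

# Exact Euclidean neighbors by brute force; each item counts as its own
# neighbor, matching the convention described above
D <- as.matrix(dist(X))
exact_idx <- t(apply(D, 1, function(d) order(d)[seq_len(k)]))

# Approximate neighbors, searching with ef = ef_construction
efc <- 200
approx <- hnsw_knn(X, k = k, ef_construction = efc, ef = efc)

# Recall: average fraction of the exact neighbors that were recovered
recall <- mean(vapply(seq_len(nrow(X)), function(i) {
  length(intersect(approx$idx[i, ], exact_idx[i, ])) / k
}, numeric(1)))
recall
```

If recall comes out below 0.9, the guidance above suggests trying a larger ef_construction. The brute-force step needs memory quadratic in the number of items, so this check is only practical on a sample of a large dataset.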
Malkov, Y. A., & Yashunin, D. A. (2016). Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. arXiv preprint arXiv:1603.09320.
iris_nn_data <- hnsw_knn(as.matrix(iris[, -5]), k = 10)