A place for me to put the old less-structured updates that I post at the top of
the README.md file for the package. When they aren't new any more, they will
get moved here. You should look at the Changelog for fuller details.
18 April 2024: Version 0.1.5 has been released to CRAN. This is an internal API change to support a forthcoming release of dqrng, so you should notice no changes on upgrading.
18 March 2024: Version 0.1.4 has been released to CRAN. This is a bug fix
release. Most notably, it fixes an issue where rnnd_build would fail with
metric = "cosine".
08 December 2023: Version 0.1.3 has been released. This deals with some UBSAN and ASAN problems when missing data was present in the k-nearest neighbors graph.
27 November 2023: Frabjous day, rnndescent is now on CRAN. The version number
has been bumped to 0.1.1.
24 Nov 2023 A new function rnnd_knn has been added if you just want the
k-nearest neighbors graph for a dataset (i.e. no querying). I have also removed
some other functions and made some other breaking changes as I prepare for
CRAN submission. See the NEWS for details.
19 Nov 2023 The rnnd_build and rnnd_query functions have been added. They
simplify creating a knn graph/building an index and querying it, respectively,
and should be the main way of using the package. The other functions remain
should you need more flexibility. Some functions have been removed: the local
scaling and the standalone distance functions. The latter could return in a
different package at some point.
13 November 2023. I have added most of the metrics that don't need extra
parameters for both sparse and non-sparse data, e.g. braycurtis, dice, jaccard,
hellinger etc. See the Missing Metrics section at the end of this README for
those which are not implemented. There are a few breaking changes (mainly
around the hamming metric, see NEWS.md for the exact details).
06 November 2023 Sparse data support has been added. You should be able to
use e.g. a dgCMatrix with all the methods and currently supported metrics as
easily as a dense matrix.
30 October 2023 At last, a workable random partition forest implementation
has been added. This can be used standalone (e.g. rpf_knn, rpf_build,
rpf_knn_query) or as initialization to nearest neighbor descent
(nnd_knn(init = "tree", ...)). The forest itself can be serialized with
saveRDS, but you will pay a price for that convenience by having to pass it
back and forth between the R and C++ layers when querying. For now there is no
access to the underlying C++ class via R like in RcppHNSW and RcppAnnoy, so it
may not be suitable for some use cases.
19 October 2023 Inevitably 0.0.11 is here because of a bug in 0.0.10 where nearest neighbor descent was not correctly flagging new/old neighbors, which reduced performance (but did not affect the actual result).
18 October 2023 A long-postponed major internal refactoring means I might be
able to make a bit of progress on this package. For now, the cosine and
correlation metrics have migrated to not preprocessing their data (these
versions are still available as cosine-preprocess and correlation-preprocess
respectively). Also, I have exported the distance metrics as R functions (e.g.
cosine_distance, euclidean_distance).
18 September 2021 The "hamming" metric now supports integer-valued (not just
binary) inputs, thanks to a contribution from Vitalie Spinu. The older metric
code path for binary data only is supported via metric = "bhamming".
20 June 2021 A big step forward in usefulness with the addition of the
prepare_search_graph function which creates and prunes an undirected search
graph from the neighbor graph for use with the (now re-named) graph_knn_query
function. The latter is now also capable of backtracking search and performs
fairly well.
4 October 2020 Added "correlation" as a metric and the k_occur function to
help diagnose potential hubness in a dataset.
23 November 2019 Added merge_knn and merge_knnl for combining multiple nn
results.
15 November 2019 It is now possible to query a reference set of data to
produce the approximate knn graph relative to the references (i.e. none of the
queries will be selected as neighbors) via nnd_knn_query (and related
brute_force and random variants).
27 October 2019 rnndescent creeps towards usability. A multi-threaded
implementation (using RcppParallel) has now been added.
20 October 2019 The nnd_knn function now has an init parameter which can be
used to specify the initialization method. Currently "random" and "forest" are
supported. The latter uses a random partition forest to initialize the search
graph. This is much faster than the random initialization but still not as
fast as I would like.