uwot
relies on the underlying compiler and C++ standard library on your
machine and this can result in differences in output even with the same input
data, arguments, packages and R version. If you require reproducibility between
machines, it is strongly suggested that you stick with the same OS and compiler
version on all of them (e.g. a fixed LTS of a Linux distro and gcc version).
Otherwise, the following can help:
tumap
method instead of umap
. This avoid the use of std::pow
in
gradient calculations. This also has the advantage of being faster to optimize.
However, this gives larger clusters in the output, and you don't have the
ability to control that with a
and b
(or spread
and min_dist
)
parameters.umap
, it's better to provide a
and b
directly with a fixed precision
rather than allowing them to be calculated via the spread
and min_dist
parameters. For default UMAP, use a = 1.8956, b = 0.8006
.approx_pow = TRUE
, which avoids the use of the std::pow
function.init = "spca"
rather than init = "spectral"
(although the latter is
the default and preferred method for UMAP initialization).n_sgd_threads
is set larger than 1
, then even if you use set.seed
,
results of the embeddings are not repeatable, This is because there is no
locking carried out on the underlying coordinate matrix, and work is partitioned
by edge not vertex and a given vertex may be processed by different threads. The
order in which reads and writes occur is of course at the whim of the thread
scheduler. This is the same behavior as LargeVis.In summary, your chances of reproducibility are increased by using:
mnist_umap <- umap(mnist, a = 1.8956, b = 0.8006, approx_pow = TRUE, init = "spca", batch = TRUE) # or mnist_tumap <- tumap(mnist, init = "spca", batch = TRUE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.