December 31 2018 Updated timings, keeping better track of versions numbers.
To get a feel for the performance of uwot
, here are some timings for
processing the MNIST dataset, compared with some other methods. I wouldn't take
them very seriously, but they show that uwot
is competitive with other
methods.
|Package |Version|Arguments|Time|
|--------|-------|---------|----|
|Rtsne|0.15|partial_pca = TRUE
|14m 13s|
|openTSNE (Python)|0.3.0-py37h830ac7b_1000|n_jobs=4
| 6m 4s|
|openTSNE (Python)|0.3.0-py37h830ac7b_1000|n_jobs=4, negative_gradient_method="bh"
| 17m 56s|
|FIt-SNE (C++)|1.0.0|nthreads = 4
|2m 43s|
|FIt-SNE (C++)|1.0.0|nthreads = 4
+ PCA to 50D|1m 11s|
|LargeVis (C++)|feb8121|-threads 4
|12m 43s|
|largeVis (R package)|e51871e|save_neighbors = FALSE, save_edges = FALSE, threads = 4
|33m 58s|
|uwot::lvish
|0.0.0.9009|n_threads = 4, n_sgd_threads = 4
| 5m 52s|
|UMAP (Python)|0.3.7-py37_1000||1m 25s|
|umap (R package)|09f6020|method = "naive"
| 9m 14s|
|uwot|0.0.0.9009|n_threads = 0
| 3m 11s|
|uwot|0.0.0.9009|n_threads = 4
| 2m 0s|
|uwot|0.0.0.9009|n_threads = 4, approx_pow = TRUE
| 1m 24s|
|uwot|0.0.0.9009|n_threads = 4, approx_pow = TRUE, n_sgd_threads = 4
| 1m 16s|
|uwot|0.0.0.9009|n_threads = 4, approx_pow = TRUE, pca = 50
| 48s|
Some notes on how these numbers were generated: I ran this on a Windows machine, using R 3.5.2 and Python 3.7.0. The official LargeVis implementation was built with Visual Studio 2017 Community Edition and may not be properly optimized (the VS solution is available in my fork).
For R packages, the MNIST data was downloaded via the
snedata package. For Python packages,
the sklearn.datasets.fetch_mldata('MNIST original')
was used. The LargeVis
source code contains a MNIST example with the data already present.
For FIt-SNE, I used the
provided Windows binary
via the R wrapper (and hence used the MNIST data from the snedata
package).
The reported time for second FIt-SNE entry in the table and includes the 13
seconds it takes to reduce the dimensionality to 50 via PCA, using
irlba (this is the same package and
dimension reduction used by Rtsne and the last reported time for uwot).
The default openTSNE uses the same FFT approach that FIt-SNE does, so I don't know why it's much slower, apart from the use of the numpy version of FFT rather than the FFTW library, but my understanding was that it shouldn't make much difference with a dataset the size of MNIST. Perhaps this is a Windows thing.
For uwot
, the bottleneck with typical settings is the nearest neighbor search,
which is currently provided by Annoy, whereas the Python implementation uses
pynndescent, a nearest neighbor
descent approach.
On the optimization side of things, uwot
defaults are conservative. Using
approx_pow = TRUE
uses the fastPrecisePow
approximation to the pow
function suggested by
Martin Ankerl.
For what I think seem like typical values of b
(between 0.7
and 0.9
) and
the squared distance (0
-1000
), I found the maximum relative error was about
0.06
. However, I haven't done much testing, beyond looking to see that results
from the
examples page
are not obviously worsened. Results in the table above with approx_pow = TRUE
do show a worthwhile improvement.
Using n_sgd_threads
with more than 1 thread will not give reproducible
results, but should not behave any worse than LargeVis in that regard, so for
many visualization needs, this is also worth trying.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.