---
title: "Fast SGD settings"
output:
  html_document:
    theme: cosmo
    toc: true
    toc_float:
      collapsed: false
---

This is part of the documentation for UWOT.

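By default, uwot uses some conservative settings during the stochastic gradient descent. But a few options can speed things up: running the optimization on multiple threads, setting approx_pow = TRUE, and setting pcg_rand = FALSE. Setting fast_sgd = TRUE turns all of these on at once.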

If you need reproducibility (e.g. for scientific or publication purposes), then using multiple optimization threads is a deal-breaker. But for visualization, or if reproducibility isn't that important to you, these settings can give a meaningful speed-up without hurting the quality of the layout.

To demonstrate this, here are some images of embeddings with the fast settings on and off. On the left are images with the optimization settings left at their default (slower) values. I also use PCA to reduce the dimensionality to 100:

```r
mnist_umap_slow <- umap(mnist, pca = 100, verbose = TRUE)
```

On the right are the embeddings from using fast_sgd = TRUE:

```r
mnist_umap_fast <- umap(mnist, pca = 100, fast_sgd = TRUE, verbose = TRUE)
```

Six threads were used in the stochastic gradient descent.

For details on the datasets, see the examples page. The timings are given in the title of each image, in minutes and seconds. Note that this is for the entire run, not just the optimization phase, i.e. it includes the PCA dimensionality reduction and the nearest neighbor search, which is usually the slowest part of the run and which is not affected by fast_sgd.
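As a rough sketch of how whole-run timings like these could be collected, the snippet below wraps each call in system.time(). It uses the built-in iris data as a small stand-in (an assumption made so the example runs quickly; it is not one of the datasets above), so the numbers it prints will not resemble the ones in the image titles.

```r
library(uwot)

# Stand-in data: the four numeric columns of the built-in iris dataset.
# The timings reported in this document come from much larger datasets.
X <- as.matrix(iris[, 1:4])

# system.time() measures the whole call, so the elapsed time covers the
# nearest neighbor search as well as the optimization.
time_slow <- system.time(emb_slow <- umap(X))
time_fast <- system.time(emb_fast <- umap(X, fast_sgd = TRUE))

print(time_slow["elapsed"])
print(time_fast["elapsed"])
```

On a dataset this small the difference is likely to be lost in the noise; the comparison only becomes interesting when the optimization takes a noticeable share of the run.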


|            |            |
|:----------:|:----------:|
| mnist slow | mnist fast |


|              |              |
|:------------:|:------------:|
| fashion slow | fashion fast |


|                |                |
|:--------------:|:--------------:|
| kuzushiji slow | kuzushiji fast |


|           |           |
|:---------:|:---------:|
| norb slow | norb fast |

The distribution of clusters here does look a little bit different, but I think it's within the variation that one would expect from the stochastic nature of the optimization. To bolster my point, here are two further runs of UMAP on the NORB dataset with fast_sgd = FALSE and different seeds:

|         |           |
|:-------:|:---------:|
| norb 42 | norb 5446 |

I think if you didn't know one of these four images was generated with the fast settings, you'd be hard pressed to pick it out of the lineup. The variation between images is more likely due to the default n_neighbors being too low to capture the global structure. Here are some plots with n_neighbors = 150, and n_epochs = 500 to account for the increased number of edges that need sampling:
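A call with those two settings raised might look like the following sketch. The NORB data isn't loaded here, so iris stands in, and n_neighbors is scaled down to 50 to fit its 150 rows; both substitutions are assumptions for illustration only.

```r
library(uwot)

# Stand-in data (assumption): the numeric columns of iris, not NORB.
X <- as.matrix(iris[, 1:4])

# A larger neighborhood means more edges to sample, so n_epochs is raised
# as well. The text above uses n_neighbors = 150 on NORB; 50 is used here
# only because iris has just 150 rows.
emb_wide <- umap(X, n_neighbors = 50, n_epochs = 500, fast_sgd = TRUE)

print(dim(emb_wide))
```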

|               |               |
|:-------------:|:-------------:|
| norb 150 slow | norb 150 fast |

Apart from one of the blue loops being open in the fast_sgd = TRUE result, which is probably fixable with a longer optimization, the global arrangement of the two plots is pretty similar.


|                |                |
|:--------------:|:--------------:|
| tasic2018 slow | tasic2018 fast |


|                  |                  |
|:----------------:|:----------------:|
| macosko2015 slow | macosko2015 fast |


From these results, I'd say that the fast_sgd = TRUE settings give results that are effectively indistinguishable from the slower settings, so if you want to save a bit of time, there seems to be no harm in using them. The actual time savings you'll see depend on how long the nearest neighbor search takes, and any initial PCA you carry out on the input (as we do in all these examples) can take up a fair amount of the run time. For example, on NORB the PCA takes up five minutes of the six-and-a-half-minute total run time, so there's not a lot of time to be saved. But for MNIST and Fashion, you can effectively halve the run time (with six threads, anyway).
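To put a number on the NORB example, here is a small back-of-the-envelope calculation. The helper max_saving is a hypothetical name introduced for this sketch; it simply assumes that the part of the run unaffected by fast_sgd stays fixed and that everything else could, at best, shrink to nothing.

```r
# Upper bound on the fraction of run time that can be saved when part of
# the run (here, the PCA) is not affected by the fast settings.
max_saving <- function(total_min, unaffected_min) {
  (total_min - unaffected_min) / total_min
}

# NORB figures from the text: 5 minutes of PCA in a 6.5 minute run.
norb_bound <- max_saving(total_min = 6.5, unaffected_min = 5)

print(round(100 * norb_bound, 1))  # 23.1 (percent)
```

The real saving will be smaller than this bound, because the remaining minute and a half also includes the nearest neighbor search, which the fast settings don't touch either.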

If reproducibility is important to you, then using multiple threads in the optimization is out of the question, although that's what gives the biggest speed increase. However, you could still consider setting approx_pow = TRUE, pcg_rand = FALSE. For MNIST-sized datasets (mnist, fashion, kuzushiji) I saw a reasonable speed-up of around 25%. For datasets where the PCA and nearest neighbor search dominate, the gains are smaller: a 10-15% speed-up for tasic2018 and macosko2015, and only 5% for norb. If you need to set n_epochs higher, then these time savings will increase.
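A minimal sketch of that single-threaded compromise follows. It again uses iris as a stand-in dataset, and the seed value and the helper name run_once are arbitrary choices for illustration. The optimization stays on one thread by leaving n_sgd_threads at 0, which is its default and is spelled out here only for clarity.

```r
library(uwot)

# Stand-in data (assumption): the numeric columns of iris.
X <- as.matrix(iris[, 1:4])

# Fix the seed before each run and keep the optimization single-threaded,
# while still using the faster power approximation and the faster
# random number generator.
run_once <- function() {
  set.seed(42)
  umap(X, approx_pow = TRUE, pcg_rand = FALSE, n_sgd_threads = 0)
}

emb_a <- run_once()
emb_b <- run_once()

# With a fixed seed and a single optimization thread, the two embeddings
# should match.
print(all.equal(emb_a, emb_b))
```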

jlmelville/uwot documentation built on July 8, 2023, 7:05 a.m.