title: "Fast SGD settings" output: html_document: theme: cosmo toc: true toc_float: collapsed: false

This is part of the documentation for UWOT.

By default, `uwot`

uses some conservative settings during the stochastic
gradient descent. But there are some options that can speed things up:

`n_sgd_threads = "auto"`

(or any value greater than one): this is the biggest thing you can do to speed up optimization. The optimization is now spread across multiple threads. The`"auto"`

setting sets the value to that used in the other multi-threaded part of the code, which is controlled by`n_threads`

. The downside to using multiple threads is that you can no longer guarantee the reproducibility of the results. Normally, you would expect the same results if you called`set.seed`

before each run. This will not be the case if set`n_sgd_threads`

.`pcg_rand = FALSE`

: the default random number generator used in the optimization is taken from the PCG family. This is quite fast and has good statistical properties. An alternative PRNG is an implementation of the Tausworthe "taus88" generator, which is used in the Python implementation of UMAP and which I have translated from Python into C++. By setting`pcg_rand = FALSE`

, you will get the taus88 generator. This is faster than PCG, but bear in mind that PCG was designed by computer science professor Melissa O'Neill, and the taus88 implementation was translated into C++ by me (I am not a computer science professor), so if you have any concerns about the possibility of artifacts from bad random number generation, stick with PCG. Nonetheless, it probably gives results similar to that in the Python UMAP implementation most of the time.`approx_pow = TRUE`

: the UMAP gradient requires some power calculations, which are quite slow compared to other arithmetic operations. An approximation to pow by Martin Ankerl can give a noticeable speed up.`fast_sgd = TRUE`

: this just sets all three of the above options at once, to make life a bit easier.

If you need reproducibility (e.g. for scientific/publication purposes), then using
multiple optimization threads is a deal-breaker. But for visualization purposes
or if reproducibility isn't that important to you, then these settings can
give a meaningful speed up of the performance of `uwot`

, without hurting the
quality of the layout.

To prove it, here are some images of embeddings with the fast settings on and off. On the left are images with the optimization settings left in their default (slower) values. I also use PCA to reduce the dimensionality to 100:

```
set.seed(1337)
mnist_umap_slow <- umap(mnist, pca = 100, verbose = TRUE)
```

On the right are the embeddings from using `fast_sgd = TRUE`

:

```
set.seed(1337)
mnist_umap_fast <- umap(mnist, pca = 100, fast_sgd = TRUE, verbose = TRUE)
```

Six threads were used in the stochastic gradient descent.

For details on the datasets, see the
examples page. The
timings are given in the title, in minutes and seconds. Note that this is for
entire run, not just the optimization phase, i.e. it includes the PCA
dimensionality reduction and nearest neighbor search, which is usually the
slowest part of the run and which is not affected by `fast_sgd`

.

| | | :----------------------------:|:--------------------------: |

| | | :----------------------------:|:--------------------------: |

| | | :----------------------------:|:--------------------------: |

| | | :----------------------------:|:--------------------------: |

The distribution of clusters here does a look a little bit different, but I
think it's within the variation that one would expect from the stochastic nature
of the optimization. To bolster my point, here are two further runs of UMAP
on the NORB dataset with `fast_sgd = FALSE`

and different seeds:

| | | :----------------------------:|:--------------------------: |

I think if you didn't know one of these four images was generated with the fast
settiings, you'd be hard pressed to pick it out of the line up. The variation
between images is more likely due to the default `n_neighbors`

being too low to
capture the global structure. Here's some plots with `n_neighbors = 150`

, and
`n_epochs = 500`

to account for the increased number of edges that need
sampling:

| | | :----------------------------:|:--------------------------: |

Apart from one of the blue loops being open in the `fast_sgd = TRUE`

result,
which is probably fixable with a longer optimization, the global arrangement
of the two plots is pretty similar.

| | | :----------------------------:|:--------------------------: |

| | | :----------------------------:|:--------------------------: |

From these results, I'd say that the `fast_sgd = TRUE`

settings give results
which are effectively indistinguishable from the slower settings, so if you want
to save a bit of time, there seems no harm in using them. The actual time
savings you'll see depend on how long the nearest neighbor search takes, and any
initial PCA you carry out on the input (as we do in all these examples) can take
a fair amount of the run time. For example on NORB, the PCA takes up 5 minutes
of the six-and-a-half minute total run time, so there's not a lot of time to be
saved. But for MNIST and Fashion, you can effectively halve the run time (with
six threads, anyway).

If reproducibility is important to you, then using multiple threads in the
optimization is out of the question, although that's what gives the
biggest speed increase. However, you could still consider setting
`approx_pow = TRUE, pcg_random = FALSE`

. For MNIST-sized datasets (`mnist`

,
`fashion`

, `kuzushiji`

) I saw a reasonable speed up of around 25%. For datasets
where the PCA and nearest neighbor search dominates, the gains are smaller:
a 10-15% speedup for `tasic2018`

and `macosko2015`

, and only 5% for `norb`

.
If you need to set `n_epochs`

higher, then these time savings will increase.

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.