December 27, 2024: Updated to use `umap2` and with better images.
The default initialization for UMAP uses the eigenvectors of a normalized graph Laplacian of the sparse affinity matrix. This normally works well, except when the graph has multiple disconnected (or nearly disconnected) components: in `uwot`, there is a risk that the eigendecomposition carried out by RSpectra can have trouble converging under these circumstances.

An alternative to using the spectral initialization is to use PCA, although depending on the scaling of your input variables, this could lead to a quite spread-out initial set of coordinates, which can be hard to optimize.
For the related technique t-SNE, Kobak and Berens recommend scaling the PCA results so that the standard deviation for each dimension is `1e-4`. You can get that behavior in `uwot` by using the `umap` function with `init = "spca"`, or `umap2` with `init = "spca", init_sdev = 1e-4`.
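As an illustration of what that scaling amounts to, here is a minimal sketch done by hand (this is not uwot's internal code, but `uwot` does accept a pre-computed matrix of coordinates as the `init` argument):

```r
library(uwot)

# A sketch of the scaled-PCA initialization done by hand: take the first
# two principal components, then rescale each dimension to have a
# standard deviation of 1e-4.
X <- scale(iris[, 1:4])
pca_scores <- prcomp(X, rank. = 2)$x
spca_init <- sweep(pca_scores, 2, apply(pca_scores, 2, sd), "/") * 1e-4

# uwot accepts a matrix of initial coordinates directly, so this is
# (roughly) what init = "spca" does for you
embedding <- umap(X, init = spca_init)
```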
## umap2
Based on some years of experience, the `umap2` function has some slightly different defaults which I recommend over `umap`. For more on the `umap2` function, see the umap2 article. With `umap2`, I recommend using `init = "pca"`. At some point after I wrote `uwot` (or I just never noticed), the Python UMAP package added an extra scaling of coordinates so that they are contained in a bounding box of side length 10. This is now the default with `umap2` and seems to work pretty well.
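As a sketch of what that scaling does (my own illustration; uwot's internal details may differ slightly):

```r
# Rescale each column of the initial coordinates to lie in [0, 10], so
# the embedding starts inside a bounding box of side length 10
range_scale <- function(coords, max_coord = 10) {
  apply(coords, 2, function(v) max_coord * (v - min(v)) / (max(v) - min(v)))
}
```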
All the results shown on this page now use `umap2`.
## umap
If you are still using `umap`, then the `init = "spca"` option there works OK, but the standard deviation of `1e-4` may be too small for some datasets. If you see results where you suspect that clusters are being broken up, I recommend trying `init_sdev` values of `1e-2`, `1`, or `2.5`. If you want to use the Python UMAP-style scaling with `umap`, you can get it by setting `init_sdev = "range"`, which is probably the best choice. For example:
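```r
# If "spca" output looks like clusters are being broken up, try a larger
# initial spread (these values are just starting points to experiment with):
embedding <- umap(data, init = "pca", init_sdev = 1e-2)
embedding <- umap(data, init = "pca", init_sdev = 2.5)

# or, probably the best choice, the Python UMAP-style range scaling:
embedding <- umap(data, init = "pca", init_sdev = "range")
```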
PCA is a solid choice for initialization when spectral embedding fails. But using PCA ignores the information in the affinity matrix that might be useful for initialization: for example, when using supervised UMAP or mixing different types of distance metrics. And conceptually, using the affinity matrix has a lot of appeal. Dzwinel and co-workers describe a very fast multi-dimensional scaling-like method called ivhd which generates a connected graph by knowing only a few neighbor and non-neighbor points and pretending the target distances are 0 and 1, respectively. Similar theoretical support for this is given by Linderman and co-workers.

On the basis of this work, I have implemented an "approximate global" spectral method of initialization, which is a very minor modification of the standard spectral initialization: random non-neighbor edges with a small affinity are added to connect the graph, the number of extra edges being `n_neg_samples` for each positive edge in the graph. This is consistent with the ratio of neighbor to non-neighbor edge sampling during optimization and, if using a default value of `n_neg_samples`, is a similar order of magnitude as the ratio of neighbor to non-neighbor entries in the graph that is used with ivhd. You can get this behavior in `uwot` with `init = "agspectral"`.
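As a rough sketch of the graph-augmentation idea (this is my own illustration, not uwot's actual implementation; the helper name and the `low_affinity` value are made up):

```r
library(Matrix)

# Sketch: add random low-affinity "non-neighbor" edges to a sparse,
# symmetric affinity matrix A so that the graph becomes connected, then
# carry out the usual spectral initialization on the result.
add_random_edges <- function(A, n_per_edge = 5, low_affinity = 0.1) {
  n <- nrow(A)
  n_extra <- n_per_edge * nnzero(A)  # ~ n_neg_samples edges per positive edge
  extra <- sparseMatrix(
    i = sample(n, n_extra, replace = TRUE),
    j = sample(n, n_extra, replace = TRUE),
    x = low_affinity,
    dims = c(n, n)
  )
  # symmetrize the new edges and merge them with the original affinities
  A + (extra + t(extra)) / 2
}
```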
Another option for a connected graph is to use a variation on the normalization used in the graph Laplacian to get something close to Laplacian Eigenmaps, with `init = "laplacian"`. In von Luxburg's spectral clustering tutorial (PDF), this form of normalization is recommended over the one used by `init = "spectral"`. So we'll try it here and see if it makes any difference.
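For reference, writing $A$ for the affinity matrix and $D$ for the diagonal degree matrix with $D_{ii} = \sum_j A_{ij}$, the two normalized Laplacians in question are (as I understand the distinction):

$$
L_{\mathrm{sym}} = I - D^{-1/2} A D^{-1/2}, \qquad
L_{\mathrm{rw}} = I - D^{-1} A
$$

`init = "spectral"` uses the symmetric form $L_{\mathrm{sym}}$, while the Laplacian Eigenmaps formulation solves the generalized eigenproblem $L v = \lambda D v$, which corresponds to the random-walk form $L_{\mathrm{rw}}$ that von Luxburg recommends.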
If all else fails, you can use random initialization. In the spirit of generosity, `uwot` offers two types of random initialization. The first is that used by UMAP, which uses a random uniform distribution between -10 and +10 along each axis. You can get this with `init = "rand"`. Alternatively, there is the method favoured by LargeVis (and used by default with `lvish`), which is to use a Gaussian distribution with standard deviation 1e-4. This is also the method used by t-SNE. You can get this by setting `init = "lvrand"`.
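In other words (a sketch of the two distributions for `n` points in two dimensions):

```r
n <- 1000

# "rand": UMAP-style, uniform between -10 and +10 along each axis
rand_init <- matrix(runif(n * 2, min = -10, max = 10), ncol = 2)

# "lvrand": t-SNE/LargeVis-style, Gaussian with standard deviation 1e-4
lvrand_init <- matrix(rnorm(n * 2, sd = 1e-4), ncol = 2)
```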
To summarize your options:
```r
# Default initialization: use spectral initialization if possible,
# falling back to scaled PCA if multiple components are found in the graph
embedding <- umap2(data)
# same as the above
embedding <- umap2(data, init = "spectral")

# use (scaled) PCA: defaults are different between umap and umap2
embedding <- umap2(data, init = "pca")

# If you want to use `umap` then you can use the following:
# umap with default scaled PCA
embedding <- umap(data, init = "spca")
# equivalent to:
embedding <- umap(data, init = "pca", init_sdev = 1e-4)
# but this might be a better choice of settings
embedding <- umap(data, init = "pca", init_sdev = "range")

# use an "approximate global" spectral initialization that should be applicable
# even under conditions where "spectral" fails and has to use spca
embedding <- umap2(data, init = "agspectral")

# use a Laplacian Eigenmap style initialization, which will also fall back to
# spca if it has to
embedding <- umap2(data, init = "laplacian")

# use random initialization, UMAP style
embedding <- umap2(data, init = "rand")

# use random initialization, t-SNE/LargeVis style
embedding <- umap2(data, init = "lvrand")
```
Below, we'll explore the effect of these settings. For more details on the datasets, see the examples page.
Apart from changing `init`, mainly default settings were used. To save a bit of time, I used pre-calculated exact nearest neighbors, but I don't expect that to have a major effect on the results. Some typical commands using the `iris` dataset would be:
```r
set.seed(1337)

# Using the `umap` function with spca initialization
iris_umap <- umap(iris, init = "spca")

# Using `umap2` with agspectral and its slightly different (and hopefully
# better) defaults
iris_umap <- umap2(iris, init = "agspectral")
```
For details see the examples article.
Below are some examples of using either the PCA initialization or a spectral approach. On the first row, the left-hand image is the result of using `init = "pca"`, and the right-hand image is `init = "agspectral"`. On the second row, the left-hand image is for `init = "spectral"`, and the right-hand image is `init = "laplacian"`. Some datasets don't have the second row of images because, with the settings used here, they generate more than one component in the graph, and therefore both `init = "spectral"` and `init = "laplacian"` would fall back to similar results as seen with `init = "pca"`.
*(Images: one pair of plots per dataset, `pca` on the left and `agspectral` on the right; where the graph is connected, a second row shows `spectral` on the left and `laplacian` on the right.)*
There are really only minor differences between the PCA results and the spectral results in most cases. The big difference occurs with `isoswiss`. This shows the Achilles heel of PCA for initialization: it does not have any respect for manifolds. Where spectral initialization is able to "unfold" the structure, PCA leaves it folded up, and the projection onto 2D leaves it broken up into three pieces.
In most (but not all) cases, the `agspectral` results are very similar to the `spectral` results. Again, `isoswiss` shows the limitations of the method. As with PCA, the manifold has not been initialized properly and some unpleasant breakage occurs. There are certainly worse ways to initialize this dataset though, as we shall see below. I don't think the results here are a fundamental limitation of the `agspectral` approach, but the balance of the random vs near-neighbor affinities may not be ideal for this dataset.
The `laplacian` results also strongly resemble the spectral initialization, and where they differ, it's not obvious to me that one is strictly better than the other. So in terms of initialization for UMAP, there doesn't seem to be a strong advantage to using one type of graph Laplacian over the other.
## rand
Below are two results each from using random initialization, so you can get a sense for how much variation you can expect from getting lucky or not with the initialization.
```r
# left-hand plots used this seed
set.seed(1337)
# right-hand plots used this seed
# set.seed(42)
iris_umap <- umap(iris, init = "rand")
```
*(Images: two plots per dataset, the left using `seed = 1337` and the right using `seed = 42`.)*
Results are not terrible for the smaller datasets, but for anything `mnist`-sized, we start to see broken-up clusters. And oof, `isoswiss` is a disaster.
## lvrand
And here is the result of using random initialization in the t-SNE/LargeVis style, which uses a Gaussian distribution over a much smaller initial range of coordinate values. Will it make much difference?
```r
# left-hand plots used this seed
set.seed(1337)
# right-hand plots used this seed
# set.seed(42)
iris_umap <- umap(iris, init = "lvrand")
```
*(Images: two plots per dataset, the left using `seed = 1337` and the right using `seed = 42`.)*
No, it doesn't make a huge difference.
The default spectral initialization does a good job. If for some reason you can't use it, for example due to disconnections, then `init = "pca"` is a good choice. The one exception is data where you think you have a manifold structure; in that case, you would hope the graph is connected anyway, so you won't need to fall back to PCA.
I would love to say that `agspectral` can help you out if you are having difficulty with a slow spectral initialization, and in general it seems useful, but it may not work well with manifold data. The `laplacian` initialization does not provide a meaningful difference from the spectral initialization, based on the results shown here.
Random initialization should be used only as a last resort. Increasing `n_neighbors` can sometimes help a bit, but you may need to set it to a very high value (e.g. 150), and that may introduce long-distance interactions that don't represent your data.