
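umap2 is a new function that works a lot like umap, but updates some defaults to make it easier to use and to bring it a bit more in line with the Python UMAP implementation. The main differences are in the defaults. The two that matter for the comparison below are a larger min_dist and, when RcppHNSW is installed, a different nearest neighbor method.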

These are not big changes, so don't expect large differences in behavior, but I do strongly recommend installing (and loading) RcppHNSW and rnndescent. I'll use the MNIST digits for a comparison. You can download them with the snedata package from GitHub:

# install.packages("pak")
pak::pkg_install("jlmelville/snedata")

# or
# install.packages("devtools")
# devtools::install_github("jlmelville/snedata")
mnist <- snedata::download_mnist()

Now let's run umap and umap2 on the MNIST data using their defaults.

library(uwot)

set.seed(42)
mnist_umap <- umap(mnist)

Install RcppHNSW and rnndescent if you haven't already.

install.packages(c("RcppHNSW", "rnndescent"))

With these packages installed, umap2 will use RcppHNSW by default.
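Because the nearest neighbor method depends on what is installed, it can be worth confirming that both packages are actually available before running umap2. The following is a minimal sketch using only base R; the variable names `nn_packages`, `nn_available` and `missing_pkgs` are my own and not part of uwot.

```r
# Report which optional nearest-neighbor packages are installed.
# requireNamespace() checks availability without attaching the package.
nn_packages <- c("RcppHNSW", "rnndescent")
nn_available <- vapply(
  nn_packages,
  function(pkg) requireNamespace(pkg, quietly = TRUE),
  logical(1)
)
print(nn_available)

# Name any package that still needs install.packages().
missing_pkgs <- nn_packages[!nn_available]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}
```

If either entry is FALSE, the comparison in this document may not be using the nearest neighbor method you expect.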

library(RcppHNSW)
library(rnndescent)

set.seed(42)
mnist_umap2 <- umap2(mnist)

To plot the results, I'll use ggplot2 with a Polychrome palette for the digits.

# install.packages(c("ggplot2", "Polychrome"))
library(ggplot2)
library(Polychrome)

set.seed(42)
palette <- as.vector(Polychrome::createPalette(
  length(levels(mnist$Label)) + 2,
  seedcolors = c("#ffffff", "#000000"),
  range = c(10, 90)
)[-(1:2)])
ggplot(
  data.frame(mnist_umap, Digit = mnist$Label),
  aes(x = X1, y = X2, color = Digit)
) +
  geom_point(alpha = 0.1, size = 0.5) +
  scale_color_manual(values = palette) +
  theme_minimal() +
  labs(
    title = "MNIST with uwot::umap",
    x = "",
    y = "",
    color = "Digit"
  ) +
  theme(legend.position = "right") +
  guides(color = guide_legend(override.aes = list(size = 5, alpha = 1)))

umap on MNIST

ggplot(
  data.frame(mnist_umap2, Digit = mnist$Label),
  aes(x = X1, y = X2, color = Digit)
) +
  geom_point(alpha = 0.5, size = 1.0) +
  scale_color_manual(values = palette) +
  theme_minimal() +
  labs(
    title = "MNIST with uwot::umap2",
    x = "",
    y = "",
    color = "Digit"
  ) +
  theme(legend.position = "right") +
  guides(color = guide_legend(override.aes = list(size = 5, alpha = 1)))

umap2 on MNIST
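The two plotting calls above differ only in the embedding, the title and the point styling, so they could be wrapped in a small function. This is a sketch of one way to do that. `plot_embedding` is a hypothetical helper of mine, not something uwot or ggplot2 provides, and it assumes the embedding is a two-column matrix with one row per label.

```r
library(ggplot2)

# Hypothetical helper: plot a two-column embedding colored by a label factor.
# alpha and size are exposed because the two plots above use different values.
plot_embedding <- function(coords, labels, title, palette,
                           alpha = 0.5, size = 1.0) {
  df <- data.frame(X1 = coords[, 1], X2 = coords[, 2], Digit = labels)
  ggplot(df, aes(x = X1, y = X2, color = Digit)) +
    geom_point(alpha = alpha, size = size) +
    scale_color_manual(values = palette) +
    theme_minimal() +
    labs(title = title, x = "", y = "", color = "Digit") +
    theme(legend.position = "right") +
    guides(color = guide_legend(override.aes = list(size = 5, alpha = 1)))
}

# Usage with the objects created earlier (not run here):
# plot_embedding(mnist_umap, mnist$Label, "MNIST with uwot::umap", palette,
#                alpha = 0.1, size = 0.5)
# plot_embedding(mnist_umap2, mnist$Label, "MNIST with uwot::umap2", palette)
```

Keeping the styling in arguments makes it easier to draw both embeddings with identical settings when you want a like-for-like comparison.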

The biggest difference is that the clusters in the umap2 plot are somewhat larger and closer together. This is due to the larger default min_dist in umap2. If you re-run with min_dist = 0.01, you will get a plot that is very similar to the umap plot.

set.seed(42)
mnist_umap2 <- umap2(mnist, min_dist = 0.01)

I will spare you the ggplot2 incantation and go straight to the image:

umap2 on MNIST with min_dist = 0.01
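Whether two embeddings look similar is a visual judgment. As a rough numeric companion, you could measure how spread out each digit's cluster is. The sketch below is my own suggestion rather than anything uwot provides: `cluster_spread` is a hypothetical helper that returns, for each label, the mean Euclidean distance of its points from the label's centroid. It assumes the embedding is a numeric matrix with one row per label.

```r
# Hypothetical helper: for each label, the mean Euclidean distance of its
# points from that label's centroid in the embedding.
cluster_spread <- function(coords, labels) {
  labels <- as.factor(labels)
  vapply(
    split(seq_len(nrow(coords)), labels),
    function(idx) {
      pts <- coords[idx, , drop = FALSE]
      centroid <- colMeans(pts)
      # sweep() subtracts the centroid from every row by default
      mean(sqrt(rowSums(sweep(pts, 2, centroid)^2)))
    },
    numeric(1)
  )
}

# Tiny synthetic check: a tight cluster "a" and a looser cluster "b".
toy <- rbind(
  c(0, 0), c(0, 2),   # label "a": each point is 1 unit from centroid (0, 1)
  c(10, 0), c(10, 6)  # label "b": each point is 3 units from centroid (10, 3)
)
toy_spread <- cluster_spread(toy, c("a", "a", "b", "b"))
print(toy_spread)

# With the embeddings from above (not run here):
# cluster_spread(mnist_umap, mnist$Label)
# cluster_spread(mnist_umap2, mnist$Label)
```

Treat the numbers as a sanity check only. The overall scale of the coordinates can differ between runs, so the per-digit values are best compared relative to the extent of each embedding.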

So there's not much difference in the results whether you use umap2 or umap. In general, RcppHNSW and rnndescent can find nearest neighbors at a given level of quality a bit faster than Annoy does. And even if you deviate from the default settings, you will probably have less to type with umap2 than with umap.
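If you want to check the speed difference on your own machine, a simple approach is to time each call with base R's system.time. The sketch below uses a hypothetical helper, `time_embedding`, which is my own and not part of uwot. Note that it times the whole embedding rather than the nearest neighbor search alone, so it only gives a rough indication.

```r
# Hypothetical helper: elapsed wall-clock seconds for one call of `embed_fn`.
time_embedding <- function(embed_fn, data, seed = 42) {
  set.seed(seed)
  unname(system.time(embed_fn(data))["elapsed"])
}

# Stand-in check with a cheap function (PCA) so the snippet runs anywhere.
toy_data <- matrix(rnorm(200), ncol = 4)
toy_seconds <- time_embedding(function(x) prcomp(x)$x[, 1:2], toy_data)
print(toy_seconds)

# With uwot loaded and the data from above (not run here):
# time_embedding(umap, mnist)
# time_embedding(umap2, mnist)
```

A single run can be noisy, so repeat the timing a few times before drawing conclusions.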



jlmelville/uwot documentation built on April 25, 2024, 5:20 a.m.