```r
## block with some startup/background objects and functions
library(umap)

# colors for the three iris species
iris.colors <- c("#ff7f00", "#e377c2", "#17becf")

# scatter plot of a UMAP layout, with points colored by label
plot.iris <- function(x, labels,
                      main="A UMAP visualization of the Iris dataset",
                      pad=0.02, cex=0.65, pch=19, cex.main=1, cex.legend=1) {
  layout <- x$layout
  par(mar=c(0.2,0.7,1.2,0.7), ps=10)
  # square plotting region with a small padding around the layout
  xylim <- range(layout)
  xylim <- xylim + ((xylim[2]-xylim[1])*pad)*c(-0.5, 0.5)
  plot(xylim, xylim, type="n", axes=FALSE, frame=FALSE)
  xylim <- par()$usr
  rect(xylim[1], xylim[1], xylim[2], xylim[2], border="#aaaaaa", lwd=0.2)
  points(layout[,1], layout[,2], col=iris.colors[as.integer(labels)],
         cex=cex, pch=pch)
  mtext(side=3, main, cex=cex.main)
  labels.u <- unique(labels)
  legend("topright", legend=as.character(labels.u),
         col=iris.colors[as.integer(labels.u)],
         bty="n", pch=pch, cex=cex.legend)
}

set.seed(123456)
```
R package `umap` provides an interface to uniform manifold approximation and projection (UMAP) algorithms. There are now several implementations, including versions of the python package `umap-learn`. This vignette explains some aspects of interfacing with the python package. (For general information on using package `umap`, see the introductory vignette.)
As preparation, let's load the package and prepare a small dataset.

```r
library(umap)
# keep only the four numeric measurement columns
iris.data <- iris[, grep("Sepal|Petal", colnames(iris))]
```
The basic command to perform dimensional reduction is `umap`. By default, this function uses an implementation written in R. To use the python package `umap-learn` instead, that package and its dependencies must be installed separately (see the python package index or the package source). The R package `reticulate` is also required (use `install.packages('reticulate')` and `library('reticulate')`).
After completing the installations, the python implementation is activated by specifying `method="umap-learn"`.

```r
library(reticulate)
iris.umap_learn <- umap(iris.data, method="umap-learn")
```
(This command is not actually executed in the vignette because `umap-learn` may not be available on the rendering system. If `umap-learn` is available, the command should execute quietly and create a new object `iris.umap_learn` that contains an embedding.)
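When a script must also run on machines where `umap-learn` is missing, one option is to choose the method at run time. The following is a minimal sketch, not part of the `umap` package: the helper `choose_umap_method` is hypothetical, and it assumes that `reticulate::py_module_available("umap")` is a good enough test for an importable `umap-learn` (its python module is imported as `umap`). Otherwise it falls back to `"naive"`, the value of the `method` argument that selects the R implementation.

```r
# Hypothetical helper (not part of the umap package): return "umap-learn"
# when reticulate is installed and can import the python module "umap",
# otherwise fall back to the R implementation ("naive").
choose_umap_method <- function() {
  has_reticulate <- requireNamespace("reticulate", quietly = TRUE)
  # && short-circuits, so reticulate is never touched if it is not installed
  has_umap_learn <- has_reticulate && tryCatch(
    reticulate::py_module_available("umap"),
    error = function(e) FALSE
  )
  if (has_umap_learn) "umap-learn" else "naive"
}

method <- choose_umap_method()
message("selected method: ", method)
```

With the package loaded, the result can be passed straight through, e.g. `umap(iris.data, method=choose_umap_method())`.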
As covered in the introductory vignette, tuning parameters can be set via a configuration object and via explicit arguments in the `umap` function call. The default configuration is accessible as object `umap.defaults`.

```r
umap.defaults
```
Note the entry `umap_learn_args` toward the end. It is set to `NA` by default, which indicates that appropriate arguments will be selected automatically and passed to `umap-learn`.
After dimensional reduction, the output object contains a copy of the configuration with the values actually used to produce the output. We can examine the effective configuration that was used for our embedding.

```r
iris.umap_learn$config
```
(Again, this command is not executed in the vignette because `umap-learn` may not be available on the rendering system. When `umap-learn` is available, this should produce a configuration printout.)
The entry for `umap_learn_args` should contain a vector of all the arguments passed from the configuration object to the python package. Another entry in the configuration should reveal the version of the python package used to perform the calculation.
A configuration object can contain many components, but not all of them may be used in a calculation. To verify that a setting is actually passed to `umap-learn`, check that it appears in `umap_learn_args` in the output.
As an example, consider setting `foo` and `n_epochs` during the function call.

```r
## (not evaluated in vignette)
iris.foo <- umap(iris.data, method="umap-learn", foo=4, n_epochs=100)
iris.foo$config
```
Inspecting the output configuration will reveal that both `foo` and `n_epochs` are recorded (in the latter case, the default value is replaced by the new value). However, `foo` should not appear in `umap_learn_args`. This means that `foo` was not actually passed on to `umap-learn`.
Various versions of `umap-learn` take different parameters as input. The R package is coded to work with `umap-learn` versions 0.2, 0.3, 0.4, and 0.5, and it adjusts arguments automatically to suit those versions.
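If a script relies on this automatic adjustment, it may help to compare the `umap-learn` version recorded in the output configuration with the versions listed above. The following is a minimal sketch with a hypothetical helper name; it treats any 0.2.x through 0.5.x release as supported, an assumption based only on the list in this vignette, and strips suffixes such as `rc1` before parsing.

```r
# Hypothetical helper: TRUE when a umap-learn version string falls in
# the 0.2.x to 0.5.x range listed in this vignette.
is_supported_umap_learn <- function(version) {
  # keep only the leading numeric part, e.g. "0.5.0rc1" -> "0.5.0"
  numeric.part <- sub("^([0-9]+(\\.[0-9]+)*).*$", "\\1", version)
  v <- package_version(numeric.part)
  v >= "0.2" && v < "0.6"
}

supported <- sapply(c("0.1.5", "0.3.10", "0.5.0rc1", "0.6.0"),
                    is_supported_umap_learn)
print(supported)
```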
Note, however, that some arguments accepted by newer versions of `umap-learn` are not set in the default configuration object. To use those features (see the python package documentation), set the appropriate arguments manually, either by preparing a custom configuration object or by specifying the arguments in the `umap` function call.
It is possible to set `umap_learn_args` manually while calling `umap`.

```r
## (not evaluated in vignette)
iris.custom <- umap(iris.data, method="umap-learn",
                    umap_learn_args=c("n_neighbors", "n_epochs"))
iris.custom$config
```

Here, only the two specified arguments are passed on to the calculation.
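The same restriction can presumably also be stored in a custom configuration object, following the pattern of copying and modifying `umap.defaults` from the introductory vignette. The sketch below uses a hypothetical helper, `restrict_umap_learn_args`, that returns a modified copy of a configuration and warns about requested names with no matching configuration entry; that warning is an extra safeguard assumed here, not a check the package is known to perform. It runs on a plain list so that neither `umap` nor python is needed.

```r
# Hypothetical helper: return a copy of a configuration in which only the
# named arguments are listed for forwarding to umap-learn. R copies the
# list on modification, so the original configuration is left unchanged.
restrict_umap_learn_args <- function(config, args) {
  unknown <- setdiff(args, names(config))
  if (length(unknown) > 0) {
    warning("no configuration entry for: ", paste(unknown, collapse = ", "))
  }
  config$umap_learn_args <- args
  config
}

# plain list standing in for umap.defaults (values illustrative only)
mock.defaults <- list(n_neighbors = 15, n_epochs = 200, umap_learn_args = NA)
custom <- restrict_umap_learn_args(mock.defaults, c("n_neighbors", "n_epochs"))
print(custom$umap_learn_args)
```

With the package available, the custom object would be supplied as `umap(iris.data, config=restrict_umap_learn_args(umap.defaults, c("n_neighbors", "n_epochs")), method="umap-learn")`.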
Summary of R session:

```r
sessionInfo()
```