umap_transform | R Documentation |
Carry out an embedding of new data using an existing embedding. Requires the result of calling umap or tumap with ret_model = TRUE.
umap_transform(
X = NULL,
model = NULL,
nn_method = NULL,
init_weighted = TRUE,
search_k = NULL,
tmpdir = tempdir(),
n_epochs = NULL,
n_threads = NULL,
n_sgd_threads = 0,
grain_size = 1,
verbose = FALSE,
init = "weighted",
batch = NULL,
learning_rate = NULL,
opt_args = NULL,
epoch_callback = NULL,
ret_extra = NULL,
seed = NULL
)
X |
The new data to be transformed, either a matrix or a data frame. Must have the same columns in the same order as the input data used to generate the model. |
model |
Data associated with an existing embedding. |
nn_method |
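Optional pre-calculated nearest neighbor data. There are two supported formats. The first is a list consisting of two elements:
idx: an n_vertices x n_neighbors matrix, where n_vertices is the number of observations in X, containing the integer indexes (into the data used to generate the model) of the n_neighbors nearest neighbors of each observation in X.
dist: an n_vertices x n_neighbors matrix containing the corresponding nearest neighbor distances.
The second supported format is a sparse distance matrix of type dgCMatrix, with dimensions n_model_vertices x n_vertices, where n_model_vertices is the number of observations in the data used to generate the model. Distances are arranged by column: a non-zero entry in row j of column i indicates that observation j of the model data is a nearest neighbor of observation i of X, with the distance given by the value of that entry. |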
init_weighted |
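If TRUE, initialize the embedded coordinates of X using a weighted average of the coordinates of their nearest neighbors in the original embedding, where the weights are the edge weights from the UMAP smoothed knn distances. Otherwise, use an unweighted average. Superseded by init: init_weighted = TRUE corresponds to init = "weighted", and init_weighted = FALSE to init = "average". |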
search_k |
Number of nodes to search during the neighbor retrieval. The larger k, the more accurate the results, but the longer the search takes. Default is the value used in building the model. |
tmpdir |
Temporary directory to store nearest neighbor indexes during nearest neighbor search. Default is tempdir(). |
n_epochs |
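Number of epochs to use during the optimization of the embedded coordinates. A value between 30 and 100 is a reasonable trade-off between speed and thoroughness. By default, this is set to one third of the number of epochs used to build the model. |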
n_threads |
Number of threads to use (except during stochastic gradient descent). Default is half the number of concurrent threads supported by the system. |
n_sgd_threads |
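Number of threads to use during stochastic gradient descent. If set to > 1, be aware that if batch = FALSE, results will not be reproducible, even if set.seed is called with a fixed seed before running. |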
grain_size |
Minimum batch size for multithreading. If the number of items to process in a thread falls below this number, then no threads will be used. Used in conjunction with n_threads and n_sgd_threads. |
verbose |
If TRUE, log details to the console. |
init |
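How to initialize the transformed coordinates. One of:
"weighted" (the default): a weighted average of the coordinates of the nearest neighbors in the original embedding, using the UMAP smoothed knn edge weights. Equivalent to init_weighted = TRUE.
"average": the unweighted mean of the coordinates of the nearest neighbors in the original embedding. Equivalent to init_weighted = FALSE.
A matrix of user-specified coordinates, with dimensions (nrow(X), ncol(model$embedding)).
This parameter should be used in preference to init_weighted. |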
batch |
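If TRUE, embedding coordinates are updated at the end of each epoch rather than during the epoch. In batch mode, results are reproducible with a fixed random seed even with n_sgd_threads > 1, at the cost of slightly higher memory use. If NULL (the default), the batch setting used by the model is used. |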
learning_rate |
Initial learning rate used in optimization of the coordinates. This overrides the value associated with the model and should be left unspecified under most circumstances. |
opt_args |
A list of optimizer parameters, used when batch = TRUE. If NULL (the default), the optimizer settings used by the model are used. |
epoch_callback |
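A function which will be invoked at the end of every epoch. Its signature should be (epoch, n_epochs, coords, fixed_coords), where epoch is the current epoch number (between 1 and n_epochs), n_epochs is the total number of epochs, coords is the matrix of embedded coordinates of X at the end of the current epoch, and fixed_coords is the matrix of the original embedded coordinates from the model, which do not change. |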
ret_extra |
A vector indicating what extra data to return. May contain any combination of the strings "fgraph", "nn", "sigma" and "localr"; the items returned for each are described under the return value below. |
seed |
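Integer seed to use to initialize the random number generator state. Combined with n_sgd_threads = 1 or batch = TRUE, this should give consistent output across multiple runs on a given installation. |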
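The initialization choices can be illustrated with a base-R sketch of the unweighted average, which is what we understand init = "average" to compute: each new point starts at the mean of the embedded coordinates of its nearest training neighbors. The helper average_init is hypothetical; nn_idx is assumed to be a 1-based neighbor index matrix (one row per row of X), and train_embedding the training coordinates, for example model$embedding.

```r
# Hypothetical helper: unweighted-average initialization.
# nn_idx: integer matrix (n_new x k) of 1-based row indexes into train_embedding.
# train_embedding: matrix (n_train x n_components) of embedded training coords.
average_init <- function(nn_idx, train_embedding) {
  train_embedding <- as.matrix(train_embedding)
  init <- matrix(0, nrow = nrow(nn_idx), ncol = ncol(train_embedding))
  for (i in seq_len(nrow(nn_idx))) {
    # Mean of the embedded coordinates of the k neighbors of new point i
    init[i, ] <- colMeans(train_embedding[nn_idx[i, ], , drop = FALSE])
  }
  init
}
```

The result has one row per new observation and one column per embedding dimension, so under the assumptions above it is the shape a user-specified init matrix would need.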
Note that some settings are incompatible with the production of a UMAP model via umap: external neighbor data (passed via a list to the nn_method parameter), and factor columns that were included in the UMAP calculation via the metric parameter. In the latter case, the model produced is based only on the numeric data. A transformation is possible, but factor columns in the new data are ignored.
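Pre-calculated neighbor data for the new points can be produced outside uwot. The following is a minimal base-R sketch of the list format, assuming Euclidean distance, list elements named idx and dist, and a brute-force search that is only practical for small data. brute_force_nn is a hypothetical helper, and k should presumably match the n_neighbors the model was built with.

```r
# Hypothetical helper: brute-force Euclidean k-nearest neighbors of each row
# of `query` among the rows of `reference`, returned as list(idx, dist).
brute_force_nn <- function(query, reference, k) {
  query <- as.matrix(query)
  reference <- as.matrix(reference)
  stopifnot(k <= nrow(reference))
  n <- nrow(query)
  idx <- matrix(0L, nrow = n, ncol = k)
  dist <- matrix(0, nrow = n, ncol = k)
  ref_t <- t(reference)  # columns are reference observations
  for (i in seq_len(n)) {
    # Euclidean distance from query row i to every reference row
    d <- sqrt(colSums((ref_t - query[i, ])^2))
    ord <- order(d)[seq_len(k)]  # 1-based indexes of the k closest rows
    idx[i, ] <- ord
    dist[i, ] <- d[ord]
  }
  list(idx = idx, dist = dist)
}
```

With the example data below, something like brute_force_nn(iris_test[, 1:4], iris_train[, 1:4], k = 15) could then be supplied as nn_method, provided the model was built with Euclidean distances on the same columns.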
A matrix of coordinates for X transformed into the space of the model, or, if ret_extra is specified, a list containing:
embedding: the matrix of optimized coordinates.
fgraph: if ret_extra contains "fgraph", the high-dimensional fuzzy graph as a sparse matrix of type dgCMatrix-class.
sigma and rho: if ret_extra contains "sigma", a vector "sigma" of the smooth knn distance normalization terms for each observation, and a vector "rho" containing the largest distance to the locally connected neighbors of each observation.
localr: if ret_extra contains "localr", a vector of the estimated local radii, the sum of "sigma" and "rho".
nn: if ret_extra contains "nn", the nearest neighbors of each item in X (with respect to the items that created the model).
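Because the return type depends on ret_extra (a bare matrix, or a list with an embedding item), downstream code may want to normalize it. A minimal sketch; transformed_coords and local_radii are hypothetical helpers, and the latter falls back to recomputing the local radii as sigma + rho, following the description above.

```r
# Hypothetical helper: return the coordinate matrix whether or not
# ret_extra was specified.
transformed_coords <- function(result) {
  if (is.list(result)) result$embedding else result
}

# Hypothetical helper: local radii, taken from "localr" if present,
# otherwise computed as sigma + rho.
local_radii <- function(result) {
  if (!is.list(result)) stop("ret_extra was not specified")
  if (!is.null(result$localr)) result$localr else result$sigma + result$rho
}
```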
library(uwot)
iris_train <- iris[1:100, ]
iris_test <- iris[101:150, ]
# You must set ret_model = TRUE to return the extra data needed by umap_transform
iris_train_umap <- umap(iris_train, ret_model = TRUE)
iris_test_umap <- umap_transform(iris_test, iris_train_umap)
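To monitor the optimization in the example above, an epoch_callback can be supplied. A sketch, assuming the (epoch, n_epochs, coords, fixed_coords) signature described for that argument, with coords and fixed_coords as matrices with one row per observation; make_epoch_tracker and the statistic it records (mean distance of the new points from the centroid of the model's coordinates) are illustrative choices, not part of uwot.

```r
# Hypothetical callback factory. The returned callback records, at the end of
# each epoch, the mean Euclidean distance of the transformed points from the
# centroid of the fixed (model) coordinates.
make_epoch_tracker <- function() {
  history <- numeric(0)
  callback <- function(epoch, n_epochs, coords, fixed_coords) {
    centroid <- colMeans(fixed_coords)
    # Subtract the centroid from every row, then take row-wise norms
    d <- sqrt(rowSums(sweep(coords, 2, centroid)^2))
    history[epoch] <<- mean(d)
    invisible(NULL)
  }
  list(callback = callback, history = function() history)
}
```

Usage would look like tracker <- make_epoch_tracker(), then umap_transform(iris_test, iris_train_umap, epoch_callback = tracker$callback), after which tracker$history() holds one value per epoch.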