umap: Dimensionality Reduction with UMAP

In uwot: The Uniform Manifold Approximation and Projection (UMAP) Method for Dimensionality Reduction

Description

Carry out dimensionality reduction of a dataset using the Uniform Manifold Approximation and Projection (UMAP) method (McInnes & Healy, 2018). Some of the following help text is lifted verbatim from the Python reference implementation at https://github.com/lmcinnes/umap.

Usage

umap(
  X,
  n_neighbors = 15,
  n_components = 2,
  metric = "euclidean",
  n_epochs = NULL,
  learning_rate = 1,
  scale = FALSE,
  init = "spectral",
  init_sdev = NULL,
  spread = 1,
  min_dist = 0.01,
  set_op_mix_ratio = 1,
  local_connectivity = 1,
  bandwidth = 1,
  repulsion_strength = 1,
  negative_sample_rate = 5,
  a = NULL,
  b = NULL,
  nn_method = NULL,
  n_trees = 50,
  search_k = 2 * n_neighbors * n_trees,
  approx_pow = FALSE,
  y = NULL,
  target_n_neighbors = n_neighbors,
  target_metric = "euclidean",
  target_weight = 0.5,
  pca = NULL,
  pca_center = TRUE,
  pcg_rand = TRUE,
  fast_sgd = FALSE,
  ret_model = FALSE,
  ret_nn = FALSE,
  ret_extra = c(),
  n_threads = NULL,
  n_sgd_threads = 0,
  grain_size = 1,
  tmpdir = tempdir(),
  verbose = getOption("verbose", TRUE),
  batch = FALSE,
  opt_args = NULL,
  epoch_callback = NULL,
  pca_method = NULL
)

Value

A matrix of optimized coordinates, or:

• if ret_model = TRUE (or ret_extra contains "model"), returns a list containing extra information that can be used to add new data to an existing embedding via umap_transform. In this case, the coordinates are available in the list item embedding. NOTE: The contents of the model list should not be considered stable or part of the public API, and are purposely left undocumented.

• if ret_nn = TRUE (or ret_extra contains "nn"), returns the nearest neighbor data as a list called nn. This contains one list for each metric calculated, each holding a matrix idx with the integer ids of the neighbors and a matrix dist with the corresponding distances. The nn list (or a sub-list) can be used as input to the nn_method parameter.

• if ret_extra contains "fgraph", returns the high dimensional fuzzy graph as a sparse matrix called fgraph, of type dgCMatrix-class.

The returned list contains the combined data from any combination of specifying ret_model, ret_nn and ret_extra.
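The list-valued return described above can be illustrated with a minimal sketch. It assumes the uwot package is installed and uses the same 30-row iris subset as the Examples section; the variable names (X, res, res2, X_new, new_coords) and the choice of seed are illustrative only. Because the contents of the model are undocumented, the sketch reads only the embedding, nn and fgraph items named above.

```r
library(uwot)

# Numeric columns of the 30-row iris subset used in the Examples section
iris30 <- iris[c(1:10, 51:60, 101:110), ]
X <- as.matrix(iris30[, 1:4])

set.seed(42) # illustrative seed, only to make the layout repeatable

# Request the model, the nearest neighbors and the fuzzy graph together;
# the result is then a list rather than a bare coordinate matrix
res <- umap(X,
  n_neighbors = 5, n_epochs = 20,
  ret_model = TRUE, ret_nn = TRUE, ret_extra = c("fgraph"),
  verbose = FALSE
)

dim(res$embedding)      # optimized coordinates, one row per observation
names(res$nn)           # one entry per metric, here "euclidean"
dim(res$nn$euclidean$idx)  # neighbor ids
dim(res$nn$euclidean$dist) # neighbor distances
class(res$fgraph)       # sparse fuzzy graph

# Reuse the returned neighbors via nn_method to skip the neighbor search
res2 <- umap(X,
  n_neighbors = 5, n_epochs = 20,
  nn_method = res$nn, verbose = FALSE
)

# Add new rows to the existing embedding using the returned model
X_new <- as.matrix(iris[11:15, 1:4])
new_coords <- umap_transform(X_new, model = res)
```

Here new_coords holds one row of coordinates per row of X_new, in the same coordinate space as res$embedding.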

References

Belkin, M., & Niyogi, P. (2002). Laplacian eigenmaps and spectral techniques for embedding and clustering. In Advances in neural information processing systems (pp. 585-591). http://papers.nips.cc/paper/1961-laplacian-eigenmaps-and-spectral-techniques-for-embedding-and-clustering.pdf

Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. https://arxiv.org/abs/1412.6980

McInnes, L., & Healy, J. (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv preprint arXiv:1802.03426. https://arxiv.org/abs/1802.03426

O’Neill, M. E. (2014). PCG: A family of simple fast space-efficient statistically good algorithms for random number generation (Report No. HMC-CS-2014-0905). Harvey Mudd College.

Tang, J., Liu, J., Zhang, M., & Mei, Q. (2016, April). Visualizing large-scale and high-dimensional data. In Proceedings of the 25th International Conference on World Wide Web (pp. 287-297). International World Wide Web Conferences Steering Committee. https://arxiv.org/abs/1602.00370

Van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579-2605. https://www.jmlr.org/papers/v9/vandermaaten08a.html

Examples

library(uwot)

iris30 <- iris[c(1:10, 51:60, 101:110), ]

# Non-numeric columns are automatically removed so you can pass data frames
# directly in a lot of cases without pre-processing
iris_umap <- umap(iris30,
  n_neighbors = 5, learning_rate = 0.5,
  init = "random", n_epochs = 20
)

# Faster approximation to the gradient and return nearest neighbors
iris_umap <- umap(iris30,
  n_neighbors = 5, approx_pow = TRUE,
  ret_nn = TRUE, n_epochs = 20
)

# Can specify min_dist and spread parameters to control separation and size
# of clusters and reuse nearest neighbors for efficiency
nn <- iris_umap$nn
iris_umap <- umap(iris30,
  n_neighbors = 5, min_dist = 1, spread = 5,
  nn_method = nn, n_epochs = 20
)

# Supervised dimension reduction using the 'Species' factor column
iris_sumap <- umap(iris30,
  n_neighbors = 5, min_dist = 0.001,
  y = iris30$Species, target_weight = 0.5, n_epochs = 20
)

# Calculate Petal and Sepal neighbors separately
# (uses intersection of the resulting sets):
iris_umap <- umap(iris30, metric = list(
  "euclidean" = c("Sepal.Length", "Sepal.Width"),
  "euclidean" = c("Petal.Length", "Petal.Width")
))
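As a follow-up to the calls above, the sketch below inspects and plots the default return value, which is a plain coordinate matrix when none of ret_model, ret_nn or ret_extra is set. It assumes the uwot package is installed; the variable name emb, the seed, and the axis labels are illustrative choices and not part of the package.

```r
library(uwot)

iris30 <- iris[c(1:10, 51:60, 101:110), ]

set.seed(42) # illustrative seed, only to make the layout repeatable
emb <- umap(iris30, n_neighbors = 5, n_epochs = 20, verbose = FALSE)

# One row per observation, one column per output dimension (n_components)
dim(emb)

# Base-graphics scatter plot of the two coordinates, colored by species
plot(emb,
  col = as.integer(iris30$Species), pch = 19,
  xlab = "UMAP 1", ylab = "UMAP 2"
)
legend("topright",
  legend = levels(iris30$Species), col = 1:3, pch = 19
)
```

With only 30 points and 20 epochs the layout is rough; the plot is meant to show how the returned matrix is consumed, not to be a tuned visualization.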

uwot documentation built on Dec. 11, 2021, 9:58 a.m.