run.umap | R Documentation |
Method to run the UMAP (uniform manifold approximation and projection) dimensionality reduction algorithm. A UMAP plot is a useful means of visualising data. Because UMAP is a dimensionality reduction algorithm, some information will be lost, so it is good practice to validate any populations seen on the plot (namely, through manual gating). For more information on parameter choices, see ?umap::umap.defaults. Uses the R package "umap" to calculate plots and "data.table" to handle data.
run.umap(dat, use.cols, umap.x.name = "UMAP_X",
umap.y.name = "UMAP_Y", umap.seed = 42, neighbours = 15,
n_components = 2, metric = "euclidean", n_epochs = 200,
input = "data", init = "spectral", min_dist = 0.1,
set_op_mix_ratio = 1, local_connectivity = 1, bandwidth = 1,
alpha = 1, gamma = 1, negative_sample_rate = 5, a_gradient = NA,
b_gradient = NA, spread = 1, transform_state = 42,
knn.repeats = 1, verbose = TRUE, umap_learn_args = NA)
dat |
NO DEFAULT. Input data.table or data.frame. |
use.cols |
NO DEFAULT. Vector of column names or numbers to use for the UMAP calculation. |
umap.x.name |
DEFAULT = "UMAP_X". Character. Name of UMAP x-axis. |
umap.y.name |
DEFAULT = "UMAP_Y". Character. Name of UMAP y-axis. |
umap.seed |
DEFAULT = 42. Numeric. Seed value for reproducibility. |
neighbours |
DEFAULT = 15. Numeric. Number of nearest neighbours. |
n_components |
DEFAULT = 2. Numeric. Number of dimensions for output results. |
metric |
DEFAULT = "euclidean". Character or function. Determines how distances between data points are computed. Can also be "manhattan". |
n_epochs |
DEFAULT = 200. Numeric. Number of iterations performed during layout optimisation. |
input |
DEFAULT = "data". Character. Determines whether primary input argument is a data or distance matrix. Can also be "dist". |
init |
DEFAULT = "spectral". Character or matrix. The default, "spectral", computes an initial embedding using eigenvectors of the connectivity graph matrix. Can also be "random", which creates an initial layout from random coordinates. |
min_dist |
DEFAULT = 0.1. Numeric. Determines how close points appear in final layout. |
set_op_mix_ratio |
DEFAULT = 1. Numeric in the range [0, 1]. Determines how the knn-graph is used to create a fuzzy simplicial graph. |
local_connectivity |
DEFAULT = 1. Numeric. Used during construction of fuzzy simplicial set. |
bandwidth |
DEFAULT = 1. Numeric. Used during construction of fuzzy simplicial set. |
alpha |
DEFAULT = 1. Numeric. Initial value of "learning rate" of layout optimisation. |
gamma |
DEFAULT = 1. Numeric. Together with alpha, it determines the learning rate of layout optimisation. |
negative_sample_rate |
DEFAULT = 5. Numeric. Determines how many non-neighbour points are used per point and per iteration during layout optimisation. |
a_gradient |
DEFAULT = NA. Numeric. Contributes to gradient calculations during layout optimisation. When left at NA, a suitable value will be estimated automatically. |
b_gradient |
DEFAULT = NA. Numeric. Contributes to gradient calculations during layout optimisation. When left at NA, a suitable value will be estimated automatically. |
spread |
DEFAULT = 1. Numeric. Used during automatic estimation of a_gradient/b_gradient parameters. |
transform_state |
DEFAULT = 42. Numeric. Seed for random number generation used during predict(). |
knn.repeats |
DEFAULT = 1. Numeric. Number of times to restart knn search. |
verbose |
DEFAULT = TRUE. Logical. Determines whether to show progress messages. |
umap_learn_args |
DEFAULT = NA. Vector. Vector of arguments to python package umap-learn. |
fast |
DEFAULT = TRUE. Logical. Whether to run the uwot implementation of UMAP, which is much faster. |
n_threads |
DEFAULT "auto". Numeric. Number of threads to use (except during stochastic gradient descent). For nearest neighbor search, only applies if |
n_sgd_threads |
DEFAULT "auto". Number of threads to use during stochastic gradient descent. If set to > 1, then be aware that if |
batch |
DEFAULT = TRUE. Logical. If set to TRUE, embedding coordinates are updated at the end of each epoch rather than during the epoch. In batch mode, results are reproducible with a fixed random seed even with n_sgd_threads > 1, at the cost of slightly higher memory use. You may also have to modify learning_rate and increase n_epochs, so whether this provides a speed increase over single-threaded optimisation is likely to be dataset- and hardware-dependent. |
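The a_gradient, b_gradient, spread and min_dist arguments above are linked: when a_gradient and b_gradient are left at NA, suitable values are estimated from spread and min_dist. The following is a minimal sketch of one way such an estimate can be made, assuming the curve-fitting approach used by the reference UMAP implementation. The helper name estimate_ab is hypothetical, and the "umap" package may differ in detail.

```r
# Sketch only: estimate the gradient parameters 'a' and 'b' from 'spread'
# and 'min_dist' by fitting 1 / (1 + a * x^(2b)) to a target curve.
# 'estimate_ab' is a hypothetical helper, not part of Spectre or umap.
estimate_ab <- function(spread = 1, min_dist = 0.1) {
  # Distances in the low-dimensional layout at which the curve is evaluated
  x <- seq(0, 3 * spread, length.out = 300)

  # Target curve: 1 within min_dist, exponential decay beyond it
  y <- ifelse(x < min_dist, 1, exp(-(x - min_dist) / spread))

  # Non-linear least squares fit, starting from a = 1, b = 1
  fit <- stats::nls(y ~ 1 / (1 + a * x^(2 * b)),
                    start = list(a = 1, b = 1))
  stats::coef(fit)
}

ab <- estimate_ab(spread = 1, min_dist = 0.1)
print(ab)
```

Smaller min_dist values make the target curve drop off sooner, which in turn lets points sit closer together in the final layout.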
Thomas Ashhurst, thomas.ashhurst@sydney.edu.au Felix Marsh-Wakefield, felix.marsh-wakefield@sydney.edu.au
# Run UMAP on a subset of the demonstration dataset
cell.dat <- Spectre::do.subsample(Spectre::demo.clustered, 10000) # Subsample the demo dataset to 10000 cells

# Remove any existing UMAP coordinates so they can be recalculated
cell.dat$UMAP_X <- NULL
cell.dat$UMAP_Y <- NULL

cell.dat <- Spectre::run.umap(dat = cell.dat,
                              use.cols = c("NK11_asinh", "CD3_asinh",
                                           "CD45_asinh", "Ly6G_asinh", "CD11b_asinh",
                                           "B220_asinh", "CD8a_asinh", "Ly6C_asinh",
                                           "CD4_asinh"))
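
For parameter choices, the description points to ?umap::umap.defaults. As a rough sketch of what those settings look like when the "umap" package is called directly, the example below adjusts a copy of the defaults and runs it on R's built-in iris data. How run.umap passes its arguments to the package internally is an assumption here; the comments only indicate which run.umap arguments appear to correspond to each setting.

```r
# Sketch only: call the 'umap' package directly with a customised config.
# Guarded so that nothing runs if the package is not installed.
if (requireNamespace("umap", quietly = TRUE)) {
  config <- umap::umap.defaults        # copy of the package defaults
  config$n_neighbors  <- 15            # cf. 'neighbours'
  config$min_dist     <- 0.1           # cf. 'min_dist'
  config$metric       <- "euclidean"   # cf. 'metric'
  config$random_state <- 42            # cf. 'umap.seed'
  config$verbose      <- FALSE

  values <- as.matrix(iris[, 1:4])     # numeric columns only
  res <- umap::umap(values, config = config)

  # res$layout holds one row per input row and n_components columns
  embedding <- as.data.frame(res$layout)
  colnames(embedding) <- c("UMAP_X", "UMAP_Y")
  print(head(embedding))
}
```

Fixing random_state keeps the layout reproducible between runs, which is the same purpose umap.seed serves in run.umap.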