project: Project new data into an existing t-SNE embedding object.

View source: R/project.R

project    R Documentation

Description

Project new data into an existing t-SNE embedding object.

Usage

project(
  x,
  new,
  old,
  perplexity = 5,
  initialization = c("median", "weighted", "random"),
  k = 25L,
  learning_rate = 0.1,
  early_exaggeration = 4,
  early_exaggeration_iter = 0L,
  exaggeration = 1.5,
  n_iter = 250L,
  initial_momentum = 0.5,
  final_momentum = 0.8,
  max_grad_norm = 0.25,
  tolerance = 1e-04
)

Arguments

x

t-SNE embedding created with fitsne.

new

New data to project into the existing embedding.

old

Data used to create the original embedding.

perplexity

Numeric scalar. Perplexity can be thought of as the continuous number of nearest neighbors for which t-SNE will attempt to preserve distances. When projecting, however, only neighbors in the existing embedding are considered; i.e., each new data point is placed into the embedding independently of the other new data points.
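
A minimal base-R sketch of how a perplexity value can be turned into affinities for one new point, assuming the standard t-SNE calibration: a binary search over the Gaussian precision beta until 2^H of the conditional distribution over the point's reference neighbors equals the target perplexity. The function calibrate_affinities() is hypothetical and not part of snifter; how many neighbors the package actually passes in, and its search bounds and tolerance, may differ.

```r
# Hypothetical helper (not part of snifter): conditional affinities of one
# new point to its reference neighbours, calibrated to a target perplexity.
# d2: squared input-space distances from the new point to those neighbours.
calibrate_affinities <- function(d2, perplexity = 5, tol = 1e-5, max_iter = 100L) {
  target_h <- log2(perplexity)  # perplexity = 2^H, with H in bits
  beta <- 1; lo <- 0; hi <- Inf # beta = 1 / (2 * sigma^2), the Gaussian precision
  for (iter in seq_len(max_iter)) {
    w <- exp(-beta * (d2 - min(d2)))   # shifting by min(d2) avoids underflow
    p <- w / sum(w)
    h <- -sum(p[p > 0] * log2(p[p > 0]))
    if (abs(h - target_h) < tol) break
    if (h > target_h) {                # too flat: sharpen the kernel
      lo <- beta
      beta <- if (is.finite(hi)) (beta + hi) / 2 else beta * 2
    } else {                           # too peaked: widen the kernel
      hi <- beta
      beta <- (beta + lo) / 2
    }
  }
  p
}

# Usage: 30 reference neighbours at random squared distances
set.seed(1)
d2 <- rexp(30)
p <- calibrate_affinities(d2, perplexity = 5)
```

Only distances to reference points enter the calculation, which is why each new point is placed independently of the others.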

initialization

Character scalar specifying the method used to compute the initial positions of the new points in the embedding space. One of "median", "weighted" or "random". In practice, "median" or "weighted" should be preferred over "random".

k

Integer scalar specifying the number of nearest neighbors to consider when initially placing each new point into the embedding. This differs from perplexity: perplexity affects the optimization, whereas k only affects the initial point positions.
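
The interplay between k and the initialization methods can be illustrated with a small sketch. The hypothetical helper init_positions() finds the k nearest rows of old for each new point and places it at the coordinate-wise median of their embedding positions ("median") or at a weighted average of them ("weighted"). The Gaussian-style weights and the scale of the "random" start are assumptions made for illustration, not the package's documented behavior.

```r
# Hypothetical helper (not part of snifter): initial embedding positions for
# the rows of `new`, based on their k nearest rows of `old` in input space.
# `old_emb` holds the existing embedding, one row per row of `old`.
init_positions <- function(new, old, old_emb, k = 25L,
                           method = c("median", "weighted", "random")) {
  method <- match.arg(method)
  if (method == "random") {
    # Small random start; the standard deviation is an assumption
    return(matrix(rnorm(nrow(new) * ncol(old_emb), sd = 1e-4), nrow = nrow(new)))
  }
  k <- min(k, nrow(old))
  pos <- apply(new, 1, function(v) {
    d2 <- colSums((t(old) - v)^2)        # squared distances to every reference row
    nn <- order(d2)[seq_len(k)]          # indices of the k nearest reference rows
    nbr <- old_emb[nn, , drop = FALSE]   # their embedding coordinates
    if (method == "median") {
      apply(nbr, 2, median)              # coordinate-wise median
    } else {
      # Assumed Gaussian-style weights: closer neighbours count more
      w <- exp(-d2[nn] / max(median(d2[nn]), .Machine$double.eps))
      colSums(nbr * w) / sum(w)
    }
  })
  # apply() stacks one result per new point; refill as one row per point
  matrix(pos, nrow = nrow(new), byrow = TRUE)
}

# Usage with stand-in data
set.seed(42)
old <- matrix(rnorm(400), ncol = 4)      # 100 reference points
old_emb <- matrix(rnorm(200), ncol = 2)  # stand-in for an existing 2-D embedding
new <- old[1:3, ] + 0.01
init_positions(new, old, old_emb, k = 10L, method = "weighted")
```

Because k only sets the starting point, a poor choice mostly costs optimization iterations rather than changing which neighbors the perplexity-based affinities use.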

learning_rate

The learning rate for t-SNE optimization. When learning_rate = "auto", the learning rate is set to max(200, N / 12), where N is the number of data points, following Belkina et al. Otherwise, a numeric scalar.
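
The "auto" rule is simple arithmetic. The sketch below (auto_learning_rate() is a hypothetical name) shows that it returns 200 for any N up to 2400 and N / 12 beyond that. Note that the Usage above gives a numeric default of 0.1, so whether project() itself accepts "auto" is not confirmed on this page.

```r
# Hypothetical helper: the max(200, N / 12) rule quoted above,
# vectorised over the number of data points n
auto_learning_rate <- function(n) pmax(200, n / 12)

auto_learning_rate(c(1000, 2400, 24000))
```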

early_exaggeration

Numeric scalar; the exaggeration factor to use during the *early exaggeration* phase. Typical values range from 12 to 32.

early_exaggeration_iter

Integer scalar; the number of iterations to run in the *early exaggeration* phase. The default, 0L, skips this phase.

exaggeration

The exaggeration factor to use during the normal optimization phase. This can be used to form more densely packed clusters and is useful for large data sets.

n_iter

The number of iterations to run in the normal optimization regime.

initial_momentum

The momentum to use during the *early exaggeration* phase.

final_momentum

The momentum to use during the normal optimization phase.
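
The phase-related arguments (early_exaggeration, early_exaggeration_iter, exaggeration, n_iter, initial_momentum and final_momentum) fit together roughly as in the following sketch of gradient descent with momentum. run_schedule() and its gradient function grad_fn are hypothetical: the sketch leaves out the t-SNE gradient itself, and any adaptive per-parameter gains the real optimizer may use, and keeping the momentum buffer across phases is an assumption.

```r
# Hypothetical sketch (not snifter's optimizer): plain momentum gradient
# descent split into an early-exaggeration phase and a normal phase.
# grad_fn(y, exaggeration) must return a gradient matrix shaped like y.
run_schedule <- function(y, grad_fn, learning_rate = 0.1,
                         early_exaggeration = 4, early_exaggeration_iter = 0L,
                         exaggeration = 1.5, n_iter = 250L,
                         initial_momentum = 0.5, final_momentum = 0.8) {
  update <- matrix(0, nrow(y), ncol(y))  # momentum buffer (kept across phases here)
  phases <- list(
    list(iters = early_exaggeration_iter, exag = early_exaggeration, mom = initial_momentum),
    list(iters = n_iter, exag = exaggeration, mom = final_momentum)
  )
  for (ph in phases) {
    for (i in seq_len(ph$iters)) {       # seq_len(0) runs no iterations
      g <- grad_fn(y, ph$exag)
      update <- ph$mom * update - learning_rate * g
      y <- y + update
    }
  }
  y
}

# Usage: a toy quadratic gradient stands in for the real t-SNE gradient
y0 <- matrix(c(1, -2, 0.5, 3), ncol = 2)
y_final <- run_schedule(y0, function(y, exag) exag * y)
```

With the defaults from Usage, the early phase runs zero iterations, so all 250 updates use exaggeration = 1.5 and final_momentum = 0.8.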

max_grad_norm

Maximum gradient norm; gradients whose norm exceeds this value are clipped. When adding points to an existing embedding, new points that overlap with the reference points can produce large gradients. These can make points "shoot off" from the embedding, causing the interpolation method to compute a very large grid and leading to worse results.
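
A minimal sketch of norm-based gradient clipping, assuming the limit is applied to each point's gradient row separately (a single global-norm variant is also common). clip_grad() is a hypothetical helper: rows whose Euclidean norm exceeds max_grad_norm are rescaled to that norm while keeping their direction, and all other rows are left unchanged.

```r
# Hypothetical helper (not part of snifter): rescale every gradient row whose
# Euclidean norm exceeds max_grad_norm; per-row clipping is an assumption.
clip_grad <- function(g, max_grad_norm = 0.25) {
  norms <- sqrt(rowSums(g^2))
  # Scale factor is 1 for rows already within the limit; pmax() avoids 0/0
  scale <- pmin(1, max_grad_norm / pmax(norms, .Machine$double.eps))
  g * scale                              # row i is multiplied by scale[i]
}

# Usage: one oversized gradient row, one small row, one zero row
g <- rbind(c(3, 4), c(0.1, 0), c(0, 0))
clip_grad(g)
```

The first row (norm 5) is shrunk to norm 0.25, so a single overlapping point cannot take an arbitrarily large step away from the embedding.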

tolerance

Numeric scalar specifying the tolerance used to check that the affinities calculated on the old data match those of the original embedding.
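
One way to read this check is as a bound on the largest element-wise difference between the recomputed and the original affinities. The helper affinities_match() is a hypothetical sketch of that idea; the exact comparison project() performs is an assumption.

```r
# Hypothetical helper (not part of snifter): do recomputed affinities agree
# with the original ones to within `tolerance` in every entry?
affinities_match <- function(p_recomputed, p_original, tolerance = 1e-4) {
  stopifnot(identical(dim(p_recomputed), dim(p_original)))
  max(abs(p_recomputed - p_original)) <= tolerance
}

# Usage with a toy 2 x 2 affinity matrix
p <- matrix(c(0.5, 0.5, 0.25, 0.75), nrow = 2)
affinities_match(p, p + 5e-5)
```

A mismatch beyond the tolerance would suggest that old is not the data the embedding x was built from.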

Value

Numeric matrix of t-SNE co-ordinates resulting from embedding new into the t-SNE embedding x.

References

Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets. Belkina, A.C., Ciccolella, C.O., Anno, R. et al. Nature Communications 10, 5415 (2019). doi: https://doi.org/10.1038/s41467-019-13055-y

Examples

 set.seed(42)
 m <- matrix(rnorm(2000), ncol=20)
 out_binding <- fitsne(m[-(1:2), ], random_state = 42L)
 new_points <- project(out_binding, new = m[1:2, ], old = m[-(1:2), ])
 plot(as.matrix(out_binding), col = "black", pch = 19,
     xlab = "t-SNE 1", ylab = "t-SNE 2")
 points(new_points, col = "red", pch = 19)

Alanocallaghan/snifter documentation built on Sept. 14, 2023, 9:25 p.m.