project: Project new data into an existing t-SNE embedding object.
In snifter: R wrapper for the python openTSNE library



Project new data into an existing t-SNE embedding object.

project(
  x,
  new,
  old,
  perplexity = 5,
  initialization = c("median", "weighted", "random"),
  k = 25L,
  learning_rate = 0.1,
  early_exaggeration = 4,
  early_exaggeration_iter = 0L,
  exaggeration = 1.5,
  n_iter = 250L,
  initial_momentum = 0.5,
  final_momentum = 0.8,
  max_grad_norm = 0.25,
  tolerance = 1e-04
)

`x`	t-SNE embedding created with `fitsne`.
`new`	New data to project into the existing embedding.
`old`	Data used to create the original embedding.
`perplexity`	Numeric scalar. Perplexity can be thought of as a continuous analogue of the number of nearest neighbors whose distances t-SNE attempts to preserve. When projecting, however, only neighbors in the existing embedding are considered, i.e. each new data point is placed into the embedding independently of the other new data points.
`initialization`	Character scalar specifying the method used to compute the initial positions of the new points in the embedding space. Can be "median", "weighted" or "random". "median" or "weighted" should generally be preferred over "random".
`k`	Integer scalar specifying the number of nearest neighbors to consider when initially placing each new point onto the embedding. This differs from `perplexity`: perplexity affects the optimization, while `k` only affects the initial point positions.
`learning_rate`	The learning rate for t-SNE optimization. Either a numeric scalar or "auto". When `learning_rate = "auto"`, the learning rate is set to max(200, N / 12), as determined in Belkina et al. (2019).
`early_exaggeration`	Numeric scalar; the exaggeration factor to use during the early exaggeration phase. Typical values range from 12 to 32.
`early_exaggeration_iter`	The number of iterations to run in the early exaggeration phase.
`exaggeration`	The exaggeration factor to use during the normal optimization phase. This can be used to form more densely packed clusters and is useful for large data sets.
`n_iter`	The number of iterations to run in the normal optimization regime.
`initial_momentum`	The momentum to use during the early exaggeration phase.
`final_momentum`	The momentum to use during the normal optimization phase.
`max_grad_norm`	Maximum gradient norm. If the norm exceeds this value, the gradient is clipped. When points are added to an existing embedding and the new points overlap with the reference points, the gradients can become very large. This can make points "shoot off" from the embedding, which causes the interpolation method to compute a very large grid and leads to worse results.
`tolerance`	Numeric scalar specifying the numeric tolerance used to ensure the affinities calculated on the old data match those of the original embedding.
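
The `initialization`/`k` and `max_grad_norm` arguments above can be illustrated in plain R. The following is a minimal sketch under stated assumptions, not the code that snifter or openTSNE actually runs: the helper names `median_init` and `clip_gradient` are hypothetical, the neighbor search is brute-force Euclidean, the embedding is assumed to be two-dimensional, and clipping is assumed to be applied to each point's gradient separately.

```r
# Illustrative sketch only; NOT the snifter/openTSNE implementation.
# `median_init` and `clip_gradient` are hypothetical helper names.

# "median" initialization: place each new point at the coordinate-wise
# median of the embedding positions of its k nearest neighbours, where
# neighbours are found in the original (high-dimensional) data space.
# Assumes `embedding` has one row per row of `old` and two columns.
median_init <- function(new, old, embedding, k = 25L) {
  k <- min(k, nrow(old))
  t(apply(new, 1, function(p) {
    # Squared Euclidean distance from p to every row of `old`.
    d <- rowSums(sweep(old, 2, p)^2)
    nn <- order(d)[seq_len(k)]
    apply(embedding[nn, , drop = FALSE], 2, median)
  }))
}

# Gradient clipping: rescale each row (one point's gradient) so that its
# Euclidean norm is at most `max_grad_norm`; smaller gradients are kept.
clip_gradient <- function(grad, max_grad_norm = 0.25) {
  norms <- sqrt(rowSums(grad^2))
  scale <- ifelse(norms > max_grad_norm, max_grad_norm / norms, 1)
  grad * scale  # scale[i] is recycled down the rows, so row i is scaled
}

set.seed(1)
old <- matrix(rnorm(200), ncol = 4)        # 50 reference points
embedding <- matrix(rnorm(100), ncol = 2)  # stand-in for a fitted embedding
new <- matrix(rnorm(12), ncol = 4)         # 3 new points

init <- median_init(new, old, embedding, k = 5L)
dim(init)  # one row of starting coordinates per new point

grad <- rbind(c(3, 4), c(0.1, 0))
clipped <- clip_gradient(grad, max_grad_norm = 0.25)
sqrt(rowSums(clipped^2))  # first row is scaled down, second is unchanged
```

In this sketch `k` only determines where a point starts, which mirrors the distinction drawn above between `k` and `perplexity`; clipping then bounds how far a single optimization step can move a point.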

Numeric matrix of t-SNE coordinates resulting from embedding `new` into the t-SNE embedding `x`.

Belkina, A.C., Ciccolella, C.O., Anno, R. et al. Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets. Nature Communications 10, 5415 (2019). doi: https://doi.org/10.1038/s41467-019-13055-y

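 set.seed(42)
 # 100 observations with 20 features each
 m <- matrix(rnorm(2000), ncol = 20)
 # Fit the reference embedding on all rows except the first two
 out_binding <- fitsne(m[-(1:2), ], random_state = 42L)
 # Project the two held-out rows into the existing embedding
 new_points <- project(out_binding, new = m[1:2, ], old = m[-(1:2), ])
 # Reference points in black, projected points in red
 plot(as.matrix(out_binding), col = "black", pch = 19,
      xlab = "t-SNE 1", ylab = "t-SNE 2")
 points(new_points, col = "red", pch = 19)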