fftRtsne: FIt-SNE Based on Kluger Lab FIt-SNE

fftRtsne R Documentation

Description

Modified version of the https://github.com/KlugerLab/FIt-SNE/ implementation that exposes argument names and defaults within CytoExploreR. This function should not be used directly; data should instead be mapped using cyto_map.

Usage

fftRtsne(
  X,
  dims = 2,
  perplexity = 30,
  theta = 0.5,
  max_iter = 750,
  fft_not_bh = TRUE,
  ann_not_vptree = TRUE,
  stop_early_exag_iter = 250,
  exaggeration_factor = 12,
  no_momentum_during_exag = FALSE,
  start_late_exag_iter = -1,
  late_exag_coeff = 1,
  mom_switch_iter = 250,
  momentum = 0.5,
  final_momentum = 0.8,
  learning_rate = "auto",
  n_trees = 50,
  search_k = -1,
  rand_seed = -1,
  nterms = 3,
  intervals_per_integer = 1,
  min_num_intervals = 50,
  K = -1,
  sigma = -30,
  initialization = "pca",
  max_step_norm = 5,
  load_affinities = NULL,
  fast_tsne_path = NULL,
  nthreads = 0,
  perplexity_list = NULL,
  get_costs = FALSE,
  df = 1
)

Arguments

X

a matrix containing the data to be mapped.

dims

dimensionality of the embedding, set to 2 by default.

perplexity

used to determine the bandwidth of the Gaussian kernel in the input space, set to 30 by default.

theta

set to 0 for exact t-SNE. If non-zero, either Barnes-Hut or FIt-SNE is used, depending on fft_not_bh. For Barnes-Hut, theta determines the accuracy of the approximation. Set to 0.5 by default.

max_iter

number of iterations of t-SNE to run, set to 750 by default.

fft_not_bh

if theta is non-zero, this determines whether to use the FIt-SNE or Barnes-Hut approximation, set to TRUE by default for FIt-SNE.
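
The interaction between theta and fft_not_bh described above can be summarised as a small decision rule. The helper below is a hypothetical illustration (choose_nbody_algo is not part of CytoExploreR or FIt-SNE) and only mirrors the documented behaviour.

```r
# Hypothetical helper, for illustration only: returns the name of the
# algorithm that the documented rules would select.
choose_nbody_algo <- function(theta = 0.5, fft_not_bh = TRUE) {
  if (theta == 0) {
    "exact"          # theta = 0 requests exact t-SNE
  } else if (fft_not_bh) {
    "FIt-SNE"        # interpolation-based approximation (default)
  } else {
    "Barnes-Hut"     # theta then controls the accuracy of the approximation
  }
}

choose_nbody_algo()                    # defaults
choose_nbody_algo(fft_not_bh = FALSE)  # Barnes-Hut with theta = 0.5
choose_nbody_algo(theta = 0)           # exact t-SNE
```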

ann_not_vptree

whether to use approximate nearest neighbours (TRUE) or vp-trees as in bhtsne (FALSE), set to TRUE by default.

stop_early_exag_iter

when to switch off early exaggeration, set to 250 by default.

exaggeration_factor

coefficient for early exaggeration (>1), set to 12 by default.

no_momentum_during_exag

set to FALSE (default) to use momentum and other optimisation tricks. Can be set to TRUE to do plain, vanilla gradient descent (useful for testing large exaggeration coefficients).

start_late_exag_iter

when to start late exaggeration, set to -1 by default to not use late exaggeration.

late_exag_coeff

late exaggeration coefficient, set to 1 by default to not use late exaggeration.
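
As a rough sketch of how stop_early_exag_iter, exaggeration_factor, start_late_exag_iter and late_exag_coeff combine, the hypothetical helper below returns the exaggeration coefficient that would apply at a given iteration. The exact boundary handling (strict versus non-strict comparison) is an assumption and may differ from the FIt-SNE executable.

```r
# Hypothetical helper, for illustration only.
# Assumption: early exaggeration applies for iter < stop_early_exag_iter and
# late exaggeration applies for iter >= start_late_exag_iter.
exaggeration_at <- function(iter,
                            stop_early_exag_iter = 250,
                            exaggeration_factor = 12,
                            start_late_exag_iter = -1,
                            late_exag_coeff = 1) {
  if (iter < stop_early_exag_iter) {
    return(exaggeration_factor)   # early exaggeration phase
  }
  if (start_late_exag_iter >= 0 && iter >= start_late_exag_iter) {
    return(late_exag_coeff)       # late exaggeration phase
  }
  1                               # no exaggeration
}

# Coefficient at a few iterations, with late exaggeration from iteration 500
sapply(c(0, 249, 250, 600), exaggeration_at,
       start_late_exag_iter = 500, late_exag_coeff = 4)
```

With the defaults (start_late_exag_iter = -1), the coefficient simply drops from 12 to 1 once early exaggeration is switched off.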

mom_switch_iter

iteration number to switch from momentum to final_momentum, set to 250 by default.

momentum

initial value of momentum, set to 0.5 by default.

final_momentum

value of momentum to use later in the optimisation, set to 0.8 by default.
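
The momentum schedule described by mom_switch_iter, momentum and final_momentum can be sketched as follows. The helper is hypothetical, and whether the switch happens exactly at mom_switch_iter or one iteration later is an assumption.

```r
# Hypothetical helper, for illustration only.
# Assumption: the switch to final_momentum takes effect at iter == mom_switch_iter.
momentum_at <- function(iter,
                        mom_switch_iter = 250,
                        momentum = 0.5,
                        final_momentum = 0.8) {
  if (iter < mom_switch_iter) momentum else final_momentum
}

momentum_at(100)  # initial momentum
momentum_at(300)  # final momentum
```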

learning_rate

set to desired learning rate or 'auto', which sets learning rate to N/exaggeration_factor where N is the sample size, or to 200 if N/exaggeration_factor < 200.
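
The 'auto' rule amounts to taking the larger of 200 and N/exaggeration_factor. A minimal sketch, using a hypothetical helper name:

```r
# Hypothetical helper, for illustration only: the 'auto' learning rate rule.
auto_learning_rate <- function(n, exaggeration_factor = 12) {
  max(200, n / exaggeration_factor)
}

auto_learning_rate(1200)   # 1200 / 12 = 100, below the floor, so 200
auto_learning_rate(12000)  # 12000 / 12 = 1000
```

With the default exaggeration_factor of 12, the floor of 200 applies to any data set with fewer than 2400 rows.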

n_trees

when using Annoy, the number of search trees to use, set to 50 by default.

search_k

When using Annoy, the number of nodes to inspect during search. Default is 3*perplexity*n_trees (or K*n_trees when using fixed sigma).
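
The default for search_k can be worked out by hand from the formula above. The sketch below uses a hypothetical helper with a hypothetical fixed_sigma flag to select between the two cases; it is not the wrapper's own code.

```r
# Hypothetical helper, for illustration only.
# fixed_sigma is an illustrative flag, not an argument of fftRtsne.
default_search_k <- function(perplexity = 30, n_trees = 50, K = -1,
                             fixed_sigma = FALSE) {
  if (fixed_sigma) {
    K * n_trees                # fixed sigma: K neighbours per point
  } else {
    3 * perplexity * n_trees   # perplexity-based: 3 * perplexity neighbours
  }
}

default_search_k()                             # 3 * 30 * 50 = 4500
default_search_k(K = 100, fixed_sigma = TRUE)  # 100 * 50 = 5000
```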

rand_seed

seed for random initialisation, set to -1 by default to initialise random number generator with current time.

nterms

if using FIt-SNE, this is the number of interpolation points per sub-interval, set to 3 by default.

intervals_per_integer

see min_num_intervals.

min_num_intervals

let maxloc = ceil(max(max(X))) and minloc = floor(min(min(X))), i.e. the points lie in the interval/square [minloc, maxloc]^no_dims. The number of intervals in each dimension is either min_num_intervals or ceil((maxloc - minloc)/intervals_per_integer), whichever is larger. min_num_intervals must be an integer > 0, and intervals_per_integer must be > 0. Defaults are min_num_intervals = 50 and intervals_per_integer = 1.
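
The interval count can be reproduced directly from the formula above. The helper below is a hypothetical sketch that takes a numeric vector or matrix of point coordinates.

```r
# Hypothetical helper, for illustration only: number of intervals per dimension.
n_intervals <- function(coords, min_num_intervals = 50, intervals_per_integer = 1) {
  stopifnot(min_num_intervals > 0, intervals_per_integer > 0)
  maxloc <- ceiling(max(coords))
  minloc <- floor(min(coords))
  max(min_num_intervals, ceiling((maxloc - minloc) / intervals_per_integer))
}

n_intervals(c(-3.2, 4.7))    # range 9, so the minimum of 50 applies
n_intervals(c(-60.5, 60.5))  # range 122 exceeds 50, so 122
```

In other words, min_num_intervals acts as a floor while the points span a small range, and intervals_per_integer sets the interval width once they spread further.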

K

number of nearest neighbours to get when using fixed sigma, set to -1 by default.

sigma

fixed sigma value to use when perplexity==-1, set to -30 by default.

initialization

'pca', 'random', or N x no_dims array to initialize the solution, set to 'pca' by default.

max_step_norm

maximum distance that a point is allowed to move on one iteration. Larger steps are clipped to this value. This prevents possible instabilities during gradient descent. Set to -1 to switch it off. Set to 5 by default.
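
Step clipping of this kind can be sketched as rescaling any per-point update whose Euclidean norm exceeds max_step_norm. The helper below is a hypothetical illustration operating on an N x dims matrix of proposed steps; it is not taken from the FIt-SNE source, and treating any negative value as "off" is an assumption.

```r
# Hypothetical helper, for illustration only.
# step: N x dims matrix, one proposed update per row.
clip_steps <- function(step, max_step_norm = 5) {
  if (max_step_norm < 0) {
    return(step)                        # clipping switched off
  }
  norms <- sqrt(rowSums(step^2))        # Euclidean length of each row
  scale <- ifelse(norms > max_step_norm, max_step_norm / norms, 1)
  step * scale                          # scale[i] multiplies row i
}

step <- matrix(c(3, 30, 4, 40), ncol = 2)  # rows (3, 4) and (30, 40)
clip_steps(step)                           # second row is shrunk to length 5
```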

load_affinities

if 1, input similarities are loaded from a file and not computed. If 2, input similarities are saved into a file. If 0, affinities are neither saved nor loaded. Set to NULL by default.

fast_tsne_path

path to FItSNE executable.

nthreads

number of threads to use, set to 0 by default to use all available threads.

perplexity_list

if perplexity==0 then a combination of perplexities is used, with values taken from perplexity_list. Set to NULL by default.

get_costs

logical indicating whether the KL-divergence costs computed every 50 iterations should be returned, set to FALSE by default.

df

positive numeric that controls the degrees of freedom of the t-distribution. The actual degrees of freedom are 2*df-1, so the standard t-SNE choice of 1 degree of freedom corresponds to df=1. Large df approximates a Gaussian kernel. df<1 corresponds to heavier tails, which can often resolve substructure in the embedding. See Kobak et al. (2019) for details. Default is 1.0.
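
To make the effect of df concrete, the sketch below evaluates a heavy-tailed similarity kernel of the form 1 / (1 + d^2/df)^df. This form is an assumption based on Kobak et al. (2019) and is not stated in this help page; the helper name is hypothetical.

```r
# Hypothetical helper, for illustration only.
# Assumed kernel form: 1 / (1 + d^2 / df)^df, where d is a distance in the embedding.
heavy_tailed_kernel <- function(d, df = 1) {
  stopifnot(df > 0)
  1 / (1 + d^2 / df)^df
}

d <- c(0.5, 1, 2, 5)
heavy_tailed_kernel(d, df = 1)    # standard t-SNE (Cauchy) kernel, 1 / (1 + d^2)
heavy_tailed_kernel(d, df = 0.5)  # heavier tails: larger values at large d
heavy_tailed_kernel(d, df = 1e6)  # close to the Gaussian kernel exp(-d^2)
```

Under this assumed form, df = 1 gives the usual t-SNE kernel, smaller df decays more slowly with distance, and very large df approaches exp(-d^2).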

References

Linderman, G., Rachh, M., Hoskins, J., Steinerberger, S., Kluger, Y. (2019). Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data. Nature Methods. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6402590/.

See Also

cyto_map


DillonHammill/CytoExploreR documentation built on March 2, 2023, 7:34 a.m.