run.fitsne: Run FIt-SNE, Fourier Transform TSNE.
In ImmuneDynamics/Spectre: High-dimensional cytometry and imaging analysis

run.fitsne

R Documentation

Run FIt-SNE, Fourier Transform TSNE.

Description

Implementation of FIt-SNE is available from https://github.com/KlugerLab/FIt-SNE. This function uses fftRtsne to run FIt-SNE.

Usage

run.fitsne(dat, use.cols, seed = 42, fitsne.x.name = "FItSNE_X",
  fitsne.y.name = "FItSNE_Y", dims = 2, perplexity = 30, theta = 0.5,
  max_iter = 750, fft_not_bh = TRUE, ann_not_vptree = TRUE,
  stop_early_exag_iter = 250, exaggeration_factor = 12.0,
  no_momentum_during_exag = FALSE, start_late_exag_iter = -1,
  late_exag_coeff = 1.0, mom_switch_iter = 250, momentum = 0.5,
  final_momentum = 0.8, learning_rate = 'auto', n_trees = 50,
  search_k = -1, nterms = 3, intervals_per_integer = 1,
  min_num_intervals = 50, K = -1, sigma = -30, initialization = 'pca',
  max_step_norm = 5, load_affinities = NULL, fast_tsne_path = NULL,
  nthreads = 0, perplexity_list = NULL, get_costs = FALSE, df = 1.0)

Arguments

`dat`	NO DEFAULT. Input data.table or data.frame.
`use.cols`	NO DEFAULT. Vector of column names or numbers to use for computing the embedding.
`seed`	Default = 42. Seed value for reproducibility.
`fitsne.x.name`	Default = "FItSNE_X". Character. Name of FItSNE x-axis.
`fitsne.y.name`	Default = "FItSNE_Y". Character. Name of FItSNE y-axis.
`dims`	Default = 2. Dimensionality of the embedding (reduced data).
`perplexity`	Default = 30. Perplexity is used to determine the bandwidth of the Gaussian kernel in the input space.
`theta`	Default = 0.5. For exact t-SNE, set to 0. If non-zero, either Barnes Hut or FIt-SNE is used, depending on fft_not_bh. If Barnes Hut is used, theta determines the accuracy of the BH approximation.
`max_iter`	Default = 750. Number of iterations of t-SNE to run.
`fft_not_bh`	Default = TRUE. If theta is nonzero, this determines whether to use FIt-SNE or Barnes Hut approximation.
`ann_not_vptree`	Default = TRUE. If TRUE, use approximate nearest neighbors (the default). If FALSE, use vp-trees (as in bhtsne).
`stop_early_exag_iter`	Default = 250. When to switch off early exaggeration.
`exaggeration_factor`	Default = 12. Coefficient for early exaggeration (>1).
`no_momentum_during_exag`	Default = FALSE. Set to FALSE to use momentum and other optimization tricks. Set to TRUE to do plain, vanilla gradient descent (useful for testing large exaggeration coefficients).
`start_late_exag_iter`	Default = -1. When to start late exaggeration. Set to -1 by default to not use late exaggeration.
`late_exag_coeff`	Default = 1. Late exaggeration coefficient. Set to 1 by default to not use late exaggeration.
`mom_switch_iter`	Default = 250. Iteration number to switch from momentum to final_momentum.
`momentum`	Default = 0.5. Initial value of momentum.
`final_momentum`	Default = 0.8. Value of momentum to use later in the optimisation.
`learning_rate`	Default = 'auto'. Set to desired learning rate or 'auto', which sets learning rate to N/exaggeration_factor where N is the sample size, or to 200 if N/exaggeration_factor < 200.
`n_trees`	Default = 50. When using Annoy, the number of search trees to use.
`search_k`	Default = -1. When using Annoy, the number of nodes to inspect during search. The default of -1 translates to 3*perplexity*n_trees (or K*n_trees when using fixed sigma).
`nterms`	Default = 3. If using FIt-SNE, this is the number of interpolation points per sub-interval.
`intervals_per_integer`	Default = 1. See min_num_intervals.
`min_num_intervals`	Default = 50. Let maxloc = ceil(max(max(X))) and minloc = floor(min(min(X))). That is, the points are in a minloc^no_dims by maxloc^no_dims interval/square. The number of intervals in each dimension is either min_num_intervals or ceil((maxloc - minloc)/intervals_per_integer), whichever is larger. min_num_intervals must be an integer >0, and intervals_per_integer must be >0. Defaults are min_num_intervals = 50 and intervals_per_integer = 1.
`K`	Default = -1. Number of nearest neighbours to get when using fixed sigma.
`sigma`	Default = -30. Fixed sigma value to use when perplexity==-1.
`initialization`	Default = 'pca'. 'pca', 'random', or an N x no_dims array to initialize the solution.
`max_step_norm`	Default = 5. Maximum distance that a point is allowed to move on one iteration. Larger steps are clipped to this value. This prevents possible instabilities during gradient descent. Set to -1 to switch it off.
`load_affinities`	Default = NULL. If 1, input similarities are loaded from a file and not computed. If 2, input similarities are saved into a file. If 0, affinities are neither saved nor loaded.
`fast_tsne_path`	Default = NULL. Path to FItSNE executable.
`nthreads`	Default = 0. Number of threads to use. The default of 0 uses all available threads.
`perplexity_list`	Default = NULL. If perplexity==0 then perplexity combination will be used with values taken from perplexity_list.
`get_costs`	Default = FALSE. Logical indicating whether the KL-divergence costs computed every 50 iterations should be returned.
`df`	Default = 1.0. Positive numeric that controls the degrees of freedom of the t-distribution. The actual degrees of freedom are 2*df-1. The standard t-SNE choice of 1 degree of freedom corresponds to df=1. Large df approximates a Gaussian kernel. df<1 corresponds to heavier tails, which can often resolve substructure in the embedding. See Kobak et al. (2019) for details.
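
Several of the defaults above are derived from other arguments rather than fixed. The following is a minimal sketch in base R of those rules as they are worded in the argument descriptions. The helper names (auto.learning.rate, default.search.k, num.intervals) are hypothetical and are not part of Spectre or FIt-SNE, and the real implementation may differ in detail (for example, the perplexity_list case is not covered here).

```r
# Hypothetical helpers illustrating the documented rules; not part of Spectre.

# learning_rate = 'auto': N / exaggeration_factor, but never below 200.
auto.learning.rate <- function(n, exaggeration_factor = 12) {
  max(200, n / exaggeration_factor)
}

# search_k = -1: 3 * perplexity * n_trees, or K * n_trees when a fixed
# sigma is used (documented above as the perplexity == -1 case).
default.search.k <- function(n_trees = 50, perplexity = 30, K = -1) {
  if (perplexity == -1) K * n_trees else 3 * perplexity * n_trees
}

# Intervals per dimension: the larger of min_num_intervals and
# ceil((maxloc - minloc) / intervals_per_integer).
num.intervals <- function(x, min_num_intervals = 50, intervals_per_integer = 1) {
  maxloc <- ceiling(max(x))
  minloc <- floor(min(x))
  max(min_num_intervals, ceiling((maxloc - minloc) / intervals_per_integer))
}

auto.learning.rate(30000)           # 2500
auto.learning.rate(1200)            # 200, because 1200 / 12 = 100 < 200
default.search.k()                  # 4500
num.intervals(c(-3.2, 120.7))       # 125
num.intervals(c(-3.2, 10.5))        # 50
```

With the 30,000-row subsample used in the Examples section and the default exaggeration_factor of 12, the 'auto' rule as worded above would give a learning rate of 2500.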

Author(s)

Givanna Putri

Examples

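# Load the demo dataset and subsample it to 30,000 rows
dat <- Spectre::demo.clustered
dat.sub <- Spectre::do.subsample(dat, 30000)
# Use columns 12 to 19 as input to FIt-SNE
use.cols <- names(dat)[12:19]
# Run FIt-SNE with the default settings
dat.reduced <- Spectre::run.fitsne(dat = dat.sub, use.cols = use.cols)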
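
As a variation on the example above, the following sketch overrides a few defaults and gives the output columns distinct names. The argument names come from the Usage section; the values (perplexity of 50, 1000 iterations) and the column names are illustrative assumptions, not recommendations. It also assumes that the embedding is returned as two columns named by fitsne.x.name and fitsne.y.name, which this page implies but does not state. The call is only attempted if Spectre is installed, and it additionally needs a working FIt-SNE installation.

```r
# Illustrative settings only; argument names are taken from the Usage section.
fitsne.args <- list(
  seed = 42,
  perplexity = 50,
  max_iter = 1000,
  fitsne.x.name = "FItSNE_p50_X",
  fitsne.y.name = "FItSNE_p50_Y"
)

# Only attempt the run when Spectre is available.
if (requireNamespace("Spectre", quietly = TRUE)) {
  dat <- Spectre::demo.clustered
  dat.sub <- Spectre::do.subsample(dat, 30000)

  fitsne.args$dat <- dat.sub
  fitsne.args$use.cols <- names(dat)[12:19]

  # Pass the assembled argument list to run.fitsne
  dat.reduced <- do.call(Spectre::run.fitsne, fitsne.args)

  # Assumption: the two embedding columns appear under the names given above
  print(c("FItSNE_p50_X", "FItSNE_p50_Y") %in% names(dat.reduced))
}
```

Using distinct column names for each setting would let embeddings from several runs sit side by side in the same table for comparison.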

ImmuneDynamics/Spectre documentation built on Oct. 12, 2024, 7:55 p.m.