fftRtsne: FIt-SNE Based on Kluger Lab FIt-SNE


View source: R/FIt-SNE.R

Description

Modified version of https://github.com/KlugerLab/FIt-SNE/. Please do not call this function directly, as it requires the compiled FIt-SNE code to run. If you want to run FIt-SNE, use the run.fitsne function instead.

Usage

fftRtsne(
  X,
  dims = 2,
  perplexity = 30,
  theta = 0.5,
  max_iter = 750,
  fft_not_bh = TRUE,
  ann_not_vptree = TRUE,
  stop_early_exag_iter = 250,
  exaggeration_factor = 12,
  no_momentum_during_exag = FALSE,
  start_late_exag_iter = -1,
  late_exag_coeff = 1,
  mom_switch_iter = 250,
  momentum = 0.5,
  final_momentum = 0.8,
  learning_rate = "auto",
  n_trees = 50,
  search_k = -1,
  rand_seed = -1,
  nterms = 3,
  intervals_per_integer = 1,
  min_num_intervals = 50,
  K = -1,
  sigma = -30,
  initialization = "pca",
  max_step_norm = 5,
  data_path = NULL,
  result_path = NULL,
  load_affinities = NULL,
  fast_tsne_path = NULL,
  nthreads = 0,
  perplexity_list = NULL,
  get_costs = FALSE,
  df = 1
)

Arguments

X

No default. Matrix containing the data whose dimensionality is to be reduced, with one row per observation.

perplexity

Default = 30. Perplexity is used to determine the bandwidth of the Gaussian kernel in the input space.

theta

Default = 0.5. For exact t-SNE, set to 0. If non-zero, either Barnes-Hut or FIt-SNE is used, depending on fft_not_bh. When Barnes-Hut is used, theta determines the accuracy of the approximation.

max_iter

Default = 750. Number of iterations of t-SNE to run.

fft_not_bh

Default = TRUE. If theta is non-zero, this determines whether to use FIt-SNE (TRUE) or the Barnes-Hut approximation (FALSE).
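To make the interplay between theta and fft_not_bh concrete, here is a small sketch. gradient_method() is a hypothetical helper written for illustration only; it is not part of Spectre or FIt-SNE, and it only restates the selection rule described for these two arguments.

```r
# Hypothetical helper (not part of Spectre or FIt-SNE): reports which
# gradient computation the theta / fft_not_bh settings select.
gradient_method <- function(theta, fft_not_bh = TRUE) {
  if (theta == 0) {
    "exact"                        # theta = 0 requests exact t-SNE
  } else if (fft_not_bh) {
    "FIt-SNE"                      # FFT-accelerated interpolation
  } else {
    "Barnes-Hut"                   # theta sets the BH accuracy
  }
}

gradient_method(0)                        # "exact"
gradient_method(0.5)                      # "FIt-SNE"
gradient_method(0.5, fft_not_bh = FALSE)  # "Barnes-Hut"
```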

ann_not_vptree

Default = TRUE. Whether to use approximate nearest neighbours via Annoy (TRUE, the default) or exact vp-trees as in bhtsne (FALSE).

stop_early_exag_iter

Default = 250. Iteration at which early exaggeration is switched off.

exaggeration_factor

Default = 12. Coefficient for early exaggeration (>1).

no_momentum_during_exag

Default = FALSE. Set to FALSE to use momentum and other optimisation tricks. Set to TRUE to run plain, vanilla gradient descent during exaggeration (useful for testing large exaggeration coefficients).

start_late_exag_iter

Default = -1. Iteration at which late exaggeration starts. The default of -1 disables late exaggeration.

late_exag_coeff

Default = 1. Late exaggeration coefficient. The default of 1 disables late exaggeration.
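The four exaggeration arguments together define a schedule for the multiplier applied to the input affinities. The sketch below restates that schedule as an R function. exaggeration_at() is hypothetical, and counting iterations from 0 is an assumption about the compiled code.

```r
# Hypothetical helper: exaggeration multiplier in effect at a given
# iteration (iterations assumed to be counted from 0).
exaggeration_at <- function(iter,
                            stop_early_exag_iter = 250,
                            exaggeration_factor = 12,
                            start_late_exag_iter = -1,
                            late_exag_coeff = 1) {
  if (iter < stop_early_exag_iter) {
    exaggeration_factor                 # early exaggeration phase
  } else if (start_late_exag_iter >= 0 && iter >= start_late_exag_iter) {
    late_exag_coeff                     # late exaggeration phase
  } else {
    1                                   # no exaggeration
  }
}

# Defaults: early exaggeration only.
sapply(c(0, 249, 250, 749), exaggeration_at)
# Late exaggeration of 4 from iteration 500 onwards.
sapply(c(0, 250, 500, 749), exaggeration_at,
       start_late_exag_iter = 500, late_exag_coeff = 4)
```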

mom_switch_iter

Default = 250. Iteration number to switch from momentum to final_momentum.

momentum

Default = 0.5. Initial value of momentum.

final_momentum

Default = 0.8. Value of momentum to use later in the optimisation.
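mom_switch_iter, momentum and final_momentum describe a two-stage momentum schedule. A minimal sketch follows. momentum_at() is a hypothetical helper, and it ignores no_momentum_during_exag.

```r
# Hypothetical helper: momentum coefficient in effect at a given iteration.
momentum_at <- function(iter, mom_switch_iter = 250,
                        momentum = 0.5, final_momentum = 0.8) {
  if (iter < mom_switch_iter) momentum else final_momentum
}

sapply(c(0, 249, 250, 749), momentum_at)  # 0.5 0.5 0.8 0.8
```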

learning_rate

Default = 'auto'. Set to the desired learning rate, or to 'auto', which sets the learning rate to N/exaggeration_factor, where N is the sample size, or to 200 if N/exaggeration_factor < 200.
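The 'auto' rule amounts to taking the larger of N/exaggeration_factor and 200. auto_learning_rate() is a hypothetical helper that restates this rule.

```r
# Hypothetical helper: the learning rate implied by learning_rate = 'auto'.
auto_learning_rate <- function(N, exaggeration_factor = 12) {
  max(N / exaggeration_factor, 200)
}

auto_learning_rate(1000)   # 200  (1000 / 12 is below 200)
auto_learning_rate(24000)  # 2000
```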

n_trees

Default = 50. When using Annoy, the number of search trees to use.

search_k

Default = -1. When using Annoy, the number of nodes to inspect during search. The default of -1 translates to 3*perplexity*n_trees (or K*n_trees when using fixed sigma).
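The effective search_k when it is left at -1 can be restated as follows. default_search_k() is hypothetical. It treats perplexity == -1 as fixed-sigma mode, following the sigma argument, and it deliberately does not cover perplexity combination (perplexity == 0).

```r
# Hypothetical helper: search_k implied by search_k = -1.
default_search_k <- function(perplexity = 30, n_trees = 50, K = -1) {
  # Perplexity combination (perplexity == 0) is not covered by this sketch.
  stopifnot(perplexity > 0 || perplexity == -1)
  if (perplexity == -1) {
    K * n_trees                 # fixed-sigma mode: K neighbours per point
  } else {
    3 * perplexity * n_trees    # perplexity-based affinities
  }
}

default_search_k()                         # 4500
default_search_k(perplexity = -1, K = 60)  # 3000
```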

rand_seed

Default = -1. Seed for random initialisation. The default of -1 initialises the random number generator with the current time.

nterms

Default = 3. If using FIt-SNE, this is the number of interpolation points per sub-interval.

intervals_per_integer

Default = 1. See min_num_intervals.

min_num_intervals

Default = 50. Let maxloc = ceil(max(Y)) and minloc = floor(min(Y)), where Y denotes all coordinates of the current embedding, so that the points lie in the box [minloc, maxloc]^dims. The number of intervals in each dimension is either min_num_intervals or ceil((maxloc - minloc)/intervals_per_integer), whichever is larger. min_num_intervals must be an integer > 0, and intervals_per_integer must be > 0.
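The interval rule can be made concrete with a small example. n_intervals() is a hypothetical helper that restates the rule as written above. The compiled FIt-SNE code may differ in details not documented here.

```r
# Hypothetical helper: number of interpolation intervals per dimension
# for an embedding matrix Y (rows = points, columns = embedding dims).
n_intervals <- function(Y, min_num_intervals = 50, intervals_per_integer = 1) {
  maxloc <- ceiling(max(Y))
  minloc <- floor(min(Y))
  max(min_num_intervals, ceiling((maxloc - minloc) / intervals_per_integer))
}

Y <- cbind(c(-23.4, 10, 41.2), c(0, 5, -3))
n_intervals(Y)                             # 66: range is 42 - (-24)
n_intervals(Y, intervals_per_integer = 2)  # 50: 33 is below the minimum
```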

K

Default = -1. Number of nearest neighbours to get when using fixed sigma.

sigma

Default = -30. Fixed sigma value to use when perplexity==-1.

initialization

Default = 'pca'. 'pca', 'random', or an N x dims matrix used to initialise the solution.
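The sketch below shows one way to build a PCA-style initialisation in base R. pca_init() is hypothetical. Rescaling so that the first coordinate has a standard deviation of 1e-4 is an assumption borrowed from common t-SNE practice, not a documented property of this function. A matrix built like this could also be passed directly as initialization.

```r
# Hypothetical helper: PCA-based starting embedding (N x dims).
pca_init <- function(X, dims = 2) {
  pcs <- prcomp(X, center = TRUE, scale. = FALSE)$x[, seq_len(dims), drop = FALSE]
  # Assumed rescaling: shrink so the first coordinate has SD 1e-4.
  pcs / sd(pcs[, 1]) * 1e-4
}

set.seed(1)
X <- matrix(rnorm(200 * 10), nrow = 200)
Y0 <- pca_init(X)
dim(Y0)  # 200 2
```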

max_step_norm

Default = 5. Maximum distance that a point is allowed to move on one iteration. Larger steps are clipped to this value. This prevents possible instabilities during gradient descent. Set to -1 to switch it off.
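Clipping each point's step to max_step_norm can be sketched as rescaling every row of the step matrix whose Euclidean norm exceeds the limit. clip_step() is a hypothetical helper. It assumes the limit applies per point to the full update vector, as the description above suggests.

```r
# Hypothetical helper: clip per-point steps (rows of `step`) to max_step_norm.
clip_step <- function(step, max_step_norm = 5) {
  if (max_step_norm == -1) return(step)     # -1 switches clipping off
  norms <- sqrt(rowSums(step^2))            # step length of each point
  scale <- pmin(1, max_step_norm / norms)   # 1 for rows within the limit
  step * scale                              # recycles down columns: row i * scale[i]
}

step <- rbind(c(6, 8), c(1, 1))
clip_step(step)                      # first row becomes c(3, 4); second is unchanged
clip_step(step, max_step_norm = -1)  # returned as is
```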

load_affinities

Default = NULL. If 1, input similarities are loaded from a file and not computed. If 2, input similarities are saved into a file. If 0, affinities are neither saved nor loaded.

fast_tsne_path

Default = NULL. Path to the compiled FIt-SNE executable.

nthreads

Default = 0. Number of threads to use. The default of 0 uses all available threads.

perplexity_list

Default = NULL. If perplexity == 0, a combination of perplexities is used, with values taken from perplexity_list.

get_costs

Default = FALSE. Logical indicating whether the KL-divergence costs computed every 50 iterations should be returned.

df

Default = 1.0. Positive numeric that controls the degrees of freedom of the t-distribution. The actual number of degrees of freedom is 2*df-1. The standard t-SNE choice of 1 degree of freedom corresponds to df=1. A large df approximates a Gaussian kernel. df<1 gives heavier tails, which can often resolve substructure in the embedding. See Kobak et al. (2019) for details.
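The kernel family behind df, as we understand it from Kobak et al. (2019), is w(d) = 1/(1 + d^2/df)^df for an embedding distance d. The sketch below evaluates it to illustrate the three regimes described above. tsne_kernel() is a hypothetical helper, not a function exported by Spectre.

```r
# Hypothetical helper: embedding-space similarity kernel for a given df.
# df = 1 is the Cauchy kernel of standard t-SNE; large df tends to exp(-d^2).
tsne_kernel <- function(d, df = 1) {
  1 / (1 + d^2 / df)^df
}

tsne_kernel(1)            # 0.5, standard t-SNE
tsne_kernel(1, df = 1e6)  # close to exp(-1), i.e. nearly Gaussian
tsne_kernel(3, df = 0.5)  # about 0.229, versus 0.1 for df = 1: heavier tail
```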

dims

Default = 2. Dimensionality of the embedding (reduced data).

References

Linderman, G., Rachh, M., Hoskins, J., Steinerberger, S., Kluger, Y. (2019). Fast interpolation-based t-SNE for improved visualization of single-cell RNA-seq data. Nature Methods. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6402590/.

See Also

run.fitsne


sydneycytometry/Spectre documentation built on March 20, 2021, 2:15 a.m.