
sneer

Stochastic Neighbor Embedding Experiments in R

Note: This package is unlikely to see further major updates, but much of it lives on in smallvis.

An R package for experimenting with dimensionality reduction techniques, including the popular t-Distributed Stochastic Neighbor Embedding (t-SNE).

Installing

# install.packages("devtools")
devtools::install_github("jlmelville/sneer")

Documentation

package?sneer
# the sneer function can carry out many kinds of embedding
?sneer

Also see the (currently under-construction) documentation web pages for a more detailed explanation.

Examples

# t-SNE on the iris dataset:
res <- sneer(iris)
# then do what you want with the embedded coordinates in res$coords

# by default, sneer does t-SNE, using the numeric columns and automatically
# coloring points by a factor column if it finds one, but you can be specific:
res <- sneer(iris[, 1:4], labels = iris$Species, method = "tsne", 
             scale_type = "tsne", opt = "tsne", init = "r", 
             exaggerate = 4, exaggerate_off_iter = 100,
             perplexity = 25)
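
As a rough sketch of "doing what you want" with the embedded coordinates, the snippet below draws them as a scatter plot colored by species using base R graphics. The helper `plot_embedding` is hypothetical and not part of sneer. The only thing assumed about sneer is that `res$coords` holds one row of coordinates per observation, as described above. If sneer is not installed, PCA scores from `prcomp` are used as a stand-in, purely so the plotting code can still be run.

```r
# Hypothetical helper (not part of sneer): scatter plot of the first two
# columns of a coordinate matrix, colored by a label vector.
plot_embedding <- function(coords, labels, main = "Embedding") {
  coords <- as.matrix(coords)
  stopifnot(ncol(coords) >= 2, nrow(coords) == length(labels))
  labels <- as.factor(labels)
  # One color per factor level
  palette_cols <- grDevices::rainbow(nlevels(labels))
  graphics::plot(coords[, 1], coords[, 2],
                 col = palette_cols[as.integer(labels)], pch = 19,
                 xlab = "Dimension 1", ylab = "Dimension 2", main = main)
  graphics::legend("topright", legend = levels(labels),
                   col = palette_cols, pch = 19, bty = "n")
  # Return the plotted coordinates invisibly for further use
  invisible(coords[, 1:2, drop = FALSE])
}

# Use sneer if it is installed; otherwise fall back to PCA scores as a
# stand-in (an assumption made only to keep this example runnable).
if (requireNamespace("sneer", quietly = TRUE)) {
  res <- sneer::sneer(iris)
  coords <- res$coords
} else {
  coords <- stats::prcomp(iris[, 1:4], scale. = TRUE)$x[, 1:2]
}
plotted <- plot_embedding(coords, iris$Species, main = "iris embedding")
```

The same helper should work for the output of any other embedding method, as long as it is given a numeric matrix with at least two columns and one label per row.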

There is a section of the documentation that has (many) more examples.

Motivation

There are a lot of dimensionality reduction techniques out there, and many that take inspiration from t-SNE, but understanding what makes them work (or not) is complicated by the differences in dataset preparation, preprocessing, output initialization, optimization, and other heuristics.

Sneer is my attempt to write a package that not only provides a way to run multiple embedding algorithms with complete control over all the various twiddly bits, but also exposes lots of twiddly bits to twiddle on, if that is what you want to do (and I do).

Its basic code was based heavily on Justin Donaldson's tsne R package, but is now mangled so far beyond its original form that I've made it a separate project rather than a fork. It does, however, inherit its license (GPL-2 or later).

Features

Currently sneer offers:

Limitations and Issues

This package is designed for experimenting on smaller datasets, not for production use.

Also, fitting everything I wanted to do into one package has involved splitting everything up into lots of little functions, so good luck finding where anything actually gets done. Thus, its pedagogical value is negligible, unless you were looking for an insight into my questionable design, naming and decision-making skills. But this is a hobby project, so I get to make it as over-engineered as I want.

See also

I have some other packages that create or download datasets often used in SNE-related research:

Acknowledgements

I reverse-engineered some specifics of the Spectral Directions gradient by translating the relevant part of the Matlab implementation provided on the Carreira-Perpiñán group's software page. Professor Carreira-Perpiñán kindly agreed to allow the resulting R code to be under the GPL license of this package. Obviously, assume that any mistakes, errors or resulting destruction of your computer are due to bugs in sneer.

License

GPLv2 or later. The optimization part of sneer is provided by the mize package, which is available under the BSD 2-Clause license.



jlmelville/sneer documentation built on Nov. 15, 2022, 8:13 a.m.