sneer: R Documentation
A package for exploring probability-based embedding and related forms of dimensionality reduction. Its main goal is to implement multiple embedding methods within a single framework so comparison between them is easier, without worrying about the effect of differences in preprocessing, optimization and heuristics.
Carries out an embedding of a high-dimensional dataset into a two-dimensional scatter plot, using distance-based methods (e.g. Sammon maps) and probability-based methods (e.g. t-distributed Stochastic Neighbor Embedding).
sneer(df, indexes = NULL, ndim = 2, method = "tsne", alpha = 0.5, dof = 10, dyn = c(), lambda = 0.5, kappa = 0.5, scale_type = "none", perplexity = 32, perp_scale = "single", perp_scale_iter = NULL, perp_kernel_fun = "exp", prec_scale = "none", init = "pca", opt = "L-BFGS", eta = 1, max_iter = 1000, max_fn = Inf, max_gr = Inf, max_fg = Inf, report_every = 50, tol = 1e-04, exaggerate = NULL, exaggerate_off_iter = 100, plot_type = "plot", colors = NULL, color_name = NULL, labels = NULL, label_name = NULL, label_chars = NULL, point_size = 1, plot_labels = FALSE, color_scheme = grDevices::rainbow, equal_axes = FALSE, legend = TRUE, legend_rows = NULL, quality_measures = NULL, ret = c())
df
Data frame or distance matrix (as a dist object) to embed.

indexes
Indexes of the columns of the numerical variables to use in the embedding. The default of NULL uses all numeric columns.

ndim
Number of output dimensions (normally 2).
method
Embedding method. See 'Details'.

alpha
Heavy-tailedness parameter. Used only if the method is "hssne" or "dhssne".

dof
Initial number of degrees of freedom. Used only if the method is "itsne".

dyn
List containing kernel parameters to be optimized. See 'Details'.

lambda
NeRV parameter. Used only if the method is "nerv".

kappa
JSE parameter. Used only if the method is "jse".
scale_type
Type of scaling to carry out on the input data. See 'Details'.

perplexity
Target perplexity, or vector of trial perplexities (if perp_scale is not "single"). See 'Details'.

perp_scale
Type of perplexity scaling to apply. See 'Details'. Ignored by non-probability-based methods.

perp_scale_iter
Number of iterations to scale perplexity values over. Must be smaller than max_iter.

perp_kernel_fun
The input data weight function: either "exp" (exponential) or "step" (step function). See 'Details'.

prec_scale
Whether to scale the output kernel precision based on the input perplexity results. See 'Details'. Ignored by non-probability-based methods. Can't be used if perp_kernel_fun is "step".

init
Type of initialization of the output coordinates. See 'Details'.
opt
Type of optimizer. See 'Details'.
eta
Learning rate, used when opt = "TSNE". See 'Details'.
max_iter
Maximum number of iterations to carry out during the embedding. Ignored if the method does not require iterative optimization (e.g. "pca").

max_fn
Maximum number of cost function evaluations to carry out during the embedding. Ignored if the method does not require iterative optimization.

max_gr
Maximum number of cost gradient evaluations to carry out during the embedding. Ignored if the method does not require iterative optimization.

max_fg
Maximum number of the total of cost function and gradient evaluations to carry out during the embedding. Ignored if the method does not require iterative optimization.
report_every
Frequency (in terms of iteration number) with which to update the plot and report the cost function.

tol
Tolerance for the change in cost (evaluated at the interval determined by report_every); if the change falls below this value, the embedding terminates early.

exaggerate
If non-NULL, scale the input probabilities by this value during the early iterations of the optimization ("early exaggeration").

exaggerate_off_iter
Iteration number at which to stop the "early exaggeration" scaling specified by exaggerate.

plot_type
String code indicating the type of plot of the embedding to display: "plot" (the default) or "ggplot2". See 'Details'.
colors
Vector of colors to use to color each point in the embedding plot.

color_name
Name of a column of colors in df, used to color each point in the embedding plot.

labels
Factor vector associated with (but not necessarily in) df, used to label or color each point in the embedding plot.

label_name
Name of a factor column in df, used like labels.

label_chars
Number of characters to use for the labels in the embedding plot. Applies only when labels are displayed on the plot.

point_size
Size of the points (or label text) in the embedding plot.

plot_labels
If TRUE, display the text of the factor levels given by labels (or label_name) in place of points in the embedding plot.

color_scheme
Either a color ramp function, or the name of a ColorBrewer palette to use for mapping the factor specified by labels or label_name. See 'Details'.

equal_axes
If TRUE, the embedding plot axes are given equal ranges.

legend
If TRUE (the default), display a legend in the embedding plot.

legend_rows
Number of rows to use for displaying the legend in an embedding plot. Applies only if a legend is displayed.

quality_measures
Vector of names of quality measures to apply to the finished embedding. See 'Details'. Values of the quality measures will be printed to the screen after embedding and retained in the list returned from this function.

ret
Vector of names of extra data to return from the embedding. See 'Details'.
The embedding methods available are:
"pca"
The first two principal components.
"mmds"
Metric multidimensional scaling.
"sammon"
Sammon map.
"tsne"
t-Distributed Stochastic Neighbor Embedding of van der
Maaten and Hinton (2008).
"asne"
Asymmetric Stochastic Neighbor Embedding of Hinton and
Roweis (2002).
"ssne"
Symmetric Stochastic Neighbor Embedding of Cook et al
(2007).
"wssne"
Weighted Symmetric Stochastic Neighbor Embedding of
Yang et al (2014). Note that despite its name this version is a
modification of t-SNE, not SSNE.
"hssne"
Heavy-tailed Symmetric Stochastic Neighbor Embedding of
Yang et al (2009).
"nerv"
Neighbor Retrieval Visualizer of Venna et al (2010).
NB: The original paper suggests setting the output weight function
precisions to be equal to those of the input weights. Later papers don't
mention this. For consistency with other embedding methods, the default
behavior is not to transfer the precisions to the output function.
To transfer precisions, set prec_scale = "transfer"
.
"jse"
Jensen-Shannon Embedding of Lee at al (2013).
"itsne"
Inhomogeneous t-SNE method of Kitazono et al (2016).
"dhssne"
A "dynamic" version of HSSNE, inspired by the
inhomogeneous t-SNE Method of Kitazono et al.
Custom embedding methods can also be used, via the embedder
function.
The "dyn"
parameter allows for kernel parameters to be optimized, if
the output kernel is exponential or heavy-tailed, i.e. methods asne
,
ssne
, nerv
and jse
(which use the exponential kernel)
and hssne
(which uses the heavy-tailed kernel). The parameter
should be a list consisting of the following names:
For exponential kernels, "beta"
(the precision of the
exponential.)
For the heavy-tailed kernel, "alpha"
(the heavy-tailedness),
and "beta"
(analogous to the precision of the exponential).
alt_opt
If TRUE
, then optimize non-coordinates
separately from coordinates.
"kernel_opt_iter"
Wait this number of iterations before
beginning to optimize non-coordinate parameters.
The values of the list "beta"
and "alpha"
items should be one
of:
"global"
The parameter is the same for every point.
"point"
The value is applied per point, and can be different
for each point.
"static"
The value is fixed at its initial value and is not
optimized.
Setting a value to "static"
only makes sense for kernels where there
is more than one parameter that could be optimized and you don't want all of
them optimized (e.g. you may only want to optimize alpha in the heavy-tailed
kernel). It's an error to specify all parameters as "static"
.
The methods "dhssne"
and "itsne"
already use dynamic kernel
optimization and don't require any further specification, but specifying the
alt_opt
and kernel_opt_iter
list members will affect their
behavior.
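For example (mirroring the Examples section below), a dynamic HSSNE-like embedding can be requested by optimizing a single global heavy-tailedness parameter while keeping the kernel precisions fixed:

# alpha is optimized as one global value; beta stays at its initial value
res <- sneer(iris, method = "hssne", dyn = list(alpha = "global", beta = "static"))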
The following scaling options can be applied via the scale_type
parameter:
"none"
Do nothing. The default.
"matrix"
Range scale the entire data so that the maximum value
is 1 and the minimum 0.
"range"
Range scale each column that the maximum value in each
column is 1 and the minimum 0.
"sd"
Scale each column so that its mean is 0 and standard
deviation is 1.
"tsne"
Center each column, then scale each element by the
absolute maximum element value. This is the scaling carried out in
Barnes-Hut t-SNE.
These options can be abbreviated. The default is to do no scaling. Zero-variance columns will be removed even if no preprocessing is carried out.
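For example (as in the Examples section below), centering each input column and scaling it to unit standard deviation before embedding:

res <- sneer(iris, scale_type = "sd")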
The perplexity parameter is used in combination with the perp_scale parameter, which can take the following values:

"single"
perplexity should be a single value, which will be used over the entire course of the embedding.

"step"
perplexity should be a vector of perplexity values. Each perplexity will be used in turn over the course of the embedding, in sequential order. By starting with a large perplexity and ending with the desired perplexity, it has been suggested by some researchers that local minima can be avoided.

"multi"
The multiscaling method of Lee et al (2015). perplexity should be a vector of perplexity values. Each perplexity will be used in turn over the course of the embedding, in sequential order. Unlike with the "step" method, probability matrices from earlier perplexities are retained and combined by averaging. N.B. Multiscaling is not compatible with the methods "itsne" or "dhssne".

These options can be abbreviated.
For perp_scale values that aren't "single", if a non-vector argument is supplied to the perplexity argument, it will be ignored, and a suitable vector of perplexity values will be used instead. For "multi", these will range from the number of observations in the dataset divided by four down to 2, in descending powers of 2. For "step", 5 equally spaced values are used, ranging from the number of observations divided by 2 down to 32 (or the number of observations divided by 4, if the dataset is smaller than 65 observations).
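For example (as in the Examples section below), NeRV can be started at a large, more global perplexity and stepped down towards the target value:

res <- sneer(iris, scale_type = "sd", method = "nerv", perp_scale = "step")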
The prec_scale parameter determines whether the input weighting kernel precision parameters should be used to modify the output kernel parameter after the input probability calculation for a given perplexity value completes. The options are:

"none"
Do nothing. Most embedding methods follow this strategy, leaving the output similarity kernels to all have unit precision.

"transfer"
Transfer the input similarity kernel parameters to the output similarity kernel. This method was suggested by Venna et al (2010). It is only compatible with the methods "asne", "jse" and "nerv".

"scale"
Scale the output kernel precisions based on the target perplexity and the intrinsic dimensionality of the input data. This method is part of the multiscaling technique proposed by Lee et al (2015).

These options can be abbreviated.

The prec_scale parameter will be ignored if the method used does not have an output similarity kernel with a free parameter, e.g. "tsne" or "wtsne". Also, because the input and output similarity kernels must be of the same type, prec_scale is incompatible with setting perp_kernel_fun to "step".
For initializing the output coordinates, the options for the init parameter are:

"pca"
Initialize using the first two scores of the PCA (using classical MDS if df is a distance matrix). Data will be centered, but not scaled unless the scale_type parameter is used.

"random"
Initialize each coordinate value from a normal random distribution with a standard deviation of 1e-4, as suggested by van der Maaten and Hinton (2008).

"uniform"
Initialize each coordinate value from a uniform random distribution between 0 and 1, as suggested by Venna et al (2010).

Coordinates may also be passed directly as a matrix. The dimensions must be correct for the input data.
Character arguments can be abbreviated.
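A minimal sketch of passing a coordinate matrix directly (this assumes the 150-row iris data frame and the default ndim = 2, so the matrix must be 150 x 2):

# any 150 x 2 matrix would do; here, small random values
init_coords <- matrix(stats::rnorm(150 * 2, sd = 1e-4), ncol = 2)
res <- sneer(iris, scale_type = "sd", init = init_coords)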
For configuring the optimization method, the options for the opt parameter are:

"TSNE"
The optimization method used in the original t-SNE paper: the Jacobs method for step size selection and a step function for the momentum, switching from 0.4 to 0.8 after 250 steps. You may need to modify the eta parameter to get good results, depending on how you have scaled and preprocessed your data and the embedding method used.

"BFGS"
The Broyden-Fletcher-Goldfarb-Shanno (BFGS) method. Requires storing an approximation to the Hessian, so not good for large datasets.

"L-BFGS"
The limited-memory BFGS method (using the last ten updates). The default method.

"NEST"
Momentum emulating Nesterov Accelerated Gradient (Sutskever and co-workers 2013).

"CG"
Conjugate gradient.

"SPEC"
The spectral direction partial Hessian method of Vladymyrov and Carreira-Perpinan (2012). Requires a probability-based embedding method and that the input probability matrix be symmetric. Some probability-based methods are not compatible (e.g. NeRV and JSE; t-SNE works with it, however). Also, while it works with the dense matrices used by sneer, the method relies on a Cholesky decomposition of the input probability matrix, which has O(N^3) complexity, and is really intended for sparse matrices; as included here, it is only practical for smaller datasets.
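For example (as in the Examples section below), the original t-SNE optimizer can be selected, usually with a larger learning rate:

res <- sneer(iris, scale_type = "m", perplexity = 25, opt = "tsne", eta = 500)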
For the quality_measures argument, a vector with one or more of the following options can be supplied:

"rocauc"
Calculate the area under the ROC curve, averaged over each observation, using the output distance matrix to rank each observation. Observations are partitioned into the positive and negative class depending upon the value of the label determined by the label_name argument. Only calculated if the label_name parameter is supplied.

"prauc"
Calculate the area under the Precision-Recall curve. Only calculated if the label_name parameter is supplied.

"rnxauc"
Calculate the area under the RNX curve, using the method of Lee et al (2015).

Options may be abbreviated.
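For example, requesting the RNX and ROC AUC measures (the ROC measure requires label_name to define the classes):

res <- sneer(iris, scale_type = "sd", label_name = "Species",
             quality_measures = c("rnxauc", "rocauc"))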
Progress of the embedding is logged to the standard output every report_every iterations (50 by default). The raw cost of the embedding is reported, along with tolerances measuring how the embedding or the cost has changed.
Because the different costs are not always scaled in a way that makes it obvious how well the embedding has performed, a normalized cost is also shown, where 0 is the minimum possible cost (coinciding with the probabilities or distances in the input and output space being matched), and a normalized cost of 1 is what you would get if you just set all the distances and probabilities to be equal to each other (i.e. ignoring any information from the input space).
Also, the embedding will be plotted. Plotting can be done either with the standard plot function (the default, or explicitly with plot_type = "plot") or with the ggplot2 library (which you need to install and load yourself), by using plot_type = "ggplot2" (these arguments may be abbreviated). The goal has been to provide enough customization to give intelligible results for most datasets. The following are things to consider:
The plot symbols are normally filled circles. However, if you set the plot_labels argument to TRUE, the labels argument can be used to provide a factor vector that gives a meaningful label for each data point. In this case, the text of each factor level will be used as the label. This creates a mess with all but the shortest labels and smallest datasets. There's also a label_fn parameter that lets you provide a function to convert the vector of labels to a different (preferably shorter) form, but you may want to just do it yourself ahead of time and add it to the data frame.
Points are colored using one of two strategies. The most straightforward way is to provide a vector of RGB color strings as an argument to colors. Each element of colors will be used to color the corresponding point in the data frame. Note, however, that this is currently ignored when plotting with ggplot2.
The second way to color the embedding plot uses the labels parameter mentioned above. Each level of the factor used for labels will be mapped to a color, and that color is used for each point. The mapping is handled by the color_scheme parameter. It can be either a color ramp function like rainbow or the name of a color scheme in the RColorBrewer package (e.g. "Set3"). The latter requires the RColorBrewer package to have been installed and loaded.
Unlike colors, providing a labels argument works with ggplot2 plots. In fact, you may find it preferable to use ggplot2, because if the legend argument is TRUE (the default), you will get a legend with the plot. Unfortunately, getting a legend with an arbitrary number of elements to fit on an image created with the graphics::plot function, without obscuring the points, proved beyond my capabilities. Even with ggplot2, a dataset with a large number of categories can generate a large and unwieldy legend.
Additionally, instead of providing the vectors directly, there are
color_name
and label_name
arguments that take a string
containing the name of a column in the data frame, e.g. you can use
labels = iris$Species
or label_name = "Species"
and get the
same result.
If you don't care that much about the colors, provide none of these options and sneer will try to work out a suitable column to use. If it finds at least one color column in the data frame (i.e. a string column where every element can be parsed as a color), it will use the last such column found, as if you had provided it as the colors argument. Otherwise, it will repeat the process looking for a factor column. If it finds one, it will map it to colors via the color_scheme, just as if you had provided the labels argument. The default color scheme is the rainbow function, so you should normally get a colorful, albeit potentially garish, result.
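For example (assuming the ggplot2 and RColorBrewer packages are installed), plotting with ggplot2 and coloring points by the Species factor using a ColorBrewer palette:

library(ggplot2)
library(RColorBrewer)
res <- sneer(iris, scale_type = "sd", label_name = "Species",
             plot_type = "ggplot2", color_scheme = "Dark2")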
For the ret argument, a vector with one or more of the following options can be supplied:

"pcost"
The final cost function value, decomposed into n contributions, where n is the number of points embedded.

"x"
The input coordinates after scaling and column filtering.

"dx"
The input distance matrix. Calculated if not present.

"dy"
The output distance matrix. Calculated if not present.

"p"
The input probability matrix.

"q"
The output probability matrix.

"prec"
The input kernel precisions (inverse of the squared bandwidth).

"dim"
The intrinsic dimensionality for each observation, calculated according to the method of Lee et al (2015). These are meaningless if not using the default exponential perp_kernel_fun.

"deg"
Degree centrality of the input probability. Calculated if not present.

"dyn"
A list of "dynamic" parameters, i.e. any non-coordinate parameters which were optimized. Only used if the dyn input parameter was non-NULL. The list will contain the values of the optimized parameters. If the alt_opt flag was set in the dyn input list, then this return list will also contain the number of cost function and gradient evaluations associated with the optimization of the parameters, as "nf" and "ng", respectively.

"costs"
A matrix containing the costs and the iterations at which they were calculated, as reported during the optimization. The number of these results is controlled by the report_every parameter.

"nf"
The number of cost function evaluations carried out during the optimization. If "dynamic" parameters were used and optimized separately from the coordinates, then this count does not include any contribution from the parameter optimization. Those counts can be found in the "dyn" return list.

"ng"
The number of cost gradient evaluations carried out during the optimization. If "dynamic" parameters were used and optimized separately from the coordinates, then this count does not include any contribution from the parameter optimization. Those counts can be found in the "dyn" return list.
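For example, returning the input and output distance matrices and the per-point costs along with the embedding:

res <- sneer(iris, scale_type = "sd", method = "wtsne",
             ret = c("dx", "dy", "pcost"))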
The color_scheme parameter is used to set the color scheme for the embedding plot that is displayed during the optimization. It can be either a color ramp function (e.g. grDevices::rainbow), accepting an integer n as an argument and returning n colors, or the name of a ColorBrewer color scheme (e.g. "Spectral"). Using a ColorBrewer scheme requires the RColorBrewer package to be installed.
For some applicable color ramp functions, see the Palettes help page in the grDevices package (e.g. by running the ?rainbow command).
The sneer function returns a list with the following elements:

coords
Embedded coordinates.

cost
Cost function value for the embedded coordinates. The type of the cost depends on the method, but lower is better.

norm_cost
cost, normalized so that a perfect embedding gives a value of 0 and one where all the distances were equal would give a value of 1.

iter
Iteration number when the embedding terminated.

Additional elements will be present in the list if ret or quality_measures are non-empty.
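For example, inspecting the returned coordinates and the normalized cost after an embedding:

res <- sneer(iris, scale_type = "sd")
head(res$coords)  # embedded coordinates, one row per observation
res$norm_cost     # 0 = perfect match, 1 = no better than equal distances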
The sneer
function provides a variety of methods for embedding,
including:
Stochastic Neighbor Embedding and variants (ASNE, SSNE and TSNE)
Metric MDS using the STRESS and SSTRESS functions
Sammon Mapping
Heavy-tailed Symmetric Stochastic Neighbor Embedding (HSSNE)
Neighbor Retrieval Visualizer (NeRV)
Jensen-Shannon Embedding (JSE)
Inhomogeneous t-SNE
See the documentation for the sneer function for the exact list of methods and variations. If you want to create variations on these methods by trying different cost functions, weighting functions and normalization schemes, see the embedder function.
Optimization is carried out with the mize package (https://github.com/jlmelville/mize), using the limited-memory BFGS method by default. Other optimization methods include the Nesterov Accelerated Gradient method (Sutskever et al 2013) with adaptive restart (O'Donoghue and Candes 2013), which tends to be more robust than the usual t-SNE optimization method across the different methods exposed by sneer.
The embed_plot
function will take the output of the
sneer
function and provide a visualization of the embedding.
If you have the RColorBrewer package installed, you can use ColorBrewer palettes by name.
Some functions are available for attempting to quantify embedding quality,
independent of the particular loss function used for an embedding method.
The nbr_pres
function will measure how well the embedding
preserves a neighborhood of a given size around each observation. The
rnx_auc_embed
function implements the Area Under the Curve
of the RNX curve (Lee et al. 2015), which generalizes the neighborhood
preservation to account for all neighborhood sizes, with a bias towards
smaller neighborhoods.
If your observations have labels which could be used for a classification task, then there are also functions which will use these labels to calculate the area under the ROC or PR (precision-recall) curve, using the embedded distances to rank each observation: these are the roc_auc_embed and pr_auc_embed functions, respectively. Note that to use these two functions, you must have the PRROC package installed.
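For example (as in the Examples section below), exporting the distance matrices from an embedding and then computing quality measures separately:

res <- sneer(iris, scale_type = "sd", ret = c("dx", "dy"))
rnx <- rnx_auc_embed(res$dx, res$dy)     # area under the RNX curve
pres32 <- nbr_pres(res$dx, res$dy, 32)   # 32-nearest-neighbor preservation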
There's a synthetic dataset in this package, called s1k. It consists of 1,000 points representing a fuzzy 9D simplex. It's intended to demonstrate the "crowding effect" and to require the sort of probability-based embedding methods provided in this package (PCA does a horrible job of separating the 10 clusters in the data). See s1k for more details.
t-SNE, SNE and ASNE: Van der Maaten, L., & Hinton, G. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579-2605.
NeRV: Venna, J., Peltonen, J., Nybo, K., Aidos, H., & Kaski, S. (2010). Information retrieval perspective to nonlinear dimensionality reduction for data visualization. Journal of Machine Learning Research, 11, 451-490.
JSE: Lee, J. A., Renard, E., Bernard, G., Dupont, P., & Verleysen, M. (2013). Type 1 and 2 mixtures of Kullback-Leibler divergences as cost functions in dimensionality reduction based on similarity preservation. Neurocomputing, 112, 92-108.
Inhomogeneous t-SNE: Kitazono, J., Grozavu, N., Rogovschi, N., Omori, T., & Ozawa, S. (2016, October). t-Distributed Stochastic Neighbor Embedding with Inhomogeneous Degrees of Freedom. In International Conference on Neural Information Processing (ICONIP 2016) (pp. 119-128). Springer International Publishing.
Nesterov Accelerated Gradient: Sutskever, I., Martens, J., Dahl, G., & Hinton, G. (2013). On the importance of initialization and momentum in deep learning. In Proceedings of the 30th international conference on machine learning (ICML-13) (pp. 1139-1147).
O'Donoghue, B., & Candes, E. (2013). Adaptive restart for accelerated gradient schemes. Foundations of computational mathematics, 15(3), 715-732.
Spectral Direction: Vladymyrov, M., & Carreira-Perpinan, M. A. (2012). Partial-Hessian Strategies for Fast Learning of Nonlinear Embeddings. In Proceedings of the 29th International Conference on Machine Learning (ICML-12) (pp. 345-352).
Cook, J., Sutskever, I., Mnih, A., & Hinton, G. E. (2007). Visualizing similarity data with a mixture of maps. In International Conference on Artificial Intelligence and Statistics (pp. 67-74).
Hinton, G. E., & Roweis, S. T. (2002). Stochastic neighbor embedding. In Advances in neural information processing systems (pp. 833-840).
Lee, J. A., Peluffo-Ordóñez, D. H., & Verleysen, M. (2015). Multi-scale similarities in stochastic neighbour embedding: Reducing dimensionality while preserving both local and global structure. Neurocomputing, 169, 246-261.
Yang, Z., King, I., Xu, Z., & Oja, E. (2009). Heavy-tailed symmetric stochastic neighbor embedding. In Advances in neural information processing systems (pp. 2169-2177).
Yang, Z., Peltonen, J., & Kaski, S. (2014). Optimization equivalence of divergences improves neighbor embedding. In Proceedings of the 31st International Conference on Machine Learning (ICML-14) (pp. 460-468).
## Not run:
# Do t-SNE on the iris dataset, scaling columns to zero mean and
# unit standard deviation.
res <- sneer(iris, scale_type = "sd")

# Use the weighted TSNE variant and export the input and output distance
# matrices.
res <- sneer(iris, scale_type = "sd", method = "wtsne", ret = c("dx", "dy"))

# calculate the 32-nearest neighbor preservation for each observation
# 0 means no neighbors preserved, 1 means all of them
pres32 <- nbr_pres(res$dx, res$dy, 32)

# Calculate the Area Under the RNX Curve
rnx_auc <- rnx_auc_embed(res$dx, res$dy)

# Load the PRROC library
library(PRROC)

# Calculate the Area Under the Precision Recall Curve for the embedding
pr <- pr_auc_embed(res$dy, iris$Species)

# Similarly, for the ROC curve:
roc <- roc_auc_embed(res$dy, iris$Species)

# Load the RColorBrewer library
library(RColorBrewer)

# Plot the embedding, with points colored by the neighborhood preservation
embed_plot(res$coords, x = pres32, color_scheme = "Blues")

## End(Not run)

## Not run:
# PCA on iris dataset and plot result using Species label name
res <- sneer(iris, indexes = 1:4, label_name = "Species", method = "pca")

# Same as above, but with sensible defaults (use all numeric columns, plot
# with first factor column found)
res <- sneer(iris, method = "pca")

# Can use a distance matrix as input with external vector of labels
res <- sneer(dist(iris[1:4]), method = "pca", labels = iris$Species)

# scale columns so each one has mean 0 and variance 1
res <- sneer(iris, method = "pca", scale_type = "sd")

# full species name on plot is cluttered, so just use the first two
# letters and half size
res <- sneer(iris, method = "pca", scale_type = "sd", label_chars = 2, point_size = 0.5)

library(ggplot2)
library(RColorBrewer)
# Use ggplot2 and RColorBrewer palettes for the plot
res <- sneer(iris, method = "pca", scale_type = "sd", plot_type = "g")

# Use a different ColorBrewer palette, bigger points, and range scale each
# column
res <- sneer(iris, method = "pca", scale_type = "r", plot_type = "g", color_scheme = "Dark2", point_size = 2)

# metric MDS starting from the PCA
res <- sneer(iris, method = "mmds", scale_type = "sd", init = "p")

# Sammon map starting from random distribution
res <- sneer(iris, method = "sammon", scale_type = "sd", init = "r")

# TSNE with a perplexity of 32, initialize from PCA
res <- sneer(iris, method = "tsne", scale_type = "sd", init = "p", perplexity = 32)

# default settings are to use TSNE with perplexity 32 and initialization
# from PCA so the following is the equivalent of the above
res <- sneer(iris, scale_type = "sd")

# Use the standard tSNE optimization method (Jacobs step size method) with
# step momentum. Range scale the matrix and use an aggressive learning
# rate (eta).
res <- sneer(iris, scale_type = "m", perplexity = 25, opt = "tsne", eta = 500) # Use the L-BFGS optimization method res <- sneer(iris, scale_type = "sd", opt = "L-BFGS") # Use the Spectral Directions method res <- sneer(iris, scale_type = "sd", opt = "SPEC") # Use Conjugate Gradient res <- sneer(iris, scale_type = "sd", opt = "CG") # NeRV method, starting at a more global perplexity and slowly stepping # towards a value of 32 (might help avoid local optima) res <- sneer(iris, scale_type = "sd", method = "nerv", perp_scale = "step") # NeRV method has a lambda parameter - closer to 1 it gets, the more it # tries to avoid false positives (close points in the map that aren't close # in the input space): res <- sneer(iris, scale_type = "sd", method = "nerv", perp_scale = "step", lambda = 1) # Original NeRV paper transferred input exponential similarity kernel # precisions to the output kernel, and initialized from a uniform random # distribution res <- sneer(iris, scale_type = "sd", method = "nerv", perp_scale = "step", lambda = 1, prec_scale = "t", init = "u") # Like NeRV, the JSE method also has a controllable parameter that goes # between 0 and 1, called kappa. It gives similar results to NeRV at 0 and # 1 but unfortunately the opposite way round! The following gives similar # results to the NeRV embedding above: res <- sneer(iris, scale_type = "sd", method = "jse", perp_scale = "step", kappa = 0) # Rather than step perplexities, use multiscaling to combine and average # probabilities across multiple perplexities. Output kernel precisions # can be scaled based on the perplexity value (compare to NeRV example # which transferred the precision directly from the input kernel) res <- sneer(iris, scale_type = "sd", method = "jse", perp_scale = "multi", prec_scale = "s") # HSSNE has a controllable parameter, alpha, that lets you control how # much extra space to give points compared to the input distances. # Setting it to 1 is equivalent to TSNE, so 1.1 is a bit of an extra push: res <- sneer(iris, scale_type = "sd", method = "hssne", alpha = 1.1) # DHSSNE is a "dynamic" extension to HSSNE which will modify alpha from # its starting point, similar to how it-SNE works (except there's # only one global value being optimized) # Setting alpha simply chooses the initial value res <- sneer(iris, method = "dhssne", alpha = 0.5) # Can make other embedding methods "dynamic" in the style of it-SNE and # DSSNE. Here we let the ASNE output kernel have different precision # parameters: res <- sneer(iris, method = "asne", dyn = list(beta = "point")) # DHSSNE could be defined manually like this: alpha is optimized as a single # global parameter, while the beta parameters are not optimized res <- sneer(iris, method = "hssne", dyn = list(alpha = "global", beta = "static")) # Allow both alpha and beta in the heavy-tailed function to vary per-point: res <- sneer(iris, method = "hssne", dyn = list(alpha = "point", beta = "point")) # it-SNE has a similar degree of freedom parameter to HSSNE's alpha, but # applies independently to each point and is optimized as part of the # embedding. # Setting dof chooses the initial value (1 is like t-SNE, large values # approach ASNE) # kernel_opt_iter sets how many iterations with just coordinate # optimization before including dof optimization too. 
res <- sneer(iris, method = "itsne", dof = 10, dyn = list(kernel_opt_iter = 50)) # wTSNE treats the input probability like a graph where the probabilities # are weighted edges and adds extra repulsion to nodes with higher degrees res <- sneer(iris, scale_type = "sd", method = "wtsne") # can use a step-function input kernel to make input probability more like # a k-nearest neighbor graph (but note that we don't take advantage of the # sparsity for performance purposes, sadly) res <- sneer(iris, scale_type = "sd", method = "wtsne", perp_kernel_fun = "step") # Some quality measures are available to quantify embeddings # The area under the RNX curve measures whether neighbors in the input # are still neighors in the output space res <- sneer(iris, scale_type = "sd", method = "wtsne", quality_measures = c("rnxauc")) # Create a 5D gaussian with its own column specifying colors to use # for each point (in this case, random) g5d <- data.frame(matrix(rnorm(100 * 5), ncol = 5), color = rgb(runif(100), runif(100), runif(100)), stringsAsFactors = FALSE) # Specify the name of the color column and the plot will use it rather than # trying to map factor levels to colors res <- sneer(g5d, method = "pca", color_name = "color") # If your dataset labels divide the data into natural classes, can # calculate average area under the ROC and/or precision-recall curve too, # but you need to have installed the PRROC package. # All these techniques can be slow (scale with the square of the number of # observations). library(PRROC) res <- sneer(iris, scale_type = "sd", method = "wtsne", quality_measures = c("rnx", "roc", "pr")) # export the distance matrices and do whatever quality measures we # want at our leisure res <- sneer(iris, scale_type = "sd", method = "wtsne", ret = c("dx", "dy")) # Calculate the Area Under the Precision Recall Curve for the embedding pr <- pr_auc_embed(res$dy, iris$Species) # Similarly, for the ROC curve: roc <- roc_auc_embed(res$dy, iris$Species) # export per-point error, degree centrality, input weight function # precision parameters and intrinsic dimensionality res <- sneer(iris, scale_type = "sd", method = "wtsne", ret = c("pcost", "deg", "prec", "dim")) # Plot the embedding as points colored by category, using the rainbow # color ramp function: embed_plot(res$coords, iris$Species, color_scheme = rainbow) # Load the RColorBrewer Library library(RColorBrewer) # Use a ColorBrewer Qualitative color scheme name (pass a string, not # a function!) embed_plot(res$coords, iris$Species, color_scheme = "Dark2") # Visualize embedding colored by various values: # Per-point embedding error embed_plot(res$coords, x = res$pcost) # Degree centrality embed_plot(res$coords, x = res$deg) # Intrinsic Dimensionality using the PRGn palette embed_plot(res$coords, x = res$dim, color_scheme = "PRGn") # Input weight function precision parameter with the Spectral palette embed_plot(res$coords, x = res$prec, color_scheme = "Spectral") # calculate the 32-nearest neighbor preservation for each observation # 0 means no neighbors preserved, 1 means all of them pres32 <- nbr_pres(res$dx, res$dy, 32) embed_plot(res$coords, x = pres32, cex = 1.5) ## End(Not run)