subset_clusters_data: Subset Cluster Data
In jlmelville/snedata: SNE Simulation Dataset Functions

Subset Cluster Data

Description

A small Gaussian cluster nested inside a much larger one, as in the example from "How to Use t-SNE Effectively".

Usage

subset_clusters_data(n, dim = 2, big_sdev = 50)

Arguments

`n`	Number of points in each Gaussian cluster.
`dim`	Dimension of the Gaussians.
`big_sdev`	Standard deviation of the larger cluster. The smaller cluster always has a standard deviation of 1.

Details

Creates a dataset consisting of two Gaussians that share the same center. The first cluster has a standard deviation of 1, and the second has a standard deviation of `big_sdev` (default 50). Points are colored according to the cluster they belong to: the small cluster is dark powder blue and the large cluster is light orange.
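The construction described above can be sketched in base R. This is an illustration only, not the snedata source: the function name `subset_clusters_sketch`, the choice of the origin as the shared center, and the two hex color codes are all assumptions made for the example.

```r
# Hypothetical re-implementation for illustration; not the snedata code.
subset_clusters_sketch <- function(n, dim = 2, big_sdev = 50) {
  # Small cluster: n points from an isotropic Gaussian with sd = 1.
  # Centering at the origin is an assumption; the docs only say "same center".
  small <- matrix(stats::rnorm(n * dim, mean = 0, sd = 1), nrow = n, ncol = dim)

  # Large cluster: same center, standard deviation big_sdev.
  big <- matrix(stats::rnorm(n * dim, mean = 0, sd = big_sdev), nrow = n, ncol = dim)

  # Stack the clusters and name the coordinate columns X1 ... Xdim.
  coords <- rbind(small, big)
  colnames(coords) <- paste0("X", seq_len(dim))
  df <- as.data.frame(coords)

  # Placeholder colors (assumed values): first n rows small, last n rows large.
  df$color <- rep(c("#003399", "#FF9900"), each = n)
  df
}

set.seed(42)
sketch <- subset_clusters_sketch(n = 100, dim = 3, big_sdev = 50)
```

Under these assumptions the result has 2 * n rows, because `n` counts the points in each cluster rather than the total.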

Value

Data frame with coordinates in columns X1, X2, ..., Xdim, and the point color in the color column.

References

Wattenberg, M., Viégas, F., & Johnson, I. (2016). How to Use t-SNE Effectively. Distill. http://distill.pub/2016/misread-tsne/

Examples

# 2D example using the default big_sdev of 50
df <- subset_clusters_data(n = 50, dim = 2)

# 10D example where the standard deviation of the big cluster is only twice
# that of the small cluster
df <- subset_clusters_data(n = 50, dim = 10, big_sdev = 2)
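One way to check that a result matches the description is to compare the spread of the two clusters. The helper below, `cluster_spread`, is hypothetical and not part of snedata. It assumes only the layout given under Value (coordinate columns plus a color column), and the call to the package is guarded in case snedata is not installed.

```r
# Hypothetical helper (not part of snedata): pooled standard deviation of
# all coordinate values, computed separately for each color group.
cluster_spread <- function(df) {
  coord_cols <- setdiff(names(df), "color")
  sapply(
    split(df[coord_cols], df$color),
    function(block) stats::sd(unlist(block))
  )
}

# Only runs when snedata is available.
if (requireNamespace("snedata", quietly = TRUE)) {
  df <- snedata::subset_clusters_data(n = 50, dim = 10, big_sdev = 2)
  print(cluster_spread(df))
}
```

With `big_sdev = 2`, the two values should come out near 1 and 2, though the exact numbers vary between runs because the data are random.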

jlmelville/snedata documentation built on March 5, 2025, 12:22 p.m.