subset_clusters_data: Subset Cluster Data

View source: R/misread-tsne.R

subset_clusters_data    R Documentation

Subset Cluster Data

Description

A tiny Gaussian cluster inside a big cluster, from "How to Use t-SNE Effectively".

Usage

subset_clusters_data(n, dim = 2, big_sdev = 50)

Arguments

n

Number of points per Gaussian.

dim

Dimension of the Gaussians.

big_sdev

Standard deviation of the bigger cluster, default 50. The smaller cluster has a standard deviation of 1.

Details

Creates a dataset consisting of two Gaussians that share the same center: the first cluster has a standard deviation of 1, and the second has a standard deviation of big_sdev (default 50). Points are colored by cluster membership (the small cluster is dark powder blue, the large cluster is light orange).
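
The construction described above can be sketched in base R. This is an illustrative approximation under stated assumptions, not the snedata source: the function name sketch_subset_clusters is hypothetical, both clusters are assumed to be centered at the origin, and the color column holds placeholder labels ("small", "big") rather than the package's actual color values.

```r
# Hypothetical helper; not the snedata implementation.
sketch_subset_clusters <- function(n, dim = 2, big_sdev = 50) {
  # Small cluster: n points, standard deviation 1, centered at the origin
  # (the origin is an assumption; the docs only say the centers coincide).
  small <- matrix(stats::rnorm(n * dim, mean = 0, sd = 1), nrow = n, ncol = dim)
  # Big cluster: n points, same center, standard deviation big_sdev.
  big <- matrix(stats::rnorm(n * dim, mean = 0, sd = big_sdev), nrow = n, ncol = dim)

  coords <- rbind(small, big)
  # Name the coordinate columns X1, X2, ..., Xdim explicitly.
  colnames(coords) <- paste0("X", seq_len(dim))

  df <- as.data.frame(coords)
  # Placeholder cluster labels standing in for the real color values.
  df$color <- rep(c("small", "big"), each = n)
  df
}

set.seed(42)
sketch <- sketch_subset_clusters(n = 50, dim = 3, big_sdev = 50)
# Per-cluster spread of the first coordinate; the big cluster should be
# far more dispersed than the small one.
tapply(sketch$X1, sketch$color, sd)
```

With n points per Gaussian, the sketch returns 2 * n rows. Because the small cluster sits inside the big one, the two groups differ only in spread, not in location.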

Value

Data frame with coordinates in the X1, X2, ..., Xdim columns, and color in the color column.
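
As a rough illustration of working with the returned data frame, the sketch below separates the coordinate columns from the color column and draws the first two dimensions. It assumes the snedata package is installed and that the color column holds values R accepts as plotting colors; neither is guaranteed here, so the code is guarded and converts the column with as.character() first.

```r
# Runs only if snedata is available; otherwise it does nothing.
if (requireNamespace("snedata", quietly = TRUE)) {
  df <- snedata::subset_clusters_data(n = 50, dim = 3)

  # Coordinate columns follow the X1, X2, ..., Xdim naming pattern.
  coord_cols <- grep("^X[0-9]+$", names(df), value = TRUE)
  coords <- as.matrix(df[, coord_cols])
  print(dim(coords))

  # Count the points in each cluster by color.
  print(table(df$color))

  # Plot the first two dimensions, colored by cluster.
  # Assumption: the color values are valid R color specifications.
  plot(df$X1, df$X2, col = as.character(df$color), pch = 20,
       xlab = "X1", ylab = "X2")
}
```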

References

http://distill.pub/2016/misread-tsne/

See Also

Other distill functions: circle_data(), cube_data(), gaussian_data(), grid_data(), link_data(), long_cluster_data(), long_gaussian_data(), ortho_curve(), random_circle_cluster_data(), random_circle_data(), random_jump(), random_walk(), simplex_data(), three_clusters_data(), trefoil_data(), two_clusters_data(), two_different_clusters_data(), unlink_data()

Examples

df <- subset_clusters_data(n = 50, dim = 2)

# 10D example where the big cluster is only twice the standard deviation of
# the small cluster
df <- subset_clusters_data(n = 50, dim = 10, big_sdev = 2)

jlmelville/snedata documentation built on Jan. 13, 2024, 2:06 a.m.