tsne2clus: t-Stochastic Neighbor Embedding to Clusters

View source: R/tsne2clus.R

tsne2clus R Documentation

t-Stochastic Neighbor Embedding to Clusters

Description

Finds clusters on a two-dimensional map using density-based spatial clustering of applications with noise (DBSCAN; Ester et al. 1996).

Usage

tsne2clus(
  S.tsne,
  ann = NULL,
  labels,
  aest = NULL,
  eps_res = 100,
  eps_range = c(0, 4),
  min.clus.size = 10,
  group.names = "Groups",
  xlab = "x: tSNE(X)",
  ylab = "y: tSNE(X)",
  clus = TRUE
)

Arguments

S.tsne

Output of the function 'pca2tsne'.

ann

Subjects' annotation data. An incidence matrix assigning subjects to classes of biological relevance. Used to tune cluster assignment via the Biological Homogeneity Index (BHI). If ann = NULL, the number of clusters is tuned with the Silhouette index instead of the BHI. Defaults to NULL.

labels

Character vector with labels describing subjects. Meant to assign aesthetics to the visual display of clusters.

aest

Data frame containing points shape and color. Defaults to NULL.

eps_res

Number of eps values to explore within the range given by 'eps_range'. Defaults to 100.

eps_range

Vector containing the minimum and maximum eps values to be explored. Defaults to c(0, 4).

min.clus.size

Minimum size for a cluster to appear in the visual display. Defaults to 10.

group.names

Title of the legend key if 'aest' is specified. Defaults to "Groups".

xlab

Label for the x-axis. Defaults to "x: tSNE(X)".

ylab

Label for the y-axis. Defaults to "y: tSNE(X)".

clus

Should clustering be performed? Defaults to TRUE. If FALSE, only point aesthetics are applied.

Details

The function takes the output of pca2tsne (or a list containing any two-column matrix) and finds clusters via DBSCAN. It extends code from the MEREDITH (Taskesen et al. 2016) and clValid (Datta & Datta, 2018) R packages to tune DBSCAN parameters with the Silhouette or Biological Homogeneity indexes.
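The tuning idea described above can be sketched as a grid search over eps. This is a minimal illustration and not the code used inside tsne2clus: it assumes the 'dbscan' and 'cluster' packages are installed, uses a toy two-column matrix in place of a real t-SNE map, fixes minPts = 5, and drops DBSCAN noise points before computing the mean silhouette width. All of these choices are assumptions made for the example.

```r
library(dbscan)   # provides dbscan()
library(cluster)  # provides silhouette()

set.seed(1)
# Toy two-column map standing in for a t-SNE embedding (hypothetical data).
Y <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2)
)

eps_range <- c(0, 4)
eps_res <- 100
# Candidate eps values; DBSCAN needs eps > 0, so zero is dropped.
eps_grid <- seq(eps_range[1], eps_range[2], length.out = eps_res)
eps_grid <- eps_grid[eps_grid > 0]

# Mean silhouette width of the DBSCAN partition at each candidate eps.
sil <- vapply(eps_grid, function(eps) {
  cl <- dbscan::dbscan(Y, eps = eps, minPts = 5)$cluster
  keep <- cl > 0                      # label 0 marks noise points
  k <- length(unique(cl[keep]))
  # The silhouette is only defined for 2 <= k <= n - 1 clusters.
  if (k < 2 || k >= sum(keep)) return(NA_real_)
  s <- cluster::silhouette(cl[keep], dist(Y[keep, , drop = FALSE]))
  mean(s[, "sil_width"])
}, numeric(1))

# Keep the eps with the highest mean silhouette width (NAs are ignored).
best_eps <- eps_grid[which.max(sil)]
best_clusters <- dbscan::dbscan(Y, eps = best_eps, minPts = 5)$cluster
```

Because noise points are excluded from the score in this sketch, very small eps values that keep only a few tight groups can score well; a real application may want to penalize the fraction of points left as noise.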

Value

A list with the results of the DBSCAN clustering and (if argument 'plot'=TRUE) the corresponding graphical displays.

  • dbscan.res: a list with the results of the DBSCAN clustering, containing:

    • cluster: Cluster partition.

    • eps: Optimal eps according to the Silhouette or Biological Homogeneity indexes criteria.

    • SIL: Maximum peak in the trajectory of the Silhouette index.

    • BHI: Maximum peak in the trajectory of the Biological Homogeneity index.

  • clusters.plot: A ggplot object with the clusters' graphical display.
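To make the BHI criterion more concrete, the sketch below computes a homogeneity score in base R following the general definition in Datta & Datta (2006): within each cluster, the proportion of subject pairs that share at least one annotation class, averaged over clusters. The helper 'bhi_sketch' is hypothetical and is not part of MOSS or clValid; treating label 0 as noise and skipping clusters with fewer than two subjects are assumptions of this example.

```r
# Hypothetical helper, not part of MOSS or clValid.
# cluster: integer vector of cluster labels (0 is treated as noise).
# ann:     incidence matrix (subjects in rows, classes in columns).
bhi_sketch <- function(cluster, ann) {
  ann <- (as.matrix(ann) > 0) * 1          # force a 0/1 numeric matrix
  ids <- setdiff(unique(cluster), 0)       # drop the noise label
  per_cluster <- vapply(ids, function(k) {
    A <- ann[cluster == k, , drop = FALSE]
    n <- nrow(A)
    if (n < 2) return(NA_real_)            # no pairs to compare
    shared <- tcrossprod(A) > 0            # TRUE if a pair shares a class
    diag(shared) <- FALSE                  # ignore self-pairs
    sum(shared) / (n * (n - 1))
  }, numeric(1))
  mean(per_cluster, na.rm = TRUE)
}

# Incidence matrix built from the iris species, as in the Examples section.
ann <- model.matrix(~ -1 + iris[, 5])

# A partition that matches the species exactly is fully homogeneous.
perfect <- as.integer(iris[, 5])
bhi_perfect <- bhi_sketch(perfect, ann)    # equals 1

# A partition that mixes species within clusters scores lower.
mixed <- rep(1:3, length.out = nrow(iris))
bhi_mixed <- bhi_sketch(mixed, ann)
```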

References

  • Ester, Martin, Hans-Peter Kriegel, Jorg Sander, and Xiaowei Xu. 1996. "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise," 226-231.

  • Hahsler, Michael, and Matthew Piekenbrock. 2017. "Dbscan: Density Based Clustering of Applications with Noise (DBSCAN) and Related Algorithms." https://cran.r-project.org/package=dbscan.

  • Datta, Susmita, and Somnath Datta. 2006. Methods for Evaluating Clustering Algorithms for Gene Expression Data Using a Reference Set of Functional Classes. BMC Bioinformatics 7 (1). BioMed Central:397.

  • Taskesen, Erdogan, Sjoerd M. H. Huisman, Ahmed Mahfouz, Jesse H. Krijthe, Jeroen de Ridder, Anja van de Stolpe, Erik van den Akker, Wim Verhaegh, and Marcel J. T. Reinders. 2016. Pan-Cancer Subtyping in a 2D-Map Shows Substructures That Are Driven by Specific Combinations of Molecular Characteristics. Scientific Reports 6 (1):24949.

Examples


library(MOSS)
library(viridis)
library(cluster)
library(annotate)

# Using the 'iris' data to show cluster definition via the BHI criterion.
set.seed(42)
data(iris)
# Scaling columns.
X <- scale(iris[, -5])
# Calling pca2tsne to map the four variables onto a 2-D map.
Z <- pca2tsne(X, perp = 30, n.samples = 1, n.iter = 1000)
# Using 'species' as prior knowledge to identify clusters.
ann <- model.matrix(~ -1 + iris[, 5])
# Getting clusters.
tsne2clus(Z,
  ann = ann,
  labels = iris[, 5],
  aest = aest.f(iris[, 5]),
  group.names = "Species",
  eps_range = c(0, 3)
)

# Example of usage within moss.
set.seed(43)
sim_blocks <- simulate_data()$sim_blocks
out <- moss(sim_blocks[-4],
  tSNE = TRUE,
  cluster = list(eps_range = c(0, 4), eps_res = 100, min_clus_size = 1),
  plot = TRUE
)
out$clus_plot
out$clusters_vs_PCs


MOSS documentation built on March 26, 2022, 1:10 a.m.