cluster_tsne_hierarchical: Perform Hierarchical Clustering on t-SNE Results

View source: R/functions.R

cluster_tsne_hierarchical    R Documentation

Description

This function applies hierarchical clustering to t-SNE results to identify clusters in the reduced-dimensional space. DBSCAN is run first to detect noise points (outliers), which can then be included in or excluded from the hierarchical clustering. Silhouette scores are computed to evaluate clustering quality, and cluster centroids are returned for visualization.

Usage

cluster_tsne_hierarchical(info.norm, tsne.norm, settings)

Arguments

info.norm

A data frame containing the normalized data on which the t-SNE analysis was carried out.

tsne.norm

The t-SNE results object, including the 2D t-SNE coordinates (Y matrix).

settings

A list of settings for the clustering analysis. The settings must include:

  • clustLinkage: The linkage method for hierarchical clustering (e.g., "ward.D2").

  • clustGroups: The number of groups (clusters) to cut the hierarchical tree into.

  • distMethod: The distance metric to be used (e.g., "euclidean").

  • minPtsAdjustmentFactor: A factor used to adjust MinPts, the minimum number of points DBSCAN requires to form a cluster.

  • epsQuantile: The quantile used to determine the eps value for DBSCAN.

  • excludeOutliers: A logical value indicating whether to exclude outliers detected by DBSCAN from hierarchical clustering.

  • pointSize: A numeric value used to adjust the placement of outlier centroids.
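As an illustration, a settings list with the fields above could be assembled as follows. Only the field names come from this page; every value is a hypothetical placeholder rather than a documented default, and the completeness check is an extra convenience that the function itself may or may not perform.

```r
# Hypothetical values: only the field names are taken from the documentation.
settings <- list(
  clustLinkage = "ward.D2",      # linkage method for hierarchical clustering
  clustGroups = 3,               # number of clusters to cut the tree into
  distMethod = "euclidean",      # distance metric
  minPtsAdjustmentFactor = 1,    # scaling applied to DBSCAN's MinPts
  epsQuantile = 0.9,             # quantile used to derive DBSCAN's eps
  excludeOutliers = TRUE,        # drop DBSCAN noise before clustering
  pointSize = 1.5                # used when placing outlier centroids
)

# Check that every documented field is present before calling the function.
required <- c(
  "clustLinkage", "clustGroups", "distMethod", "minPtsAdjustmentFactor",
  "epsQuantile", "excludeOutliers", "pointSize"
)
missing_fields <- setdiff(required, names(settings))
if (length(missing_fields) > 0) {
  stop("Missing settings: ", paste(missing_fields, collapse = ", "))
}
```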

Details

The function first uses DBSCAN to detect outliers (marked as cluster "100") and then applies hierarchical clustering to the t-SNE results, either including or excluding the outliers depending on the settings. Silhouette scores are computed to assess the quality of the clustering. Cluster centroids are calculated and returned, along with the size of each cluster. Outliers, if detected, are handled separately in the final centroid calculation.
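The steps described above can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the package's implementation: the toy matrix Y stands in for the t-SNE coordinates, noise is flagged with a base-R restatement of the DBSCAN noise rule (a point that is neither a core point nor within eps of one) instead of a call to a DBSCAN library, and eps is taken as a quantile of nearest-neighbour distances, which is one plausible reading of epsQuantile. All variable names are hypothetical.

```r
library(cluster)  # silhouette(); a recommended package shipped with R

set.seed(42)
# Toy 2D coordinates standing in for the t-SNE Y matrix:
# two compact blobs plus two isolated points.
Y <- rbind(
  matrix(rnorm(60, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(60, mean = 6, sd = 0.5), ncol = 2),
  c(20, -20),
  c(-20, 20)
)

min_pts <- 5              # assumed MinPts (counts the point itself)
eps_quantile <- 0.9       # assumed role of epsQuantile
exclude_outliers <- TRUE  # mirrors excludeOutliers

D <- as.matrix(dist(Y, method = "euclidean"))

# Each sorted row starts with the zero self-distance, so element min_pts
# is the distance to the (min_pts - 1)-th nearest neighbour.
knn_dist <- apply(D, 1, function(d) sort(d)[min_pts])
eps <- unname(quantile(knn_dist, eps_quantile))

# DBSCAN noise rule: not a core point and not within eps of any core point.
is_core <- rowSums(D <= eps) >= min_pts
near_core <- rowSums(D[, is_core, drop = FALSE] <= eps) > 0
is_noise <- !is_core & !near_core

# Hierarchical clustering on the retained points only.
keep <- if (exclude_outliers) !is_noise else rep(TRUE, nrow(Y))
d_keep <- dist(Y[keep, , drop = FALSE], method = "euclidean")
hc <- hclust(d_keep, method = "ward.D2")
groups <- cutree(hc, k = 2)

# Points left out of the tree keep the outlier label 100.
cluster_id <- rep(100L, nrow(Y))
cluster_id[keep] <- groups

# Average silhouette width over the clustered points.
sil <- silhouette(groups, d_keep)
avg_silhouette_score <- mean(sil[, "sil_width"])
```

Setting exclude_outliers to FALSE in this sketch passes every point to hclust(), so no point retains the 100 label; how the package itself labels outliers in that case is not stated on this page.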

Value

A list containing:

  • info.norm: The input data frame with an additional pandora_cluster column for cluster assignments.

  • cluster_data: A data frame with cluster centroids and labeled clusters.

  • avg_silhouette_score: The average silhouette score, providing a measure of clustering quality.
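To give a sense of what a table of centroids and cluster sizes might look like, the sketch below builds one from toy data with base R. The column names tsne1, tsne2 and size are assumptions made for illustration; only pandora_cluster and the outlier label 100 appear on this page, and the actual layout of cluster_data may differ.

```r
# Toy coordinates with cluster labels; 100 marks an outlier.
coords <- data.frame(
  tsne1 = c(0, 1, 0, 10, 11, 50),
  tsne2 = c(0, 0, 1, 10, 10, -50),
  pandora_cluster = c(1, 1, 1, 2, 2, 100)
)

# Mean coordinate per cluster gives the centroid.
centroids <- aggregate(
  cbind(tsne1, tsne2) ~ pandora_cluster,
  data = coords,
  FUN = mean
)

# Number of points per cluster.
sizes <- aggregate(tsne1 ~ pandora_cluster, data = coords, FUN = length)
names(sizes) <- c("pandora_cluster", "size")

# One row per cluster: label, centroid coordinates, and size.
cluster_data <- merge(centroids, sizes, by = "pandora_cluster")
```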


immunaut documentation built on April 12, 2025, 1:22 a.m.