cluster_tsne_mclust: Apply Mclust Clustering on t-SNE Results

cluster_tsne_mclust    R Documentation

Description

This function applies Mclust clustering to the 2D t-SNE coordinates computed from high-dimensional data. It first runs DBSCAN to detect outliers; the excludeOutliers setting controls whether those outliers are left out of the Mclust fit. Silhouette scores are computed to evaluate clustering quality, and cluster centroids are returned for visualization, with outliers handled separately.

Usage

cluster_tsne_mclust(info.norm, tsne.norm, settings)

Arguments

info.norm

A data frame containing the normalized data on which the t-SNE analysis was carried out.

tsne.norm

The t-SNE results object, including the 2D t-SNE coordinates (Y matrix).

settings

A list of settings for the clustering analysis, including:

  • clustGroups: The number of groups (clusters) for Mclust to fit.

  • minPtsAdjustmentFactor: A factor to adjust the minimum number of points required to form a cluster (MinPts) in DBSCAN.

  • epsQuantile: The quantile used to determine the eps value for DBSCAN.

  • excludeOutliers: A logical value indicating whether to exclude outliers detected by DBSCAN from the Mclust clustering.

  • pointSize: A numeric value used to adjust the placement of outlier centroids.
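As a minimal sketch, a settings list with the fields described above could be built as follows. The field names come from this page; every value is an assumption chosen only for illustration, not a documented default, and the final call is commented out because it needs real info.norm and tsne.norm inputs.

```r
# Illustrative settings list. Field names follow the Arguments section;
# all values are assumptions for this example, not package defaults.
settings <- list(
  clustGroups            = 3,     # number of groups for Mclust to fit
  minPtsAdjustmentFactor = 1,     # factor adjusting DBSCAN's MinPts
  epsQuantile            = 0.9,   # quantile used to determine DBSCAN's eps
  excludeOutliers        = TRUE,  # leave DBSCAN outliers out of the Mclust fit
  pointSize              = 1.5    # adjusts the placement of outlier centroids
)

# Basic sanity checks before passing the list on.
stopifnot(
  is.numeric(settings$clustGroups), settings$clustGroups >= 1,
  settings$epsQuantile >= 0, settings$epsQuantile <= 1,
  is.logical(settings$excludeOutliers)
)

# result <- cluster_tsne_mclust(info.norm, tsne.norm, settings)
```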

Details

The function first runs DBSCAN on the t-SNE coordinates to detect outliers, which are assigned to cluster "100". It then applies Mclust clustering to the t-SNE results, with the outliers included or excluded according to the excludeOutliers setting. Silhouette scores are calculated to assess the quality of the clustering. The function returns the centroid and size of each cluster; outlier centroids are computed separately from those of the regular clusters.
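The centroid and silhouette steps described above can be sketched in base R. This is not the package's implementation: the toy coordinates, the hard-coded labels (standing in for DBSCAN and Mclust output), and the helper mean_silhouette are all assumptions for illustration. Leaving the "100" outliers out of the silhouette average is likewise an assumption.

```r
# Toy 2D coordinates standing in for the t-SNE Y matrix (assumed data).
Y <- rbind(
  c(0, 0), c(0, 1), c(1, 0),        # first group
  c(10, 10), c(10, 11), c(11, 10),  # second group
  c(50, -50)                        # isolated point
)
# Hard-coded labels standing in for DBSCAN + Mclust output; "100" marks outliers.
labels <- c("1", "1", "1", "2", "2", "2", "100")

# Centroid and size per label (outlier label included as its own row).
centroids <- aggregate(Y, by = list(cluster = labels), FUN = mean)
names(centroids) <- c("cluster", "x", "y")
centroids$size <- as.vector(table(labels)[centroids$cluster])

# Hypothetical helper: mean silhouette width from Euclidean distances.
mean_silhouette <- function(coords, labs) {
  d <- as.matrix(dist(coords))
  s <- vapply(seq_len(nrow(coords)), function(i) {
    own <- labs == labs[i]
    own[i] <- FALSE
    if (!any(own)) return(0)  # singleton cluster: silhouette taken as 0
    a <- mean(d[i, own])      # mean distance to own cluster
    b <- min(vapply(setdiff(unique(labs), labs[i]),
                    function(k) mean(d[i, labs == k]), numeric(1)))
    (b - a) / max(a, b)       # b is the nearest other cluster's mean distance
  }, numeric(1))
  mean(s)
}

# Score only the non-outlier points (an assumption of this sketch).
keep <- labels != "100"
avg_sil <- mean_silhouette(Y[keep, ], labels[keep])
```

On well-separated toy groups like these, avg_sil comes out close to 1; values near 0 or below would indicate overlapping or poorly assigned clusters.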

Value

A list containing:

  • info.norm: The input data frame with an additional pandora_cluster column for cluster assignments.

  • cluster_data: A data frame with cluster centroids and labeled clusters.

  • avg_silhouette_score: The average silhouette score, providing a measure of clustering quality.


immunaut documentation built on April 12, 2025, 1:22 a.m.