cluster_tsne_knn_louvain: Perform KNN and Louvain Clustering on t-SNE Results

View source: R/functions.R

cluster_tsne_knn_louvain    R Documentation

Perform KNN and Louvain Clustering on t-SNE Results

Description

This function clusters t-SNE results in two steps: it first builds a K-Nearest Neighbors (KNN) graph from the t-SNE coordinates, and then applies the Louvain method for community detection to that graph. The number of neighbors is adjusted dynamically according to the size of the dataset so that the method scales to larger inputs. The function also computes the average silhouette score to evaluate cluster quality and calculates cluster centroids for visualization.
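A minimal sketch of the KNN-then-Louvain idea described above, assuming the igraph package (graph_from_edgelist(), simplify(), and a version of cluster_louvain() that accepts a resolution argument) and a brute-force base-R neighbor search. The helper name knn_graph_from_tsne and the cap k <= n - 1 are hypothetical; this page does not specify how immunaut searches for neighbors or how it adjusts k to the dataset size.

```r
library(igraph)

# Hypothetical helper: undirected KNN graph from 2D t-SNE coordinates.
knn_graph_from_tsne <- function(Y, k) {
  n <- nrow(Y)
  k <- min(k, n - 1)               # assumed guard; not immunaut's actual rule
  d <- as.matrix(dist(Y))          # pairwise Euclidean distances
  diag(d) <- Inf                   # a point is not its own neighbor
  # Row i holds the indices of the k nearest neighbors of point i
  nn <- matrix(unlist(lapply(seq_len(n), function(i) order(d[i, ])[seq_len(k)])),
               nrow = n, byrow = TRUE)
  edges <- cbind(rep(seq_len(n), each = k), as.vector(t(nn)))
  # Undirected graph; simplify() merges edges found from both endpoints
  simplify(graph_from_edgelist(edges, directed = FALSE))
}

# Toy t-SNE-like coordinates: two well-separated blobs of 50 points each
set.seed(1)
Y <- rbind(matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(100, mean = 5, sd = 0.3), ncol = 2))
blob <- rep(1:2, each = 50)

g <- knn_graph_from_tsne(Y, k = 10)
comm <- cluster_louvain(g, resolution = 1)   # Louvain community detection
m <- membership(comm)                        # community label per point
modularity(comm)
```

Because each point's 10 nearest neighbors all lie in its own blob, no Louvain community in this toy example spans both blobs, although a blob may be split into several communities.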

Usage

cluster_tsne_knn_louvain(
  info.norm,
  tsne.norm,
  settings,
  resolution_increment = 0.1,
  min_modularity = 0.5
)

Arguments

info.norm

A data frame containing the normalized data on which the t-SNE analysis was carried out.

tsne.norm

A list containing the t-SNE results; its Y element holds the matrix of 2D t-SNE coordinates.

settings

A list of settings for the analysis, including:

  • knn_clusters: The number of nearest neighbors used to build the KNN graph (default: 250).

  • target_clusters_range: A numeric vector specifying the target range for the number of clusters.

  • start_resolution: The starting resolution for Louvain clustering.

  • end_resolution: The maximum resolution to test.

  • min_modularity: The minimum acceptable modularity for valid clusterings.

resolution_increment

The step size for incrementing the Louvain clustering resolution. Defaults to 0.1.

min_modularity

The minimum modularity score required for a clustering to be considered valid. Defaults to 0.5.
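The settings fields and the resolution_increment argument can be illustrated as follows. The concrete values are examples only (knn_clusters = 250, resolution_increment = 0.1 and min_modularity = 0.5 are the documented defaults; the others are made up), and pick_resolution() is a hypothetical selection rule: this page says clusterings are judged against target_clusters_range and a minimum modularity, but not how the final resolution is chosen. The candidates table is invented input, not output from the package.

```r
# Illustrative settings; field names follow the Arguments section above.
settings <- list(
  knn_clusters          = 250,
  target_clusters_range = c(3, 6),
  start_resolution      = 0.1,
  end_resolution        = 1.0,
  min_modularity        = 0.5
)
resolution_increment <- 0.1

# Resolutions to try: start_resolution up to end_resolution
resolutions <- seq(settings$start_resolution, settings$end_resolution,
                   by = resolution_increment)

# Hypothetical rule: first resolution whose cluster count falls inside the
# target range and whose modularity reaches the threshold; NA if none does.
pick_resolution <- function(candidates, target_range, min_modularity) {
  ok <- candidates$num_clusters >= min(target_range) &
        candidates$num_clusters <= max(target_range) &
        candidates$modularity >= min_modularity
  if (!any(ok)) return(NA_real_)
  candidates$resolution[which(ok)[1]]
}

# Made-up per-resolution summaries, for illustration only
candidates <- data.frame(resolution   = c(0.1, 0.2, 0.3),
                         num_clusters = c(2, 4, 7),
                         modularity   = c(0.62, 0.58, 0.49))
pick_resolution(candidates, settings$target_clusters_range,
                settings$min_modularity)
# [1] 0.2
```

Here resolution 0.1 is rejected because 2 clusters fall below the target range, and 0.2 is the first candidate that satisfies both criteria.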

Details

The function first constructs a KNN graph from the t-SNE coordinates, adjusting the number of neighbors to the size of the dataset, and then applies the Louvain algorithm for community detection. The Louvain resolution is increased from start_resolution toward end_resolution in steps of resolution_increment, and the resulting clusterings are assessed against target_clusters_range and the minimum modularity. Clustering quality is evaluated with silhouette scores, and cluster centroids are calculated for visualization. Observations with an NA cluster assignment are placed in a separate cluster labeled "100".
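The NA relabeling, silhouette, and centroid steps can be sketched with base R plus silhouette() from the cluster package. This is an illustrative sketch on toy data, not immunaut's code; in particular, the order of relabeling and silhouette computation, and the centroid column names tsne1 and tsne2, are assumptions.

```r
library(cluster)   # provides silhouette()

# Toy 2D coordinates and Louvain-style labels; the last point has no label
Y <- rbind(c(0, 0), c(0.2, 0.1), c(0.1, 0.3),
           c(5, 5), c(5.2, 4.9), c(4.8, 5.1),
           c(10, 0))
clusters <- c(1L, 1L, 1L, 2L, 2L, 2L, NA)

# NA assignments go to a separate cluster labeled 100
clusters[is.na(clusters)] <- 100L

# Average silhouette width over all points (each value lies in [-1, 1])
sil <- silhouette(clusters, dist(Y))
avg_silhouette_score <- mean(sil[, "sil_width"])

# Cluster centroids in t-SNE space, one row per cluster label
cluster_data <- aggregate(data.frame(tsne1 = Y[, 1], tsne2 = Y[, 2]),
                          by = list(cluster = clusters), FUN = mean)
cluster_data
```

The singleton cluster 100 receives a silhouette width of 0 by definition, which pulls the average down slightly; the two tight, well-separated clusters score close to 1.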

Value

A list containing the following elements:

  • info.norm: The input data frame with an additional pandora_cluster column for cluster assignments.

  • cluster_data: A data frame containing cluster centroids and cluster labels.

  • avg_silhouette_score: The average silhouette score, a measure of clustering quality ranging from -1 to 1 (higher values indicate more compact, better-separated clusters).

  • modularity: The modularity score of the Louvain clustering.

  • num_clusters: The number of clusters found.


immunaut documentation built on April 12, 2025, 1:22 a.m.