pick_best_cluster_simon: Select the Best Clustering Based on Weighted Scores: AUROC,...

View source: R/functions.R

pick_best_cluster_simon    R Documentation

Select the Best Clustering Based on Weighted Scores: AUROC, Modularity, and Silhouette

Description

This function selects the optimal clustering configuration from a list of t-SNE clustering results by evaluating each configuration's AUROC, modularity, and silhouette scores. The three scores are combined into a single weighted average, using the weights supplied in settings$weights, and the configuration with the highest combined score is returned.

Usage

pick_best_cluster_simon(dataset, tsne_clust, tsne_calc, settings)

Arguments

dataset

A data frame representing the original dataset, where each observation will be assigned cluster labels from each clustering configuration in tsne_clust.

tsne_clust

A list of clustering results from different t-SNE configurations, with each element containing pandora_cluster assignments and clustering information.

tsne_calc

An object containing t-SNE results on dataset.

settings

A list of settings for machine learning model training and scoring, including:

excludedColumns

A character vector of columns to exclude from the analysis.

preProcessDataset

A character vector of preprocessing steps (e.g., scaling, centering).

selectedPartitionSplit

Numeric; the partition split ratio for train/test splits.

selectedPackages

Character vector of machine learning models to train.

trainingTimeout

Numeric; time limit (in seconds) for training each model.

weights

A list of weights for the scoring criteria: weights$AUROC, weights$modularity, and weights$silhouette (defaults: 0.4, 0.3, and 0.3, respectively).
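
As a rough illustration, a settings list with the fields described above might be assembled as follows. Only the field names and the default weights come from this page; the column name, preprocessing strings, split ratio, model identifiers, and timeout are assumptions chosen for the sketch and may not match the values the package expects.

```r
# Hypothetical settings list. Field names follow the argument description;
# every value except the weights is an illustrative assumption.
settings <- list(
  excludedColumns        = c("sample_id"),        # assumed column name
  preProcessDataset      = c("scale", "center"),  # assumed step names
  selectedPartitionSplit = 0.7,                   # assumed train fraction
  selectedPackages       = c("nb", "rf"),         # assumed model identifiers
  trainingTimeout        = 180,                   # seconds per model (assumed)
  weights = list(AUROC = 0.4, modularity = 0.3, silhouette = 0.3)  # documented defaults
)

# The documented default weights sum to 1
stopifnot(isTRUE(all.equal(sum(unlist(settings$weights)), 1)))
```

If the weights are changed, keeping their sum at 1 keeps the combined score on the same scale as the individual metrics.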

Details

For each clustering configuration in tsne_clust, this function:

  1. Assigns cluster labels to the dataset.

  2. Trains machine learning models specified in settings on the dataset with cluster labels.

  3. Evaluates each model based on AUROC, modularity, and silhouette scores.

  4. Selects the clustering configuration with the highest weighted average score as the best clustering result.
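
The weighting and selection in steps 3 and 4 can be sketched in a few lines of base R. This is a minimal illustration, not the package's implementation: the configuration names and score values are made up, and it assumes the three metrics are already on comparable scales (whether and how the function rescales them is not stated here).

```r
# Hypothetical per-configuration scores (made-up numbers for illustration)
scores <- data.frame(
  config     = c("perplexity_10", "perplexity_30", "perplexity_50"),
  AUROC      = c(0.80, 0.90, 0.70),
  modularity = c(0.50, 0.40, 0.60),
  silhouette = c(0.30, 0.50, 0.20)
)

# Documented default weights
weights <- c(AUROC = 0.4, modularity = 0.3, silhouette = 0.3)

# Weighted average of the three metrics for each configuration
metric_cols <- names(weights)
scores$weighted <- as.vector(as.matrix(scores[, metric_cols]) %*% weights) / sum(weights)

# The configuration with the highest weighted score is taken as best
best <- scores[which.max(scores$weighted), ]
print(best)
```

With these made-up numbers the weighted scores are 0.56, 0.63, and 0.52, so the second configuration is selected.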

Value

A list containing the best clustering configuration (with the highest weighted score) and its associated information.


immunaut documentation built on April 12, 2025, 1:22 a.m.