subsample_clustering_evaluation: Clustering analysis on cross-validated data sets

View source: R/clustering.R

subsample_clustering_evaluation R Documentation

Description

Performs clustering analysis on each fold of an external cross-validation.

Usage

subsample_clustering_evaluation(
  dat_embedded,
  parallel = 1,
  by = c("datname", "drname", "run", "fold"),
  silhouette_dissimilarity = NULL,
  dat_list = NULL,
  ...
)

Arguments

dat_embedded

list of data.frames

parallel

number of threads

by

variables to split input data by

silhouette_dissimilarity

dissimilarity matrix used for silhouette evaluation

dat_list

list of input data matrices used for calculating clustering indices

...

extra arguments, passed through to clustering_dissimilarity_from_data, clustering_analysis, and clustering_metrics
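To illustrate what splitting by the default `by` variables could look like, here is a minimal base R sketch. It assumes the embedded data carry columns named datname, drname, run and fold; the toy data.frame `dat` and its columns `id` and `dim1` are hypothetical, and this is not the package's own implementation.

```r
# Hypothetical embedded data: one row per sample, plus the split columns
dat <- expand.grid(
  id = paste0("s", 1:4),
  datname = "expr",
  drname = c("pca", "umap"),
  run = 1,
  fold = 1:2,
  stringsAsFactors = FALSE
)
dat$dim1 <- seq_len(nrow(dat))  # placeholder embedding coordinate

# Split into one data.frame per observed combination of the 'by' columns
by <- c("datname", "drname", "run", "fold")
parts <- split(dat, dat[by], drop = TRUE)

# One subset per (datname, drname, run, fold) combination
print(sapply(parts, nrow))
```

Each element of `parts` could then be clustered and scored independently, which is also what makes the work easy to distribute over several threads.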

Details

Produces clusterings with multiple methods and settings and computes internal validation metrics such as the Connectivity, Dunn, and Silhouette scores. If a batch label is provided, also computes chi-squared tests of the clusterings against it.
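The kind of per-fold evaluation described here can be sketched with base R and the cluster package. This is only an illustration under stated assumptions: the toy matrix `x`, the `fold` and `batch` vectors and the helper `evaluate_fold` are hypothetical, k-means stands in for whichever clustering methods are configured, and each fold is assumed to be the set of samples remaining after holding out one part.

```r
library(cluster)  # provides silhouette()

set.seed(1)
# Toy embedding: 60 samples in 2 dimensions, three well-separated groups
x <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2)
)
fold  <- rep(1:3, length.out = nrow(x))          # hypothetical CV fold labels
batch <- rep(c("a", "b"), length.out = nrow(x))  # hypothetical batch label

# Cluster the samples kept in one fold and score the result
evaluate_fold <- function(f, k = 3) {
  keep  <- fold != f
  train <- x[keep, , drop = FALSE]
  cl    <- kmeans(train, centers = k, nstart = 10)$cluster
  sil   <- silhouette(cl, dist(train))
  # Chi-squared test of cluster membership against the batch label
  chisq <- suppressWarnings(chisq.test(table(cl, batch[keep])))
  data.frame(
    fold = f,
    k = k,
    silhouette = mean(sil[, "sil_width"]),
    batch_p = chisq$p.value
  )
}

res <- do.call(rbind, lapply(sort(unique(fold)), evaluate_fold))
print(res)
```

A high mean silhouette width indicates compact, well-separated clusters, while a small chi-squared p-value would suggest that cluster membership is associated with the batch label.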

Value

Returns a list of data.frames containing the clustering_analysis and clustering_metrics outputs for every combination of CV run, CV fold, clustering method, and number of clusters, for each combination of data set and dimensionality reduction technique found in the input.


vittoriofortino84/COPS documentation built on Jan. 28, 2025, 3:16 p.m.