CVL: Cross validation Loss for hierarchical clustering methods

View source: R/CVL.R

CVL    R Documentation

Cross validation Loss for hierarchical clustering methods

Description

Leave-one-feature-out cross-validation loss (CVL) for comparing a list of hierarchical clustering methods.

Usage

CVL(
  data,
  cl.list,
  dists = "euclidean",
  mixed_dist = "gower",
  seed = 99,
  ncores = 2,
  median = FALSE
)

Arguments

data

Data frame used to build the hierarchical clusterings.

cl.list

Named list of hierarchical clustering methods: each entry consists of the name of the hierarchical clustering algorithm (e.g. hclust) followed by a character vector of all its underlying agglomeration methods selected for comparison (e.g. "complete", "ward.D2").
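
As an illustration of the structure described above, here is a minimal sketch in base R. It assumes that each list name is the clustering function and each value is the character vector of agglomeration methods; that shape is inferred from the description and is not confirmed by the package source. The dendrograms are built only to show what each method name refers to.

```r
# Hypothetical cl.list: names are clustering functions, values are the
# agglomeration methods to compare (shape assumed from the description).
cl.list <- list(hclust = c("complete", "ward.D2"))

# What each entry refers to in base R terms: one dendrogram per
# agglomeration method, built on a Euclidean distance matrix.
x <- iris[1:20, 1:4]                      # small numeric example
d <- dist(x, method = "euclidean")
trees <- lapply(cl.list$hclust, function(m) hclust(d, method = m))
names(trees) <- cl.list$hclust
```

Each element of trees is an ordinary hclust object, so plot(trees[["complete"]]) draws the corresponding dendrogram.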

dists

Distance (or list of distances) used to compute the dissimilarity matrix for each hierarchical clustering method. Default is "euclidean".

mixed_dist

Mixed distance (or list of mixed distances) used to compute the dissimilarity matrix for each hierarchical clustering method. Mixed distances are preferable when the dataset contains categorical features. Default is "gower".
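
To show why a mixed distance matters when categorical features are present, the sketch below computes a Gower dissimilarity with cluster::daisy and clusters on it. This is an assumption-laden illustration: the documentation above does not state which function CVL uses internally to compute the "gower" distance.

```r
library(cluster)  # recommended package shipped with standard R installations

# Small mixed-type data frame: four numeric columns and one factor (Species).
mixed <- iris[c(1:5, 51:55, 101:105), ]

# Gower dissimilarity handles numeric and categorical columns together.
d_gower <- daisy(mixed, metric = "gower")

# The resulting dissimilarity object can be passed to hclust() directly.
tree <- hclust(d_gower, method = "complete")
```

A plain Euclidean distance would require dropping or recoding the Species column first, which is the situation mixed distances are meant to avoid.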

seed

Fixed random seed for the SIMMAP algorithm. Default is 99.

ncores

Number of cores used to parallelize the computation. Default is 2.

median

Whether to aggregate the leave-one-feature-out losses across features using the median (TRUE) or the mean (FALSE). Default is FALSE.
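
The practical difference between the two aggregations can be seen on a toy vector. The per-feature losses below are hypothetical values chosen for illustration, not output of CVL, and the sketch assumes the aggregation is a plain mean or median over features.

```r
# Hypothetical per-feature losses (illustrative values only, not CVL output).
loss_per_feature <- c(f1 = 0.20, f2 = 0.25, f3 = 0.30, f4 = 1.50)

loss_mean   <- mean(loss_per_feature)    # pulled upward by the large loss on f4
loss_median <- median(loss_per_feature)  # unaffected by that single large value
```

If one feature is much harder to recover than the rest, median = TRUE keeps it from dominating the comparison between methods.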

Value

A data frame comparing the CVL of each combination of hierarchical clustering method and distance.
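
A call following the Usage signature above might look like the sketch below. The shape of cl.list and the choice of a small numeric subset of iris are assumptions made for illustration; the columns of the returned data frame are not documented here, so the result is only printed. The call is guarded so that the snippet does nothing when PhyloHclust is not installed.

```r
# Assumed cl.list shape: clustering function name -> agglomeration methods.
cl.list <- list(hclust = c("complete", "ward.D2"))

# Run only when the package is available.
if (requireNamespace("PhyloHclust", quietly = TRUE)) {
  res <- PhyloHclust::CVL(
    data = iris[1:30, 1:4],   # small numeric subset, chosen only for speed
    cl.list = cl.list,
    dists = "euclidean",
    seed = 99,
    ncores = 2
  )
  print(res)  # per the Value section, a data frame of CVL values
}
```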


Monoxido45/PhyloHclust documentation built on Sept. 25, 2024, 3:17 a.m.