Objects of class "valstat" store cluster validation statistics from various clustering methods run with various numbers of clusters.
A legitimate valstat object is a list whose format depends on the number of clustering methods involved, say nmethods, i.e., the length of the method component explained below. The first nmethods elements of the valstat list are indexed by number only. Each of them is itself a list whose elements are numbered from 1 to the maxG component defined below. Element [[i]][[j]] refers to the clustering from clustering method number i with j clusters. Every such element is a list with components
avewithin, mnnd, cvnnd, maxdiameter, widestgap, sindex, minsep, asw, dindex, denscut, highdgap, pearsongamma, withinss, entropy.
Further optional components are pamc, kdnorm, kdunif, dmode and aggregated. All of these are cluster validation indexes, as follows.
avewithin 
average distance within clusters (reweighted so that every observation, rather than every distance, has the same weight). 
mnnd 
average distance to 
cvnnd 
coefficient of variation of dissimilarities to

maxdiameter 
maximum cluster diameter. 
widestgap 
widest within-cluster gap or average of cluster-wise widest within-cluster gaps, depending on parameter
sindex 
separation index. It is based on the distance from every point to the closest point not in the same cluster. The separation index is then the mean of the smallest proportion
minsep 
minimum cluster separation. 
asw 
average silhouette width. See
dindex 
this index measures to what extent the density decreases from the cluster mode to the outskirts; Idensdec in Sec. 3.6 of Hennig (2019); low values are good. 
denscut 
this index measures whether cluster boundaries run through density valleys; Idensbound in Sec. 3.6 of Hennig (2019); low values are good. 
highdgap 
this measures whether there is a large within-cluster gap with high density on both sides; Ihighdgap in Sec. 3.6 of Hennig (2019); low values are good.
pearsongamma 
correlation between distances and a 0-1 vector where 0 means same cluster, 1 means different clusters. "Normalized gamma" in Halkidi et al. (2001).
withinss 
a generalisation of the within-cluster sum of squares (the k-means objective function), which is obtained if the dissimilarities are Euclidean distances.
entropy 
entropy of the distribution of cluster memberships, see Meila (2007).
pamc 
average distance of the observations to their cluster centroid, where the centroid of a cluster is the observation that minimises this average distance.
kdnorm 
Kolmogorov distance between the distribution of within-cluster Mahalanobis distances and the appropriate chi-squared distribution, aggregated over clusters (I am grateful to Agustin Mayo-Iscar for the idea).
kdunif 
Kolmogorov distance between distribution of distances to

dmode 
aggregated density mode index equal to

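The within-cluster indexes avewithin, maxdiameter and pamc can be computed directly from a dissimilarity matrix. The Python sketch below is only an illustration under stated assumptions, not fpc's implementation: the helper names, the list-of-lists distance matrix and the toy data are hypothetical; the "reweighting" of avewithin is read as weighting each cluster's mean pairwise distance by its size n_k; and pamc is taken as the summed distance to the cluster medoids divided by the number of observations.

```python
from itertools import combinations

# Sketch only: hypothetical helper names, not the fpc implementation.
# d is a symmetric dissimilarity matrix (list of lists); labels gives the
# cluster of each observation.

def members_by_cluster(labels):
    """Map each cluster label to the indices of its observations."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    return groups

def avewithin(d, labels):
    """Average within-cluster distance with every observation weighted equally.

    Assumption: cluster k's mean pairwise distance is weighted by its size n_k;
    singleton clusters contribute 0.
    """
    total = 0.0
    for members in members_by_cluster(labels).values():
        pairs = list(combinations(members, 2))
        if pairs:
            total += len(members) * sum(d[i][j] for i, j in pairs) / len(pairs)
    return total / len(labels)

def maxdiameter(d, labels):
    """Largest dissimilarity between two observations of the same cluster."""
    return max(
        (d[i][j] for members in members_by_cluster(labels).values()
         for i, j in combinations(members, 2)),
        default=0.0,  # assumption: 0 when every cluster is a singleton
    )

def pamc(d, labels):
    """Average distance to the cluster medoid (member minimising summed distance)."""
    total = 0.0
    for members in members_by_cluster(labels).values():
        total += min(sum(d[m][j] for j in members) for m in members)
    return total / len(labels)

# Toy 1-D data: a tight cluster of three points and a looser pair.
x = [0.0, 1.0, 2.0, 10.0, 14.0]
labels = [1, 1, 1, 2, 2]
d = [[abs(a - b) for b in x] for a in x]

print(avewithin(d, labels))    # (3 * 4/3 + 2 * 4) / 5 = 2.4
print(maxdiameter(d, labels))  # 4.0, the distance between 10 and 14
print(pamc(d, labels))         # (2 + 4) / 5 = 1.2
```

Weighting every distance equally would give (1 + 2 + 1 + 4) / 4 = 2.0 here; weighting every observation equally gives the two-point cluster, which has only one of the four within-cluster distances but two of the five observations, more influence.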
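The separation indexes minsep and sindex both start from the distance of each point to the closest point in a different cluster. The sketch below assumes a dissimilarity matrix and at least two clusters; `prop` is a hypothetical name for the proportion used by sindex, and keeping max(1, floor(prop * n)) of the smallest values is an assumed rounding rule, not necessarily the one fpc uses.

```python
import math

# Sketch only: hypothetical helper names, not the fpc implementation.
# Requires at least two clusters.

def nearest_other_cluster(d, labels):
    """For each observation, the distance to the closest observation in another cluster."""
    n = len(labels)
    return [min(d[i][j] for j in range(n) if labels[j] != labels[i]) for i in range(n)]

def minsep(d, labels):
    """Minimum cluster separation: smallest distance between points of different clusters."""
    return min(nearest_other_cluster(d, labels))

def sindex(d, labels, prop):
    """Mean of the smallest fraction `prop` of the nearest-other-cluster distances.

    `prop` is a hypothetical parameter name; max(1, floor(prop * n)) is an
    assumed rounding rule for how many values are averaged.
    """
    dists = sorted(nearest_other_cluster(d, labels))
    m = max(1, math.floor(prop * len(dists)))
    return sum(dists[:m]) / m

x = [0.0, 1.0, 2.0, 10.0, 14.0]
labels = [1, 1, 1, 2, 2]
d = [[abs(a - b) for b in x] for a in x]

print(minsep(d, labels))       # 8.0 (points 2 and 10)
print(sindex(d, labels, 0.5))  # mean of the two smallest values, 8 and 8
```

With prop = 1 every point counts and sindex becomes the plain mean (8 + 8 + 9 + 10 + 12) / 5 = 9.4; smaller proportions focus on the points closest to other clusters while still averaging over more than the single closest pair used by minsep.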
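The asw entry refers to the standard silhouette: for observation i, a(i) is the mean distance to the other members of its cluster, b(i) is the smallest mean distance to the members of any other cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)); asw is the mean of s(i). The plain-Python sketch below assumes at least two clusters and sets s(i) = 0 for singletons, a common convention; it is not the code fpc uses.

```python
# Sketch only: a plain-Python average silhouette width, not the fpc code.

def asw(d, labels):
    """Average silhouette width for dissimilarity matrix d (needs >= 2 clusters)."""
    n = len(labels)
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    widths = []
    for i in range(n):
        own = [j for j in clusters[labels[i]] if j != i]
        if not own:
            widths.append(0.0)  # singleton cluster: width set to 0 by convention
            continue
        a = sum(d[i][j] for j in own) / len(own)
        b = min(
            sum(d[i][j] for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / n

x = [0.0, 1.0, 2.0, 10.0, 14.0]
labels = [1, 1, 1, 2, 2]
d = [[abs(a - b) for b in x] for a in x]
print(asw(d, labels))  # values near 1 indicate compact, well-separated clusters
```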
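The pearsongamma entry is a Pearson correlation between two vectors indexed by pairs of observations: the dissimilarities and a 0-1 indicator of "different clusters". The sketch below computes it from scratch on hypothetical toy data; the function name is illustrative and the code is not taken from fpc.

```python
import math

# Sketch only: plain-Python Pearson correlation, not the fpc implementation.

def pearsongamma(d, labels):
    """Correlation between pairwise distances and a 0-1 'different cluster' vector."""
    n = len(labels)
    dist, diff = [], []
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once
            dist.append(d[i][j])
            diff.append(0.0 if labels[i] == labels[j] else 1.0)
    m = len(dist)
    mean_d, mean_v = sum(dist) / m, sum(diff) / m
    cov = sum((a - mean_d) * (b - mean_v) for a, b in zip(dist, diff))
    sd_d = math.sqrt(sum((a - mean_d) ** 2 for a in dist))
    sd_v = math.sqrt(sum((b - mean_v) ** 2 for b in diff))
    # Undefined (zero division) if all pairs share a cluster or none do.
    return cov / (sd_d * sd_v)

x = [0.0, 1.0, 2.0, 10.0, 14.0]
labels = [1, 1, 1, 2, 2]
d = [[abs(a - b) for b in x] for a in x]
print(pearsongamma(d, labels))  # about 0.92: large distances go with "different cluster"
```

Using each unordered pair once or both orderings gives the same value, since duplicating every pair leaves Pearson's correlation unchanged.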
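For general dissimilarities, one natural generalisation of the within-cluster sum of squares (assumed here, not copied from fpc) adds up, for each cluster, the squared dissimilarities over all within-cluster pairs and divides by the cluster size n_k. For Euclidean distances this reproduces the k-means objective, because the sum of squared distances to the cluster mean equals the sum of squared pairwise distances divided by n_k. The sketch checks that identity on hypothetical toy points; both function names are illustrative.

```python
import math

# Sketch only: hypothetical helper names, not the fpc implementation.

def withinss(d, labels):
    """Sum over clusters of (sum of squared within-cluster dissimilarities over pairs) / n_k."""
    total = 0.0
    for k in set(labels):
        members = [i for i, lab in enumerate(labels) if lab == k]
        sq = sum(d[i][j] ** 2 for pos, i in enumerate(members) for j in members[pos + 1:])
        total += sq / len(members)
    return total

def kmeans_ss(points, labels):
    """Classical k-means objective: squared Euclidean distances to the cluster means."""
    total = 0.0
    for k in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == k]
        centroid = [sum(coord) / len(members) for coord in zip(*members)]
        total += sum(math.dist(p, centroid) ** 2 for p in members)
    return total

points = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (10.0, 10.0), (12.0, 10.0)]
labels = [1, 1, 1, 2, 2]
d = [[math.dist(p, q) for q in points] for p in points]

print(withinss(d, labels), kmeans_ss(points, labels))  # both 22/3 up to rounding
```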
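The entropy entry summarises how evenly observations are spread over clusters: with cluster proportions p_k = n_k / n it is -sum_k p_k log p_k. The sketch assumes the natural logarithm (another base only rescales the value); the function name is hypothetical.

```python
import math
from collections import Counter

def cluster_entropy(labels):
    """Entropy of the cluster-size distribution, sum_k p_k * log(1 / p_k).

    Assumption: natural logarithm; a different base only rescales the result.
    """
    n = len(labels)
    return sum((c / n) * math.log(n / c) for c in Counter(labels).values())

print(cluster_entropy([1, 1, 2, 2]))     # log(2): two equally large clusters
print(cluster_entropy([1, 1, 1, 1, 2]))  # smaller: one dominant cluster
```

Entropy is largest when all clusters have the same size and 0 for a single cluster.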
Furthermore, a valstat object has the following list components:
maxG 
maximum number of clusters. 
minG 
minimum number of clusters (list entries below that number are empty lists). 
method 
vector of names (character strings) of clustering CBI-functions, see
name 
vector of names (character strings) of clustering methods. These can be user-chosen names (see argument

statistics 
vector of names (character strings) of cluster validation indexes. 
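To make the indexing concrete, the following Python sketch mirrors the layout described above with nested dictionaries. It is purely illustrative: the function valstat_like, the callback compute and the method names are hypothetical; real valstat objects are R lists produced by clusterbenchstats, and the name and statistics components are omitted for brevity.

```python
# Hypothetical mirror of the valstat layout in Python; not an fpc API.
# R's valstat[[i]][[j]] corresponds to vs[i][j] here (both 1-based).

def valstat_like(methods, min_g, max_g, compute):
    """Build a dict laid out like a valstat list.

    methods -- names of the clustering methods (length nmethods)
    compute -- hypothetical callback returning the index values for
               one method and one number of clusters
    """
    vs = {}
    for i, method in enumerate(methods, start=1):
        # Entries for numbers of clusters below min_g stay empty, as described for minG.
        vs[i] = {g: (compute(method, g) if g >= min_g else {})
                 for g in range(1, max_g + 1)}
    vs["maxG"] = max_g
    vs["minG"] = min_g
    vs["method"] = list(methods)
    return vs

# Placeholder callback: records which clustering each entry stands for.
vs = valstat_like(["method_a", "method_b"], 2, 4,
                  lambda m, g: {"method": m, "G": g})
print(vs[2][3])  # {'method': 'method_b', 'G': 3}
print(vs[1][1])  # {} because 1 < minG
```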
These objects are generated as part of the clusterbenchstats output.
The valstat class has methods for the generic functions print and plot; see plot.valstat.
Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en/
Hennig, C. (2019) Cluster validation by measurement of clustering characteristics relevant to the user. In C. H. Skiadas (ed.) Data Analysis and Applications 1: Clustering and Regression, Modeling-estimating, Forecasting and Data Mining, Volume 2, Wiley, New York, 1-24, https://arxiv.org/abs/1703.09282
Akhanli, S. and Hennig, C. (2020) Calibrating and aggregating cluster validity indexes for context-adapted comparison of clusterings. Statistics and Computing, 30, 1523-1544, https://link.springer.com/article/10.1007/s11222-020-09958-2, https://arxiv.org/abs/2002.01822
See also: clusterbenchstats, plot.valstat.