Compute validity measures for partitions and hierarchies, attempting to measure how well these clusterings capture the underlying structure in the data they were obtained from.

```
cl_validity(x, ...)
## Default S3 method:
cl_validity(x, d, ...)
```

`x`: an object representing a partition or hierarchy.

`d`: a dissimilarity object from which `x` was obtained.

`...`: arguments to be passed to or from methods.

`cl_validity` is a generic function.

For partitions, its default method gives the “dissimilarity accounted for”, defined as $1 - a_w / a_t$, where $a_t$ is the average total dissimilarity, and the “average within dissimilarity” $a_w$ is given by

$$a_w = \frac{\sum_{i,j} \sum_k m_{ik} m_{jk} d_{ij}}{\sum_{i,j} \sum_k m_{ik} m_{jk}}$$

where $d$ and $m$ are the dissimilarities and memberships, respectively, and the sums are over all pairs of objects and all classes.
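The definition above can be traced by hand with a small base-R sketch. This is not the code used by `cl_validity`; the function name `dissimilarity_accounted_for` and the toy data are made up for illustration, and the sketch assumes that the sums run over pairs of distinct objects only.

```r
# Illustrative base-R sketch of "dissimilarity accounted for".
# Not the clue implementation; names and data are hypothetical.
# d: dissimilarities ("dist" object or symmetric matrix)
# m: n x k membership matrix (rows = objects, columns = classes)
dissimilarity_accounted_for <- function(d, m) {
  d <- as.matrix(d)
  w <- tcrossprod(m)          # w[i, j] = sum_k m[i, k] * m[j, k]
  off <- row(d) != col(d)     # assumption: only pairs with i != j are summed
  a_w <- sum(w[off] * d[off]) / sum(w[off])  # average within dissimilarity
  a_t <- mean(d[off])                        # average total dissimilarity
  1 - a_w / a_t
}

# Toy data: two well-separated groups of three points on a line.
pts <- c(0, 1, 2, 10, 11, 12)
d <- dist(pts)
hard <- cbind(c(1, 1, 1, 0, 0, 0), c(0, 0, 0, 1, 1, 1))  # hard memberships
daf <- dissimilarity_accounted_for(d, hard)
print(daf)
```

With hard memberships the weight `w[i, j]` is 1 when objects `i` and `j` share a class and 0 otherwise, so `a_w` reduces to the mean dissimilarity over same-class pairs. A partition that puts all objects into a single class gives `a_w = a_t` and hence a value of 0.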

For hierarchies, the validity measures computed by default are “variance accounted for” (VAF, e.g., Hubert, Arabie & Meulman, 2006) and “deviance accounted for” (DEV, e.g., Smith, 2001). If `u` is the ultrametric corresponding to the hierarchy `x` and `d` the dissimilarity `x` was obtained from, these validity measures are given by

$$\mathrm{VAF} = \max\left(0,\; 1 - \frac{\sum_{i,j} (d_{ij} - u_{ij})^2}{\sum_{i,j} (d_{ij} - \mathrm{mean}(d))^2}\right)$$

and

$$\mathrm{DEV} = \max\left(0,\; 1 - \frac{\sum_{i,j} |d_{ij} - u_{ij}|}{\sum_{i,j} |d_{ij} - \mathrm{median}(d)|}\right)$$

respectively. Note that VAF and DEV are not invariant under rescaling `u`, and may be “arbitrarily small” (i.e., 0 using the above definitions) even though `u` and `d` are “structurally close” in some sense.
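As a rough illustration of the two formulas and of the rescaling caveat, the following base-R sketch computes VAF and DEV directly. It is not the code used by `cl_validity`; the helper names `vaf` and `dev` are hypothetical, the ultrametric is taken to be the cophenetic dissimilarity of an `hclust` tree, and the sums run over the pairs stored in the `dist` objects.

```r
# Illustrative base-R sketch of VAF and DEV (not the clue implementation).
# d: a "dist" object; u: the ultrametric as a "dist" object of the same size.
vaf <- function(d, u) {
  d <- as.vector(d); u <- as.vector(u)
  max(0, 1 - sum((d - u)^2) / sum((d - mean(d))^2))
}
dev <- function(d, u) {
  d <- as.vector(d); u <- as.vector(u)
  max(0, 1 - sum(abs(d - u)) / sum(abs(d - median(d))))
}

# Toy data: two well-separated groups of three points on a line.
pts <- c(0, 1, 2, 10, 11, 12)
d <- dist(pts)
hc <- hclust(d, method = "average")
u <- cophenetic(hc)   # cophenetic dissimilarities form an ultrametric

print(c(VAF = vaf(d, u), DEV = dev(d, u)))

# Rescaling u keeps the tree structure but drives both measures to 0.
print(c(VAF = vaf(d, 100 * u), DEV = dev(d, 100 * u)))
```

The second call shows why the measures are not invariant under rescaling: `100 * u` encodes the same hierarchy as `u`, yet the residuals now exceed the total variation in `d`, so both measures are truncated at 0.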

For results obtained with `agnes` and `diana`, the agglomerative and divisive coefficients, respectively, are provided in addition to the default measures.

A list of class `"cl_validity"` with the computed validity measures.

L. Hubert, P. Arabie and J. Meulman (2006). *The structural representation of proximity matrices with MATLAB*. Philadelphia, PA: SIAM.

T. J. Smith (2001). Constructing ultrametric and additive trees based on the *L_1* norm. *Journal of Classification*, **18**/2, 185–207. doi: 10.1007/s00357-001-0015-0.

`cluster.stats` in package fpc for a variety of cluster validation statistics; `fclustIndex` in package e1071 for several fuzzy cluster indexes; `clustIndex` in package cclust; `silhouette` in package cluster.
