aricode-package    R Documentation

aricode: Efficient Computations of Standard Clustering Comparison Measures

Description

Implements an efficient O(n) algorithm based on bucket-sorting for fast computation of standard clustering comparison measures. Available measures include the adjusted Rand index (ARI), normalized information distance (NID), normalized mutual information (NMI), adjusted mutual information (AMI), normalized variation information (NVI) and entropy, as described in Vinh et al. (2009) doi:10.1145/1553374.1553511. AMI has been included since version 0.1.2; a modified version of ARI (MARI), as described in Sundqvist et al. doi:10.1007/s00180-022-01230-7, and a simple Chi-square distance have been included since version 1.0.0.

A package for efficient computations of standard clustering comparison measures. Most of the available measures are described in the paper of Vinh et al., JMLR, 2010 (see References below).

Details

Traditional implementations (e.g., function adjustedRandIndex of package mclust) are in Omega(n + u v), where n is the size of the vectors whose classifications are to be compared, and u and v are the respective numbers of classes in each vector. Here, the implementation is in Theta(n), in addition to the speed gain due to the C++ code.
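
The gap between the two complexities comes from the contingency table: a dense table has u v cells, but at most n of them are non-empty. The base R sketch below illustrates that idea for the adjusted Rand index by sorting the label pairs and counting runs, so only non-empty cells are ever stored. It is an illustration under stated assumptions, not the package's C++ code: ari_sparse is a hypothetical helper name, and the calls to factor() and order() do not guarantee the Theta(n) bound of the bucket-sort implementation.

```r
# Sketch: adjusted Rand index computed from the non-empty contingency cells only.
# ari_sparse is a hypothetical helper, not a function exported by aricode.
ari_sparse <- function(c1, c2) {
  stopifnot(length(c1) == length(c2), length(c1) > 1)
  n <- length(c1)
  # Map arbitrary labels to integer codes 1..u and 1..v.
  f1 <- as.integer(factor(c1))
  f2 <- as.integer(factor(c2))
  # Sort the observations by (f1, f2) so identical pairs become adjacent.
  o <- order(f1, f2, method = "radix")
  s1 <- f1[o]
  s2 <- f2[o]
  # A new run starts wherever either label changes between neighbours.
  new_run <- c(TRUE, (s1[-1] != s1[-n]) | (s2[-1] != s2[-n]))
  # Sizes of the non-empty cells (at most n of them) and of the two margins.
  n_ij <- tabulate(cumsum(new_run))
  n_i <- tabulate(f1)
  n_j <- tabulate(f2)
  # Standard ARI formula expressed on pair counts.
  s_ij <- sum(choose(n_ij, 2))
  s_i <- sum(choose(n_i, 2))
  s_j <- sum(choose(n_j, 2))
  expected <- s_i * s_j / choose(n, 2)
  maximum <- (s_i + s_j) / 2
  # Yields NaN in the degenerate case where maximum equals expected.
  (s_ij - expected) / (maximum - expected)
}

# Identical partitions (up to relabelling) and fully crossed partitions.
ari_same <- ari_sparse(c(1, 1, 2, 2), c("a", "a", "b", "b"))
ari_crossed <- ari_sparse(c(1, 1, 2, 2), c(1, 2, 1, 2))
```

Because the run lengths are the only cell counts kept, memory stays proportional to n even when both classifications have many classes.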

The functions included in aricode are:

* ARI: computes the adjusted Rand index
* Chi2: computes the Chi-square statistic
* MARI: computes the modified adjusted Rand index (Sundqvist et al., 2023)
* MARIraw: computes the raw version of the modified adjusted Rand index
* RI: computes the Rand index
* NVI: computes the normalized variation information
* NID: computes the normalized information distance
* NMI: computes the normalized mutual information
* AMI: computes the adjusted mutual information
* entropy: computes the conditional and joint entropies
* clustComp: computes all clustering comparison measures at once
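
A short usage sketch for the functions listed above, assuming the aricode package is installed and that each measure takes the two classification vectors as its first two arguments. The choice of the iris data and of a 3-group hierarchical clustering is only for illustration.

```r
library(aricode)

data(iris)
# Cut a hierarchical clustering of the four numeric columns into 3 groups.
cl <- cutree(hclust(dist(iris[, -5])), 3)

# Compare the clustering with the known species labels.
ari <- ARI(cl, iris$Species)
nmi <- NMI(cl, iris$Species)
ami <- AMI(cl, iris$Species)

# Compute all available comparison measures in one call.
all_measures <- clustComp(cl, iris$Species)
```

When several measures are needed for the same pair of classifications, clustComp avoids calling each function separately.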

Author(s)

Maintainer: Julien Chiquet julien.chiquet@inrae.fr (ORCID)

Authors:

Other contributors:

Julien Chiquet julien.chiquet@inrae.fr

Guillem Rigaill guillem.rigaill@inrae.fr

Martina Sundqvist martina.sundqvist@agroparistech.fr

References

* Nguyen Xuan Vinh, Julien Epps, and James Bailey. "Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance." Journal of Machine Learning Research 11.Oct (2010): 2837-2854.
* Sundqvist, Martina, Julien Chiquet, and Guillem Rigaill. "Adjusting the adjusted Rand Index: A multinomial story." Computational Statistics 38.1 (2023): 327-347.

See Also

ARI, RI, NID, NVI, AMI, NMI, entropy, clustComp


aricode documentation built on Oct. 20, 2023, 5:07 p.m.