A package for efficient computation of standard clustering comparison measures.
The stable version is available on CRAN:

```r
install.packages("aricode")
```
The development version is available via:

```r
devtools::install_github("jchiquet/aricode")
```
Computation of measures for clustering comparison (ARI, AMI, NID and even the $\chi^2$ distance) is usually based on the contingency table. Traditional implementations (e.g., the function `adjustedRandIndex` of package mclust) are in $\Omega(n + u v)$, where $u$ and $v$ are the numbers of clusters in the two classifications being compared. In aricode we propose an implementation, based on radix sort, that is in $\Theta(n)$ in both time and space. Importantly, the complexity does not depend on $u$ and $v$. Our implementation of the ARI, for instance, is one to two orders of magnitude faster than standard R implementations.
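As a quick illustration, here is a minimal usage sketch (the label vectors are made up for the example; `set.seed` only makes it reproducible):

```r
library(aricode)

set.seed(42)
c1 <- sample(1:5, 1000, replace = TRUE)        # a classification into 5 groups
c2 <- c1
c2[1:100] <- sample(1:5, 100, replace = TRUE)  # perturb 10% of the labels

ARI(c1, c2)  # high, but below 1, since the two classifications mostly agree
ARI(c1, c1)  # identical classifications give an ARI of exactly 1
```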
The functions included in aricode are:

- `ARI`: computes the adjusted Rand index
- `Chi2`: computes the chi-square statistic
- `MARI`/`MARIraw`: computes the modified adjusted Rand index (Sundqvist et al., in preparation)
- `NVI`: computes the normalized variation of information
- `NID`: computes the normalized information distance
- `NMI`: computes the normalized mutual information
- `AMI`: computes the adjusted mutual information
- `expected_MI`: computes the expected mutual information
- `entropy`: computes the conditional and joint entropies
- `clustComp`: computes all clustering comparison measures at once

Here are some timings comparing the cost of computing the adjusted Rand index with aricode and with the commonly used function `adjustedRandIndex` of the mclust package; the cost of the latter can be prohibitive for large vectors:
```r
library(aricode)
library(mclust)
library(ggplot2)

time.aricode <- function(times, c1, c2) {
  replicate(times, system.time(ARI(c1, c2))[3])
}
time.mclust <- function(times, c1, c2) {
  replicate(times, system.time(mclust::adjustedRandIndex(c1, c2))[3])
}
time.method <- function(times, c1, c2, n) {
  rbind(
    data.frame(time = time.aricode(times, c1, c2), expr = "aricode", n = n),
    data.frame(time = time.mclust(times, c1, c2), expr = "mclust", n = n)
  )
}

# with similar classifications, the number of classes grows with n
sim.timings <- function(n, times = 10) {
  c1 <- sample(1:(n/200), n, replace = TRUE)
  c2 <- c1
  i_change <- sample(1:n, n/50, replace = FALSE)
  c2[i_change] <- c2[rev(i_change)]
  out <- time.method(times, c1, c2, n)
  data.frame(time = out$time, method = out$expr, n = n)
}
```
```r
ns <- sort(c(200 * 2^(3:14), 150 * 2^(3:15)))
timings <- do.call("rbind", lapply(ns, sim.timings))
```
```r
p.timings <- ggplot(timings, aes(x = n, y = time, colour = method)) +
  geom_smooth(data = dplyr::filter(timings, n > 1e4), method = "lm") +
  geom_point(size = 0.25, alpha = 0.9) +
  labs(y = "time (sec.)") +
  scale_x_log10(
    breaks = scales::trans_breaks("log10", function(x) 10^x),
    labels = scales::trans_format("log10", scales::math_format(10^.x))
  ) +
  scale_y_log10(
    breaks = scales::trans_breaks("log10", function(x) 10^x),
    labels = scales::trans_format("log10", scales::math_format(10^.x))
  ) +
  annotation_logticks()
p.timings + ggtitle("number of classes grows with n") + theme_bw()
```
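Beyond individual measures, the functions listed above can be obtained in a single call. Here is a small sketch using `clustComp` on simulated labels (the vectors and seed are illustrative only):

```r
library(aricode)

set.seed(1)
c1 <- sample(1:4, 500, replace = TRUE)          # a classification into 4 groups
c2 <- c1
c2[sample(1:500, 50)] <- sample(1:4, 50, replace = TRUE)  # perturb some labels

measures <- clustComp(c1, c2)  # all comparison measures at once
str(measures)                  # a list including ARI, NID, NMI, AMI, ...
```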