R/adjRand_test.R

Defines functions adjRand_test

Documented in adjRand_test

#' Test the significance of the adjusted Rand index
#'
#' Permutation test of the adjusted Rand index,
#' which quantifies the level of agreement
#' between two partitions (e.g., two schemes of classification
#' of the same individuals obtained with two methods)
#'
#' The adjusted Rand index (Hubert and Arabie 1985),
#' is an adjusted for chance version of the Rand index
#' (Rand 1971).
#' The adjusted Rand index has an expected value of zero
#' in the case of random partitions,
#' and values approaching one as the two partitions
#' become more similar to each other
#' (with one being perfect match of the classification).
#'  This function implements the permutation test
#'  proposed by Qannari et al. (2014)
#' to obtain a p value against the null hypothesis of
#' independence of the two partitions.
#'
#'
#' This function is useful in various contexts,
#' such as in integrative taxonomy
#' when comparing the classification of individual specimens
#' obtained using different data
#' (e.g., sequence data and morphometric data).
#' For an example of the application of this technique
#' with the classification obtained with genetic data and
#' morphometric data for multiple traits,
#' see Fruciano et al. 2016.
#'
#'
#' @section Notice:
#' The function requires internally the package mclust.
#'
#' @param A,B numerical or character vectors reflecting
#' the assignment of individual observations to groups
#' @param perm number of permutations
#' @return The function outputs a vector with the adjusted Rand index
#' and the p value obtained from the permutation test
#'
#' @section Citation:
#' If you use this function in the context of integrative
#' taxonomy or similar (comparison of classification/unsupervised
#' clustering with biological data), please cite all the papers
#' in the references (otherwise, please use the relevant citations
#' for the context).
#'
#' @references Fruciano C, Franchini P, Raffini F, Fan S, Meyer A.
#'   2016. Are sympatrically speciating Midas cichlid fish special?
#'   Patterns of morphological and genetic variation in the closely
#'   related species Archocentrus centrarchus. Ecology and Evolution
#'   6:4102-4114.
#' @references Hubert L, Arabie P. 1985. Comparing partitions.
#'   Journal of Classification 2:193-218.
#' @references Qannari EM, Courcoux P, Faye P. 2014. Significance
#'   test of the adjusted Rand index. Application to the free
#'   sorting task. Food Quality and Preference 32, Part A:93-97.
#' @references Rand WM. 1971. Objective Criteria for the Evaluation
#'   of Clustering Methods. Journal of the American Statistical
#'   Association 66:846-850.
#'
#' @examples
#' library(mclust)
#' set.seed(123)
#'
#' irisBIC = mclustBIC(iris[,-5])
#' mclustBIC_classification = summary(irisBIC,iris[,-5])$classification
#' original_classification = iris[,5]
#' # This is one of the examples in the package mclust
#' # Here a classification algorithm is used on the iris dataset
#'
#' adjustedRandIndex(mclustBIC_classification, original_classification)
#' # The mclust package allows computing the adjusted Rand index
#' # which quantifies the agreement between the original (correct)
#' # classification and the one obtained with the algorithm.
#' # However, it is not clear whether the adjusted Rand index is
#' # "large enough" compared to the null hypothesis of independence
#' # between the two classification schemes
#'
#' adjRand_test(mclustBIC_classification, original_classification,
#'              perm = 999)
#' # For that, we use the function adjRand_test, which performs
#' # the permutation test of Qannari et al. 2014 (in this case
#' # p<0.001, as 1000 permutations have been used).
#'
#' adjRand_test(original_classification, original_classification,
#'              perm = 999)
#' # As it can be seen, in the ideal case of the exact same grouping,
#' # the adjusted Rand index takes a value of 1 (which is obviously
#' # significant)
#'
#'
#' @importFrom mclust adjustedRandIndex
#'
#' @export
adjRand_test=function(A, B, perm=999) {

  if (!requireNamespace("mclust", quietly = TRUE)) {
    stop("Package 'mclust' is required but is not installed.")
  }
  # Ensure that the required package is available

  if (length(A)!=length(B)) {
    stop("A and B should have the same length")
  }
  # Make sure that the two groups of partitions have the same length

  ARIo=mclust::adjustedRandIndex(A, B)
  # Observed adjusted Rand index

  Aperm=lapply(seq_len(perm), function(X)
    sample(A, length(A), replace=FALSE))
  Bperm=lapply(seq_len(perm), function(X)
    sample(B, length(B), replace=FALSE))
  # Generate permuted samples

  ARIperm=unlist(lapply(seq_len(perm), function(i)
    mclust::adjustedRandIndex(Aperm[[i]], Bperm[[i]])))
  # Compute adjusted Rand index for the permuted samples

  m=mean(ARIperm); v=var(ARIperm)
  # compute mean and variance of the permuted samples

  NARI=(ARIperm-m)/sqrt(v)
  # compute NARI (normalized ARI)

  NARIo=(ARIo-m)/sqrt(v)
  # compute observed NARI

  p_value <- (sum(NARI > NARIo) + 1) / (perm + 1)
  # Compute p value as proportion of permuted NARI larger than
  # the observed

Results=c(ARIo, p_value)
names(Results)=c("Adjusted_Rand_index", "p_value")
return(Results)
}

Try the GeometricMorphometricsMix package in your browser

Any scripts or data that you put into this service are public.

GeometricMorphometricsMix documentation built on April 18, 2026, 1:06 a.m.