cgrestandard: Standardise cluster validation statistics by random...
In fpc: Flexible Procedures for Clustering

cgrestandard

R Documentation

Standardise cluster validation statistics by random clustering results

Description

Standardises cluster validity statistics as produced by clustatsum relative to results achieved by random clusterings on the same data, as generated by randomclustersim. The aim is to make differences between values comparable across indexes; see Hennig (2019) and Akhanli and Hennig (2020).

This is mainly for use within clusterbenchstats.

Usage

cgrestandard(clusum,clusim,G,percentage=FALSE,
                               useallmethods=FALSE,
                             useallg=FALSE, othernc=list())

Arguments

`clusum`	object of class "valstat", see `clusterbenchstats`.
`clusim`	list; output object of `randomclustersim`, see there.
`G`	vector of integers. Numbers of clusters to consider.
`percentage`	logical. If `FALSE`, standardisation is done to mean zero and standard deviation 1 using the random clusterings. If `TRUE`, the output is the percentage of simulated values below the result (more precisely, this number plus one divided by the total plus one).
`useallmethods`	logical. If `FALSE`, only random clustering results from `clusim` are used for standardisation. If `TRUE`, also clustering results from other methods as given in `clusum` are used.
`useallg`	logical. If `TRUE`, standardisation uses results from all numbers of clusters in `G`. If `FALSE`, standardisation of results for a specific number of clusters only uses results from that number of clusters.
`othernc`	list of integer vectors of length 2. This allows the incorporation of methods that produce numbers of clusters other than those in `G`, for example because a method has automatically estimated the number of clusters. The first number is the number of the clustering method (the order is determined by argument `clustermethod` in `clusterbenchstats`), the second number is the number of clusters. Results specified here are only standardised if `useallg=TRUE`.
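
The two standardisation modes selected by `percentage`, and the pooling controlled by `useallg`, can be illustrated with a minimal base R sketch. It is not the fpc implementation: `standardise_sketch`, `sim_by_g` and all numbers are hypothetical, and it assumes that "this number" in the description of `percentage` is the count of simulated values below the result and that "the total" is the number of simulated values.

```r
# Hypothetical helper, not part of fpc: standardise one observed index value
# against index values obtained from random clusterings.
standardise_sketch <- function(observed, simulated, percentage = FALSE) {
  if (percentage) {
    # Assumed reading: (count of simulated values below the result + 1) / (total + 1)
    (sum(simulated < observed) + 1) / (length(simulated) + 1)
  } else {
    # Centre and scale by mean and standard deviation of the simulated values
    (observed - mean(simulated)) / sd(simulated)
  }
}

# Made-up simulated index values, grouped by number of clusters
sim_by_g <- list("2" = c(0.2, 0.4, 0.6, 0.8),
                 "3" = c(0.5, 0.7, 0.9, 1.1))
observed <- 0.7  # made-up result of a clustering with 2 clusters

z <- standardise_sketch(observed, sim_by_g[["2"]])
p <- standardise_sketch(observed, sim_by_g[["2"]], percentage = TRUE)

# Analogue of useallg=FALSE: reference values from the same number of clusters only
z_own <- standardise_sketch(observed, sim_by_g[["2"]])
# Analogue of useallg=TRUE: reference values pooled over all numbers of clusters
z_all <- standardise_sketch(observed, unlist(sim_by_g))
```

In this sketch three of the four simulated values for 2 clusters lie below the observed value, so `p` is (3+1)/(4+1) = 0.8. Pooling over both numbers of clusters changes the reference mean and standard deviation, so `z_all` differs from `z_own`.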

Details

cgrestandard will add a statistic named dmode to the input set of validation statistics, which is defined as 0.75*dindex+0.25*highdgap, aggregating these two closely related statistics, see clustatsum.
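
As a small worked instance of this formula, the following base R lines compute the weighted aggregate for made-up values of the two statistics. The numbers are purely illustrative; in practice `dindex` and `highdgap` come from clustatsum, and this sketch makes no claim about the point in the processing at which cgrestandard forms the aggregate.

```r
# Made-up values of the two statistics, for illustration only
dindex <- 0.6
highdgap <- 0.2

# Weighted aggregate as defined above: 0.75*dindex + 0.25*highdgap
dmode <- 0.75 * dindex + 0.25 * highdgap
```

Here `dmode` is 0.45 + 0.05 = 0.5 (up to floating-point rounding), so `dindex` dominates the aggregate with three times the weight of `highdgap`.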

Value

List of class "valstat", see valstat.object, with standardised results as explained above.

Author(s)

Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en/

References

Hennig, C. (2019) Cluster validation by measurement of clustering characteristics relevant to the user. In C. H. Skiadas (ed.) Data Analysis and Applications 1: Clustering and Regression, Modeling-estimating, Forecasting and Data Mining, Volume 2, Wiley, New York 1-24, https://arxiv.org/abs/1703.09282

Akhanli, S. and Hennig, C. (2020) Calibrating and aggregating cluster validity indexes for context-adapted comparison of clusterings. Statistics and Computing, 30, 1523-1544, https://link.springer.com/article/10.1007/s11222-020-09958-2, https://arxiv.org/abs/2002.01822

Examples

  
  set.seed(20000)
  options(digits=3)
  face <- rFace(10,dMoNo=2,dNoEy=0,p=2)
  dif <- dist(face)
  clusum <- list()
  clusum[[2]] <- list()
  cl12 <- kmeansCBI(face,2)
  cl13 <- kmeansCBI(face,3)
  cl22 <- claraCBI(face,2)
  cl23 <- claraCBI(face,3)
  ccl12 <- clustatsum(dif,cl12$partition)
  ccl13 <- clustatsum(dif,cl13$partition)
  ccl22 <- clustatsum(dif,cl22$partition)
  ccl23 <- clustatsum(dif,cl23$partition)
  clusum[[1]] <- list()
  clusum[[1]][[2]] <- ccl12
  clusum[[1]][[3]] <- ccl13
  clusum[[2]][[2]] <- ccl22
  clusum[[2]][[3]] <- ccl23
  clusum$maxG <- 3
  clusum$minG <- 2
  clusum$method <- c("kmeansCBI","claraCBI")
  clusum$name <- c("kmeansCBI","claraCBI")
  clusim <- randomclustersim(dist(face),G=2:3,nnruns=1,kmruns=1,
    fnruns=1,avenruns=1,monitor=FALSE)
  cgr <- cgrestandard(clusum,clusim,2:3)
  cgr2 <- cgrestandard(clusum,clusim,2:3,useallg=TRUE)
  cgr3 <- cgrestandard(clusum,clusim,2:3,percentage=TRUE)
  str(cgr)
  str(cgr2)
  print(cgr3[[1]][[2]])

fpc documentation built on Sept. 24, 2024, 9:07 a.m.