cluster.magazine: Run many clustering methods on many numbers of clusters

View source: R/cquality20.R

cluster.magazine    R Documentation

Run many clustering methods on many numbers of clusters

Description

Runs a user-specified set of clustering methods (CBI-functions, see kmeansCBI) with several numbers of clusters on a dataset, and returns the results in a unified output format.

Usage

cluster.magazine(data,G,diss = inherits(data, "dist"),
                             scaling=TRUE, clustermethod,
                             distmethod=rep(TRUE,length(clustermethod)),
                             ncinput=rep(TRUE,length(clustermethod)),
                             clustermethodpars,
                             trace=TRUE)

Arguments

data

data matrix or dist-object.

G

vector of integers. Numbers of clusters to consider.

diss

logical. If TRUE, data is assumed to be a distance/dissimilarity matrix; otherwise it is assumed to be an observations-by-variables data matrix.

scaling

either a logical or a numeric vector of length equal to the number of columns of data. If FALSE, data won't be scaled; otherwise scaling is passed on to scale as argument scale.

clustermethod

vector of strings specifying names of CBI-functions (see kmeansCBI). These are the clustering methods to be applied.

distmethod

vector of logicals, of the same length as clustermethod. TRUE means that the clustering method operates on distances. If diss=TRUE, all entries have to be TRUE. Otherwise, if an entry is TRUE, the corresponding method will be applied to dist(data).

ncinput

vector of logicals, of the same length as clustermethod. TRUE indicates that the corresponding clustering method requires the number of clusters as input and will not estimate the number of clusters itself.

clustermethodpars

list of the same length as clustermethod. Specifies parameters for all involved clustering methods. Its jth entry is passed to clustering method number j. An entry can be empty if all defaults are used for that clustering method. The number of clusters does not need to be specified here.

trace

logical. If TRUE, some runtime information is printed.
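The following base-R sketch illustrates what the three kinds of values for scaling mean for a data matrix, and what "applied to dist(data)" amounts to for a method with distmethod=TRUE. It only uses scale() and dist() from base R and is not the fpc implementation. That the columns are also centred (scale()'s default center=TRUE) and that the distances are computed from the scaled matrix are assumptions made here.

```r
# Sketch only, not the fpc source. Assumption: centring follows scale()'s
# default center = TRUE, and distances are taken from the scaled data.
set.seed(1)
# 20 observations on 2 variables with very different spreads (sd 1 vs 10);
# rnorm() recycles the sd vector, byrow = TRUE puts sd 1 in column 1.
x <- matrix(rnorm(40, sd = c(1, 10)), ncol = 2, byrow = TRUE)

# scaling = TRUE: centre each column and divide it by its standard deviation
x_std <- scale(x, scale = TRUE)

# scaling = numeric vector: centre each column and divide it by the given value
x_num <- scale(x, scale = c(1, 10))

# scaling = FALSE: the data are used unchanged
x_raw <- x

# distmethod[i] = TRUE with diss = FALSE: method i receives a dist object
# (Euclidean distances, the dist() default)
d <- dist(x_std)
```

Because scaling is handed to scale() as its scale argument, a numeric vector lets you fix the unit of each variable yourself instead of using the sample standard deviations.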

Value

A list comprising the following components:

output

Two-dimensional list. The first list index i is the number of the clustering method (in the order given by clustermethod), the second list index j is the number of clusters. Entry [[i]][[j]] stores the full output of clustering method i run with j clusters.

clustering

Two-dimensional list. The first list index i is the number of the clustering method (in the order given by clustermethod), the second list index j is the number of clusters. Entry [[i]][[j]] stores the clustering integer vector (i.e., the partition component of the CBI-function output, see kmeansCBI) of clustering method i run with j clusters.

noise

Two-dimensional list. The first list index i is the number of the clustering method (in the order given by clustermethod), the second list index j is the number of clusters. List entries are single logicals. If TRUE, the clustering method estimated some noise, i.e., points not belonging to any cluster; in the clustering vector these are indicated by the highest number (the number of clusters plus one if the number of clusters was fixed).

othernc

list of integer vectors of length 2, one for each case in which a method that estimates the number of clusters itself returns a number smaller than min(G) or larger than max(G). The first number is the number of the clustering method (in the order given by clustermethod), the second number is the estimated number of clusters.
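A sketch of how the noise and othernc components might be post-processed. The clustering vector, the noise flag and the othernc entries below are made-up values standing in for cmf$clustering[[i]][[k]], cmf$noise[[i]][[k]] and cmf$othernc from a call such as the one in the Examples; all variable names are hypothetical. Noise points are identified only through the documented convention that they carry the highest label.

```r
# Hypothetical stand-ins for one entry of clustering and noise, for a method
# run with a fixed number of clusters k = 2 that also flagged noise.
k <- 2
cl <- c(1, 1, 2, 3, 2, 1, 3, 2)   # stands in for cmf$clustering[[i]][[k]]
has_noise <- TRUE                 # stands in for cmf$noise[[i]][[k]]

# Documented convention: noise points carry the highest label (k + 1 here)
noise_label <- if (has_noise) max(cl) else NA
is_noise <- if (has_noise) cl == noise_label else rep(FALSE, length(cl))

cluster_sizes <- table(cl[!is_noise])  # sizes of the proper clusters
n_noise <- sum(is_noise)               # number of noise points

# Hypothetical othernc: method 2 estimated 1 cluster, method 3 estimated 5,
# both outside G = 2:3. Stack the pairs into a two-column matrix
# (do.call(rbind, list()) would return NULL if othernc were empty).
G <- 2:3
othernc <- list(c(2L, 1L), c(3L, 5L))
othernc_mat <- do.call(rbind, othernc)
colnames(othernc_mat) <- c("method", "nc")
```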

Author(s)

Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en/

References

Hennig, C. (2017) Cluster validation by measurement of clustering characteristics relevant to the user. In C. H. Skiadas (ed.) Proceedings of ASMDA 2017, 501-520, https://arxiv.org/abs/1703.09282

See Also

clusterbenchstats, kmeansCBI

Examples

  
  library(fpc)
  set.seed(20000)
  options(digits=3)
  face <- rFace(10,dMoNo=2,dNoEy=0,p=2)
  clustermethod <- c("kmeansCBI","hclustCBI","hclustCBI")
  # A clustering method can be used more than once, with different
  # parameters; entry 1 stays empty, so kmeansCBI uses its defaults
  clustermethodpars <- list()
  clustermethodpars[[2]] <- clustermethodpars[[3]] <- list()
  clustermethodpars[[2]]$method <- "complete"
  clustermethodpars[[3]]$method <- "average"
  cmf <- cluster.magazine(face,G=2:3,clustermethod=clustermethod,
    distmethod=rep(FALSE,3),clustermethodpars=clustermethodpars)
  # str() prints the structure itself; wrapping it in print() only adds NULL
  str(cmf)


fpc documentation built on Sept. 24, 2024, 9:07 a.m.