distrsimilarity: Similarity of within-cluster distributions to normal and...
In fpc: Flexible Procedures for Clustering

distrsimilarity

R Documentation

Similarity of within-cluster distributions to normal and uniform

Description

Two measures of dissimilarity between the within-cluster distributions of a dataset and a normal or uniform distribution. For the normal, it is the Kolmogorov distance between the distribution of the Mahalanobis distances to the cluster center and a chi-squared distribution. For the uniform, it is the Kolmogorov distance between the distribution of the distances to the kth nearest neighbour and a Gamma distribution (based on Byers and Raftery (1998)). The clusterwise values are aggregated by weighting with the cluster sizes.
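
As a rough illustration of the normal measure for a single cluster, the following base-R sketch compares squared Mahalanobis distances with a chi-squared distribution whose degrees of freedom equal the number of variables. The helper name `kd_normal_sketch` is hypothetical, and the use of the sample mean, the sample covariance matrix and `ks.test` are assumptions made for illustration; `distrsimilarity` may compute these quantities differently.

```r
# Hypothetical helper, not part of fpc: Kolmogorov distance between the
# squared Mahalanobis distances of one cluster and a chi-squared
# distribution with p = ncol(xc) degrees of freedom.
kd_normal_sketch <- function(xc) {
  xc <- as.matrix(xc)
  p <- ncol(xc)
  # stats::mahalanobis() returns squared distances
  d2 <- mahalanobis(xc, center = colMeans(xc), cov = cov(xc))
  unname(ks.test(d2, "pchisq", df = p)$statistic)
}

set.seed(1)
xc <- matrix(rnorm(400), ncol = 2)  # 200 bivariate standard normal points
kd_norm_value <- kd_normal_sketch(xc)
```

The result lies between 0 and 1, with smaller values indicating a closer match to the chi-squared reference.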

Usage

distrsimilarity(x,clustering,noisecluster = FALSE,
distribution=c("normal","uniform"),nnk=2,
largeisgood=FALSE,messages=FALSE)

Arguments

`x`	the data matrix; a numerical object which can be coerced to a matrix.
`clustering`	integer vector of class numbers; length must equal `nrow(x)`, numbers must go from 1 to the number of clusters.
`noisecluster`	logical. If `TRUE`, the cluster with the largest number is ignored for the computations.
`distribution`	vector of `"normal", "uniform"` or both. Indicates which of the two dissimilarities is/are computed.
`nnk`	integer. Number of nearest neighbors to use for dissimilarity to the uniform.
`largeisgood`	logical. If `TRUE`, dissimilarities are transformed to `1-d` (this means that larger values indicate a better fit).
`messages`	logical. If `TRUE`, warnings are given if within-cluster covariance matrices are not invertible (in which case all within-cluster Mahalanobis distances are set to zero).
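
To make the role of `nnk` more concrete, here is a minimal base-R sketch of a nearest-neighbour comparison in the spirit of Byers and Raftery (1998) for a single cluster. It assumes that the `nnk`th nearest-neighbour distance raised to the power of the dimension is compared with a Gamma distribution with shape `nnk` and a rate estimated from the mean; this fitting choice and the helper names `kolmogorov_distance` and `kd_uniform_sketch` are assumptions for illustration, not the code used by `distrsimilarity`.

```r
# Hypothetical helpers, not part of fpc.

# Kolmogorov distance between the empirical distribution of v and a
# continuous distribution function cdf
kolmogorov_distance <- function(v, cdf, ...) {
  v <- sort(v)
  n <- length(v)
  f <- cdf(v, ...)
  max(seq_len(n) / n - f, f - (seq_len(n) - 1) / n)
}

kd_uniform_sketch <- function(xc, nnk = 2) {
  xc <- as.matrix(xc)
  p <- ncol(xc)
  dm <- as.matrix(dist(xc))
  # Position nnk + 1 after sorting, because the smallest entry of each
  # row is the zero distance of a point to itself
  dk <- apply(dm, 1, function(r) sort(r)[nnk + 1])
  dkp <- dk^p
  # Assumed fit: shape nnk, rate estimated from the mean of dk^p
  kolmogorov_distance(dkp, pgamma, shape = nnk, rate = nnk / mean(dkp))
}

set.seed(2)
xu <- matrix(runif(400), ncol = 2)  # 200 points, uniform on the unit square
kd_unif_value <- kd_uniform_sketch(xu, nnk = 2)
```

The Kolmogorov distance is computed by hand rather than with `ks.test` because nearest-neighbour distances can contain ties, for which `ks.test` issues a warning.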

Value

List with the following components

`kdnorm`	Kolmogorov distance between distribution of within-cluster Mahalanobis distances and appropriate chi-squared distribution, aggregated over clusters (I am grateful to Agustin Mayo-Iscar for the idea).
`kdunif`	Kolmogorov distance between distribution of distances to `nnk`th nearest within-cluster neighbor and appropriate Gamma-distribution, see Byers and Raftery (1998), aggregated over clusters.
`kdnormc`	vector of cluster-wise Kolmogorov distances between distribution of within-cluster Mahalanobis distances and appropriate chi-squared distribution.
`kdunifc`	vector of cluster-wise Kolmogorov distances between distribution of distances to `nnk`th nearest within-cluster neighbor and appropriate Gamma-distribution.
`xmahal`	vector of Mahalanobis distances to the respective cluster center.
`xdknn`	vector of distances to `nnk`th nearest within-cluster neighbor.
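
The relation between the cluster-wise and the aggregated components can be sketched as follows, assuming that "aggregated by weighting with the cluster sizes" means a size-weighted mean. The numbers and variable names are made up for illustration and are not output of `distrsimilarity`.

```r
# Hypothetical cluster-wise values and cluster sizes, for illustration only
kd_by_cluster <- c(0.10, 0.25, 0.40)
cluster_sizes <- c(100, 60, 40)

# Size-weighted aggregation of the cluster-wise values
kd_aggregated <- weighted.mean(kd_by_cluster, w = cluster_sizes)

# With largeisgood = TRUE the documented transformation is 1 - d
kd_largeisgood <- 1 - kd_aggregated
```

Under the weighted-mean assumption, applying `1-d` before or after aggregation gives the same value.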

Note

It is very hard to capture similarity to a multivariate normal or uniform distribution in a single value, and both measures used here have their shortcomings. In particular, the dissimilarity to the uniform can still indicate a good fit if the cluster has holes or if the distribution is uniform but concentrated on several disconnected sets.

Author(s)

Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en/

References

Byers, S. and Raftery, A. E. (1998) Nearest-Neighbor Clutter Removal for Estimating Features in Spatial Point Processes, Journal of the American Statistical Association, 93, 577-584.

Hennig, C. (2017) Cluster validation by measurement of clustering characteristics relevant to the user. In C. H. Skiadas (ed.) Proceedings of ASMDA 2017, 501-520, https://arxiv.org/abs/1703.09282

Examples

  library(fpc)  # provides rFace and distrsimilarity
  set.seed(20000)
  options(digits=3)
  face <- rFace(200,dMoNo=2,dNoEy=0,p=2)
  km3 <- kmeans(face,3)
  distrsimilarity(face,km3$cluster)

fpc documentation built on Sept. 24, 2024, 9:07 a.m.
