In fpc: Flexible Procedures for Clustering

pamk	R Documentation

Partitioning around medoids with estimation of number of clusters

Description

This calls `pam` or `clara` to perform a partitioning around medoids clustering, with the number of clusters estimated by the optimum average silhouette width (see `pam.object`) or by the Calinski-Harabasz index (`calinhara`). The Duda-Hart test (`dudahart2`) is applied to decide whether there should be more than one cluster, unless 1 is not in `krange` or the data are dissimilarities.
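For intuition about the Calinski-Harabasz criterion, here is a minimal base-R sketch of the standard index, CH = (B/(k-1)) / (W/(n-k)), where B is the between-cluster and W the within-cluster sum of squares. The helper `ch_index` and the toy data are hypothetical. This is not the code of fpc's `calinhara`, and it only covers data matrices, not dissimilarities.

```r
# Hypothetical helper (not fpc's calinhara): Calinski-Harabasz index
#   CH = (B / (k - 1)) / (W / (n - k)),
# with B the between-cluster and W the within-cluster sum of squares.
ch_index <- function(x, clustering) {
  x <- as.matrix(x)
  n <- nrow(x)
  groups <- unique(clustering)
  k <- length(groups)
  overall <- colMeans(x)
  W <- 0
  B <- 0
  for (g in groups) {
    xg <- x[clustering == g, , drop = FALSE]
    centre <- colMeans(xg)
    W <- W + sum(sweep(xg, 2, centre)^2)          # squared distances to own centre
    B <- B + nrow(xg) * sum((centre - overall)^2) # size-weighted spread of centres
  }
  (B / (k - 1)) / (W / (n - k))
}

# 1-D toy data: natural clusters {0, 1} and {10, 11}
x <- matrix(c(0, 1, 10, 11), ncol = 1)
ch_index(x, c(1, 1, 2, 2))   # B = 100, W = 1, so CH = (100 / 1) / (1 / 2) = 200
ch_index(x, c(1, 2, 1, 2))   # a poor split: B = 1, W = 100, so CH = 0.02
```

Larger values indicate better-separated, more compact clusters. This is why the number of clusters with the largest criterion value is chosen.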

Usage

pamk(data,krange=2:10,criterion="asw", usepam=TRUE,
     scaling=FALSE, alpha=0.001, diss=inherits(data, "dist"),
     critout=FALSE, ns=10, seed=NULL, ...)
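With `criterion="asw"`, the selection step amounts roughly to fitting one partition per candidate number of clusters and keeping the one with the largest average silhouette width. The sketch below assumes this simplified view. It is not the fpc source: the helper `select_k_asw` and the toy data are hypothetical, and it omits scaling, the Duda-Hart test for one cluster, and the `"multiasw"` and `"ch"` criteria. It uses only `pam` and `clara` from the recommended `cluster` package.

```r
library(cluster)  # pam() and clara(); 'cluster' ships with R as a recommended package

# Hypothetical helper (not part of fpc): choose k by maximum average
# silhouette width, in the spirit of pamk(criterion = "asw").
# Omitted: scaling, the Duda-Hart test for k = 1, "multiasw" and "ch".
select_k_asw <- function(x, krange = 2:10, usepam = TRUE) {
  krange <- krange[krange >= 2]            # ASW is undefined for one cluster
  fits <- vector("list", max(krange))
  crit <- rep(NA_real_, max(krange))       # crit[k] = ASW for k clusters
  for (k in krange) {
    fits[[k]] <- if (usepam) pam(x, k) else clara(x, k)
    crit[k] <- fits[[k]]$silinfo$avg.width # ASW reported by pam()/clara()
  }
  best <- which.max(crit)                  # which.max() skips the NA entries
  list(pamobject = fits[[best]], nc = best, crit = crit)
}

# Toy data: three well-separated Gaussian groups of 20 points each
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0),  ncol = 2),
           matrix(rnorm(40, mean = 8),  ncol = 2),
           matrix(rnorm(40, mean = 16), ncol = 2))
res <- select_k_asw(x, krange = 2:6)
res$nc
res$crit
```

Indexing `crit` by the number of clusters mirrors how the `crit` component is described under Value. With `usepam = FALSE`, the silhouette information `clara` reports refers only to its best subsample. That is one reason `"multiasw"` is suggested for large data sets.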

Arguments

`data`	a data matrix or data frame or something that can be coerced into a matrix, or dissimilarity matrix or object. See `pam` for more information.
`krange`	integer vector. Numbers of clusters to be compared by the criterion chosen in `criterion`. Note: average silhouette width and Calinski-Harabasz cannot estimate `nc=1`. If 1 is included (and `diss=FALSE`), a Duda-Hart test is applied, and 1 is estimated if this test is not significant.
`criterion`	one of `"asw"`, `"multiasw"` or `"ch"`. Determines whether the average silhouette width or the Calinski-Harabasz index is used. For `"asw"`, the average silhouette width is taken from the `pam`/`clara` output. For `"multiasw"`, it is computed by `distcritmulti`; this is recommended for large data sets with `usepam=FALSE`. The original Calinski-Harabasz index is not defined for dissimilarities. If dissimilarity data are used with `criterion="ch"`, the dissimilarity-based generalisation of Hennig and Liao (2013) is applied.
`usepam`	logical. If `TRUE`, `pam` is used, otherwise `clara`. `clara` is recommended for large data sets with 2,000 or more observations, but dissimilarity matrices cannot be used with `clara`.
`scaling`	either a logical value or a numeric vector with one entry per variable. If `scaling` is a numeric vector, each variable is divided by the corresponding entry of `scaling`. If `scaling` is `TRUE`, the (centered) variables are divided by their root-mean-square; if `scaling` is `FALSE`, no scaling is done.
`alpha`	numeric between 0 and 1, significance level passed to `dudahart2` (only used for the 1-cluster test).
`diss`	logical flag. If `TRUE` (the default for `dist` or `dissimilarity` objects), `data` is treated as a dissimilarity matrix, and 1 is ignored as a candidate number of clusters. If `FALSE`, `data` is treated as a matrix of observations by variables.
`critout`	logical. If `TRUE`, the criterion value is printed out for every number of clusters.
`ns`	passed on to `distcritmulti` if `criterion="multiasw"`.
`seed`	passed on to `distcritmulti` if `criterion="multiasw"`.
`...`	further arguments passed on to `pam` or `clara`.
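To show what the two forms of `scaling` describe, the following base-R sketch applies the corresponding transformations directly. It is an assumption that this matches pamk's internal preprocessing exactly. The toy variables and the divisor vector `s` are hypothetical.

```r
# Hypothetical toy data: two variables on very different scales
x <- cbind(height_cm = c(150, 160, 170, 180),
           income    = c(20000, 35000, 50000, 65000))

# Analogue of scaling = TRUE: centre each column, then divide it by its
# root-mean-square sqrt(sum(x^2) / (n - 1)); for a centred column this is
# the standard deviation (see ?scale)
x_auto <- scale(x, center = TRUE, scale = TRUE)
apply(x_auto, 2, sd)          # both columns now have standard deviation 1

# Analogue of scaling = c(10, 10000): divide each column by its own constant
s <- c(10, 10000)
x_manual <- sweep(x, 2, s, "/")
x_manual
```

Without some form of scaling, Euclidean dissimilarities in this toy example would be dominated almost entirely by `income`.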

Value

A list with components

`pamobject`	the output of the optimal run of `pam` (or of `clara` if `usepam=FALSE`).
`nc`	the optimal number of clusters.
`crit`	vector of criterion values for numbers of clusters. `crit[1]` is the p-value of the Duda-Hart test if 1 is in `krange` and `diss=FALSE`.
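Since `crit[1]` holds a Duda-Hart p-value, a sketch of the classical Duda and Hart (1973) test may help. It compares the within-cluster sum of squares of a two-cluster split, Je(2), with that of a single cluster, Je(1), using the textbook normal approximation. It is written from that approximation and is not the source of `dudahart2`, so details such as the exact p-value computation are assumptions. The helper `duda_hart_test` and the grid data are hypothetical.

```r
# Hypothetical helper: classical Duda-Hart test of one cluster versus a given
# split into two clusters labelled 1 and 2. Small dh = Je(2) / Je(1) is
# evidence for two clusters.
duda_hart_test <- function(x, clustering, alpha = 0.001) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  # Sum of squared distances of the rows of m to their column means
  ss <- function(m) sum(sweep(m, 2, colMeans(m))^2)
  je1 <- ss(x)                                         # one cluster
  je2 <- ss(x[clustering == 1, , drop = FALSE]) +
         ss(x[clustering == 2, , drop = FALSE])        # two clusters
  dh <- je2 / je1
  # Normal approximation to the distribution of dh under homogeneity
  null_mean <- 1 - 2 / (pi * p)
  null_sd <- sqrt(2 * (1 - 8 / (pi^2 * p)) / (n * p))
  critical <- null_mean - qnorm(1 - alpha) * null_sd
  list(dh = dh,
       p.value = pnorm((dh - null_mean) / null_sd),   # small dh => small p
       cluster1 = dh >= critical)                     # TRUE: keep one cluster
}

g <- as.matrix(expand.grid(u = seq(0, 1, length.out = 10),
                           v = seq(0, 1, length.out = 10)))
# One homogeneous grid split in half along u: no real cluster structure
homog <- duda_hart_test(g, ifelse(g[, "u"] <= 0.5, 1, 2))
# Two copies of the grid, 10 units apart: clear two-cluster structure
sep <- duda_hart_test(rbind(g, g + 10), rep(1:2, each = nrow(g)))
c(homog$p.value, sep$p.value)
```

In pamk the split being tested comes from the 2-cluster partition. A non-significant result (`cluster1` is `TRUE`) then leads to an estimate of one cluster.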

Note

`clara` and `pam` can handle `NA` entries (see their documentation), but `dudahart2` cannot. Therefore `NA`s should not occur if 1 is in `krange`.
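One way to respect this restriction is to check for and drop incomplete rows before including 1 in `krange`. The minimal base-R sketch below does this on hypothetical data. Dropping rows is only one option; the alternative is to leave 1 out of `krange`.

```r
# Hypothetical data with one missing entry
x <- cbind(a = c(1.2, 0.8, NA, 5.1, 4.9),
           b = c(0.3, 0.1, 0.2, 3.8, 4.2))
anyNA(x)                                            # TRUE
# Keep only complete rows so that the Duda-Hart step can be run
x_complete <- x[complete.cases(x), , drop = FALSE]
nrow(x_complete)                                    # 4
```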

Author(s)

Christian Hennig christian.hennig@unibo.it https://www.unibo.it/sitoweb/christian.hennig/en/

References

Calinski, R. B., and Harabasz, J. (1974) A Dendrite Method for Cluster Analysis, Communications in Statistics, 3, 1-27.

Duda, R. O. and Hart, P. E. (1973) Pattern Classification and Scene Analysis. Wiley, New York.

Hennig, C. and Liao, T. (2013) How to find an appropriate clustering for mixed-type variables with application to socio-economic stratification, Journal of the Royal Statistical Society, Series C Applied Statistics, 62, 309-369.

Kaufman, L. and Rousseeuw, P. J. (1990) Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, New York.

Examples

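  library(fpc)  # provides pamk() and the rFace() data generator
  options(digits=3)
  set.seed(20000)
  # Simulated face-shaped clustered data: 50 points in 2 dimensions
  face <- rFace(50,dMoNo=2,dNoEy=0,p=2)
  # Compare 1 to 5 clusters; critout=TRUE prints each criterion value
  pk1 <- pamk(face,krange=1:5,criterion="asw",critout=TRUE)
  pk2 <- pamk(face,krange=1:5,criterion="multiasw",ns=2,critout=TRUE)
  # "multiasw" is better for larger data sets; use a larger ns then.
  pk3 <- pamk(face,krange=1:5,criterion="ch",critout=TRUE)
  c(pk1$nc, pk2$nc, pk3$nc)  # estimated numbers of clusters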

fpc documentation built on Jan. 14, 2026, 9:07 a.m.