stability_kproto: Determination of the Stability of k-Prototypes Clustering

View source: R/stability_kproto.R

Description

Calculates the stability of a k-prototypes clustering with k clusters, or computes the stability-based optimal number of clusters for k-prototypes clustering. Available stability indices are Jaccard, Rand, Fowlkes & Mallows and Luxburg.
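The Rand, Jaccard and Fowlkes & Mallows indices compare two partitions of the same observations by counting pairs of observations that are grouped together or apart in each partition. The following is a minimal base-R sketch of the standard pair-counting definitions, for illustration only: the helper names pair_counts(), rand_index(), jaccard_index() and fowlkes_mallows() are hypothetical, this is not claimed to be how clustMixType computes them, and the Luxburg index is not covered.

```r
# Hypothetical helpers (not part of clustMixType): standard pair-counting
# indices comparing two partitions a and b of the same observations.
pair_counts <- function(a, b) {
  tab <- table(a, b)                      # contingency table of the two partitions
  n   <- sum(tab)
  n11 <- sum(choose(tab, 2))              # pairs together in both partitions
  na  <- sum(choose(rowSums(tab), 2))     # pairs together in partition a
  nb  <- sum(choose(colSums(tab), 2))     # pairs together in partition b
  n10 <- na - n11                         # together in a only
  n01 <- nb - n11                         # together in b only
  n00 <- choose(n, 2) - n11 - n10 - n01   # apart in both partitions
  c(n11 = n11, n10 = n10, n01 = n01, n00 = n00)
}

# Rand: share of all pairs on which the two partitions agree
rand_index <- function(a, b) {
  p <- pair_counts(a, b)
  unname((p["n11"] + p["n00"]) / sum(p))
}

# Jaccard: pairs together in both, relative to pairs together in at least one
jaccard_index <- function(a, b) {
  p <- pair_counts(a, b)
  unname(p["n11"] / (p["n11"] + p["n10"] + p["n01"]))
}

# Fowlkes & Mallows: geometric mean of the two pairwise "precisions"
fowlkes_mallows <- function(a, b) {
  p <- pair_counts(a, b)
  unname(p["n11"] / sqrt((p["n11"] + p["n10"]) * (p["n11"] + p["n01"])))
}

a <- c(1, 1, 2, 2)
b <- c("x", "x", "y", "z")
rand_index(a, b)        # 5/6
jaccard_index(a, b)     # 0.5
fowlkes_mallows(a, b)   # 1/sqrt(2)
```

All three indices ignore cluster labels, so c(1, 1, 2, 2) and c(2, 2, 1, 1) describe the same partition and score 1.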

Usage

stability_kproto(
  object,
  method = c("rand", "jaccard", "luxburg", "fowlkesmallows"),
  B = 100,
  verbose = FALSE,
  ...
)

Arguments

object

Object of class kproto resulting from a call to kproto(..., keep.data = TRUE).

method

Character vector specifying the stability index or indices to compute: one or more of "luxburg", "fowlkesmallows", "rand" and "jaccard".

B

Numeric; the number of bootstrap samples.

verbose

Logical; whether information about the bootstrap procedure should be reported.

...

Further arguments passed to kproto, such as:

  • nstart: if > 1, kproto is computed repeatedly with random initial prototypes.

  • lambda: factor that trades off the Euclidean distance of numeric variables against the simple matching coefficient of categorical variables.
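To make the role of lambda concrete, here is a minimal sketch of a k-prototypes dissimilarity between one observation and one prototype. It assumes the formulation of Huang's original k-prototypes algorithm (squared Euclidean distance on the numeric variables plus lambda times the number of mismatching categorical variables); the function kproto_dist() is hypothetical and is not claimed to match the internals of kproto.

```r
# Hypothetical sketch (not clustMixType code), assuming Huang's k-prototypes
# dissimilarity: squared Euclidean distance on numeric variables plus
# lambda times the number of categorical mismatches (simple matching).
kproto_dist <- function(obs, proto, lambda) {
  num  <- vapply(obs, is.numeric, logical(1))   # which variables are numeric
  nums <- names(obs)[num]
  cats <- names(obs)[!num]
  # numeric part: sum of squared differences
  d_num <- sum(vapply(nums, function(v) (obs[[v]] - proto[[v]])^2, numeric(1)))
  # categorical part: number of variables whose categories differ
  d_cat <- sum(vapply(cats, function(v)
    as.character(obs[[v]]) != as.character(proto[[v]]), logical(1)))
  d_num + lambda * d_cat
}

obs   <- data.frame(x1 = factor("A"), x2 = factor("B"), x3 = 1.0, x4 = -2.0)
proto <- data.frame(x1 = factor("A"), x2 = factor("A"), x3 = 2.5, x4 = -2.5)
kproto_dist(obs, proto, lambda = 1)   # 2.25 + 0.25 + 1 * 1 = 3.5
kproto_dist(obs, proto, lambda = 2)   # 2.5 + 2 * 1 = 4.5
```

A larger lambda gives the categorical variables more weight relative to the numeric ones when observations are assigned to prototypes.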

Value

A list with two elements describing the stability of the given k-prototypes clustering:

kp_stab

stability values for the given clustering

kp_bts_stab

stability values for each bootstrap sample
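The pairing of an overall stability value with per-bootstrap values reflects the general bootstrap approach to clustering stability (Ben-Hur et al., 2002; von Luxburg, 2010): recluster resampled data and compare the result with a reference partition. The sketch below illustrates that idea only. It is hypothetical throughout: stats::kmeans() on numeric data stands in for kproto(), the comparison uses the Rand index on the resampled rows, and summarising by the mean is an assumption, not a description of how kp_stab is derived from kp_bts_stab.

```r
# Hypothetical sketch of bootstrap-based cluster stability. stats::kmeans()
# on numeric data stands in for kproto(); this is NOT the clustMixType algorithm.

# Rand index: share of observation pairs on which two partitions agree
rand_index <- function(a, b) {
  tab <- table(a, b)
  N   <- choose(sum(tab), 2)              # number of observation pairs
  n11 <- sum(choose(tab, 2))              # pairs together in both partitions
  na  <- sum(choose(rowSums(tab), 2))     # pairs together in partition a
  nb  <- sum(choose(colSums(tab), 2))     # pairs together in partition b
  (N - na - nb + 2 * n11) / N             # (together in both + apart in both) / N
}

bootstrap_stability <- function(x, k, B = 100, verbose = FALSE) {
  x    <- as.matrix(x)
  ref  <- kmeans(x, centers = k, nstart = 5)$cluster  # reference clustering of all rows
  vals <- numeric(B)
  for (b in seq_len(B)) {
    idx  <- sample(nrow(x), replace = TRUE)           # bootstrap sample of row indices
    fit  <- kmeans(x[idx, , drop = FALSE], centers = k, nstart = 5)
    keep <- !duplicated(idx)                          # count each drawn row once
    # compare reference and bootstrap partitions on the drawn rows
    vals[b] <- rand_index(ref[idx[keep]], fit$cluster[keep])
    if (verbose) message("bootstrap sample ", b, ": ", round(vals[b], 3))
  }
  list(stab = mean(vals), bts_stab = vals)            # summary and per-sample values
}

# two well-separated groups in two numeric dimensions
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = -3), ncol = 2),
           matrix(rnorm(100, mean =  3), ncol = 2))
res <- bootstrap_stability(x, k = 2, B = 20)
res$stab
```

Running the same loop for several values of k and comparing the summaries is one way to read a stability-based choice of the number of clusters.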

Author(s)

Rabea Aschenbruck

References

  • Aschenbruck, R., Szepannek, G., Wilhelm, A.F.X. (2023): Stability of mixed-type cluster partitions for determination of the number of clusters. Submitted.

  • von Luxburg, U. (2010): Clustering stability: an overview. Foundations and Trends in Machine Learning, Vol. 2, Issue 3. doi:10.1561/2200000008.

  • Ben-Hur, A., Elisseeff, A., Guyon, I. (2002): A stability based method for discovering structure in clustered data. Pacific Symposium on Biocomputing. doi:10/bhfxmf.

Examples

## Not run: 
# load the package providing kproto() and stability_kproto()
library(clustMixType)

# generate toy data with two factors and two numeric variables
n   <- 10
prb <- 0.99
muk <- 2.5

x1 <- sample(c("A","B"), 2*n, replace = TRUE, prob = c(prb, 1-prb))
x1 <- c(x1, sample(c("A","B"), 2*n, replace = TRUE, prob = c(1-prb, prb)))
x1 <- as.factor(x1)
x2 <- sample(c("A","B"), 2*n, replace = TRUE, prob = c(prb, 1-prb))
x2 <- c(x2, sample(c("A","B"), 2*n, replace = TRUE, prob = c(1-prb, prb)))
x2 <- as.factor(x2)
x3 <- c(rnorm(n, mean = -muk), rnorm(n, mean = muk), rnorm(n, mean = -muk), rnorm(n, mean = muk))
x4 <- c(rnorm(n, mean = -muk), rnorm(n, mean = muk), rnorm(n, mean = -muk), rnorm(n, mean = muk))
x <- data.frame(x1, x2, x3, x4)

# apply k-prototypes; keep.data = TRUE is required by stability_kproto()
kpres <- kproto(x, 4, keep.data = TRUE)

# calculate cluster stability
stab <- stability_kproto(kpres, method = c("luxburg", "fowlkesmallows"))
stab$kp_stab   # stability values for the given clustering


## End(Not run)


clustMixType documentation built on July 1, 2024, 5:08 p.m.