mset_kmeans: Generates Methods Settings for K-Means Clustering
In qcluster: Clustering via Quadratic Scoring

mset_kmeans

R Documentation

Description

The function generates a software abstraction of a list of clustering models implemented through a set of tuned methods and algorithms. In particular, it generates a list of kmeans-type functions each combining tuning parameters and other algorithmic settings. The generated functions are ready to be called on the data set.

Usage

mset_kmeans(K = c(1:10),
            iter.max = 50,
            nstart = 30,
            algorithm = "Hartigan-Wong",
            trace = FALSE)

Arguments

`K`	a vector, specifies the number of clusters.
`iter.max`	a vector, contains the settings of the `iter.max` parameter of `kmeans`.
`nstart`	a vector, contains the settings of the `nstart` parameter of `kmeans`.
`algorithm`	a vector, contains the settings of the `algorithm` parameter of `kmeans`.
`trace`	a vector, contains the settings of the `trace` parameter of `kmeans`.
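
For orientation, a single setting corresponds to one call to `kmeans`. The sketch below writes the default values from the Usage section as a direct call to `stats::kmeans` on simulated data. It assumes that `K` plays the role of the `centers` argument; the data and object names are illustrative only.

```r
# Illustrative only: one setting written as a direct call to stats::kmeans.
# Assumption: K corresponds to the 'centers' argument of kmeans.
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))

fit <- kmeans(x,
              centers   = 2,               # one value of K
              iter.max  = 50,
              nstart    = 30,
              algorithm = "Hartigan-Wong",
              trace     = FALSE)

table(fit$cluster)   # cluster sizes
```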

Details

The function produces functions implementing competing clustering methods based on the K-Means methodology as implemented in kmeans. It is a specialized version of the more general function mset_user. In particular, it produces a list of kmeans functions, each corresponding to a specific setup in terms of hyper-parameters (e.g. the number of clusters) and the algorithm's control parameters (e.g. initialization). See kmeans for a detailed description of the role of each argument and their data types.
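
The idea of turning vectors of settings into a list of ready-to-call functions can be sketched in base R. This is not the qcluster implementation: `make_kmeans_setups` is a hypothetical helper, and it assumes that every combination of the supplied values yields one setup.

```r
# Hypothetical helper, NOT the qcluster implementation: build one closure per
# row of the settings grid, each wrapping stats::kmeans.
make_kmeans_setups <- function(K = 1:10, iter.max = 50, nstart = 30,
                               algorithm = "Hartigan-Wong") {
  # Assumption: one setup per combination of the supplied values.
  grid <- expand.grid(K = K, iter.max = iter.max, nstart = nstart,
                      algorithm = algorithm,
                      KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
  lapply(seq_len(nrow(grid)), function(i) {
    args <- as.list(grid[i, , drop = FALSE])
    list(
      fullname = paste0("kmeans_K", args$K, "_", args$algorithm),
      callargs = args,
      # The closure remembers the settings of row i.
      fn = function(data) {
        kmeans(data, centers = args$K, iter.max = args$iter.max,
               nstart = args$nstart, algorithm = args$algorithm)
      }
    )
  })
}

setups <- make_kmeans_setups(K = 2:3, algorithm = c("Hartigan-Wong", "Lloyd"))
length(setups)                       # 2 x 2 = 4 setups
sapply(setups, `[[`, "fullname")

# Run the first setup on simulated data.
set.seed(1)
x <- rbind(matrix(rnorm(100), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))
fit <- setups[[1]]$fn(x)
fit$size
```

Storing the settings inside each closure means the caller only has to supply the data, which is what makes the generated functions interchangeable.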

Value

An S3 object of class 'qcmethod'. Each element of the list represents a competing method and contains the following objects:

`fullname`	a string identifying the setup.
`callargs`	a list with `kmeans` function arguments.
`fn`	the function implementing the specified setting. This `fn` function can be executed on the data set. It has two arguments: `data`, a data matrix or data.frame, and `only_params`, a logical. If `only_params == FALSE` (default), `fn` returns the object returned by `kmeans`. If `only_params == TRUE`, `fn` returns only the cluster parameters (proportions, mean, and cov; see clust2params).
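
The two return modes of `fn` can be illustrated with a base R sketch. It does not reproduce clust2params: `params_from_labels` is a hypothetical helper, the element names `proportion`, `mean` and `cov` are assumptions, and the covariances use the usual sample estimator (denominator n - 1), which may differ from the package's convention.

```r
# Hypothetical sketch, NOT qcluster's clust2params: derive cluster parameters
# from a hard partition produced by stats::kmeans.
params_from_labels <- function(data, cluster) {
  data <- as.matrix(data)
  labs <- sort(unique(cluster))
  list(
    # share of observations in each cluster
    proportion = as.numeric(table(factor(cluster, levels = labs))) / nrow(data),
    # P x K matrix of cluster means
    mean = sapply(labs, function(k) colMeans(data[cluster == k, , drop = FALSE])),
    # P x P x K array of within-cluster sample covariances
    cov = sapply(labs, function(k) cov(data[cluster == k, , drop = FALSE]),
                 simplify = "array")
  )
}

# A stand-in for one generated 'fn' with fixed settings (K = 2).
fn <- function(data, only_params = FALSE) {
  fit <- kmeans(data, centers = 2, nstart = 10)
  if (only_params) params_from_labels(data, fit$cluster) else fit
}

set.seed(1)
x <- rbind(matrix(rnorm(100), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))
class(fn(x))                      # full kmeans object
str(fn(x, only_params = TRUE))    # proportions, means, covariances
```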

References

Coraggio, Luca, and Pietro Coretto (2023). Selecting the Number of Clusters, Clustering Models, and Algorithms. A Unifying Approach Based on the Quadratic Discriminant Score. Journal of Multivariate Analysis, Vol. 196(105181), pp. 1-20, doi:10.1016/j.jmva.2023.105181

Examples

# 'kmeans' settings combining number of clusters K={2,3}, and algorithms {Hartigan-Wong, Lloyd}
A <- mset_kmeans(K = c(2,3), algorithm = c("Hartigan-Wong", "Lloyd"))

# select the first setup (print(m) below shows its settings)
m <- A[[1]]
print(m)

# cluster with the method set in 'm'
data("banknote")
dat  <- banknote[-1]
fit1 <- m$fn(dat)   
fit1
class(fit1)

# if only cluster parameters are needed
fit2 <- m$fn(dat, only_params = TRUE)   
fit2

qcluster documentation built on April 3, 2025, 6:16 p.m.