mset_kmeans: Generates Methods Settings for K-Means Clustering

View source: R/mset_kmeans.R

mset_kmeansR Documentation

Generates Methods Settings for K-Means Clustering

Description

The function generates a software abstraction of a list of clustering models implemented through a set of tuned methods and algorithms. In particular, it generates a list of kmeans-type functions each combining tuning parameters and other algorithmic settings. The generated functions are ready to be called on the data set.

Usage

mset_kmeans(K = c(1:10),
            iter.max = 50,
            nstart = 30,
            algorithm = "Hartigan-Wong",
            trace = FALSE)

Arguments

K

a vector, specifies the number of clusters.

iter.max

a vector, contains the settings of the iter.max parameter of kmeans.

nstart

a vector, contains the settings of the nstart parameter ofkmeans.

algorithm

a vector, contains the settings of the algorithm parameter of kmeans.

trace

a vector, contains the settings of the trace parameter of kmeans.

Details

The function produces functions implementing competing clustering methods based on the K-Means methodology as implemented in kmeans. This is a specialized version of the more general function mset_user. In particular, it produces a list of kmeans functions each corresponding to a specific setup in terms of hyper-parameters (e.g. the number of clusters) and algorithm's control parameters (e.g. initialization). See kmeans for more detail for a detailed description of the role of each argument and their data types.

Value

An S3 object of class 'qcmethod'. Each element of the list represents a competing method containing the following objects

fullname

a string identifying the setup.

callargs

a list with kmeans function arguments.

fn

the function implementing the specified setting. This fn function can be executed on the data set. It has two arguments: data and only_params. data is a data matrix or data.frame only_params is logical. If only_params==FALSE (default), fn will return the object returned by kmeans. If only_params==TRUE (default) fn will return only cluster parameters (proportions, mean, and cov, see clust2params.

References

Coraggio, Luca, and Pietro Coretto (2023). Selecting the Number of Clusters, Clustering Models, and Algorithms. A Unifying Approach Based on the Quadratic Discriminant Score. Journal of Multivariate Analysis, Vol. 196(105181), pp. 1-20, \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1016/j.jmva.2023.105181")}

See Also

kmeans, mset_user, bqs

Examples

# 'pam' settings combining number of clusters K={2,3}, and dissimilarities {euclidean, manhattan}
A <- mset_pam(K = c(2,3), metric = c("euclidean", "manhattan"))
   
# select setup 1: K=2, metric = "euclidean"
m <- A[[1]]
print(m)

# cluster with the method set in 'ma1'
data("banknote")
dat  <- banknote[-1]
fit1 <- m$fn(dat)   
fit1
class(fit1)

# if only cluster parameters are needed
fit2 <- m$fn(dat, only_params = TRUE)   
fit2

qcluster documentation built on April 3, 2025, 6:16 p.m.