mset_gmix: Generates Methods Settings for Gaussian Mixture Model-Based...
In qcluster: Clustering via Quadratic Scoring

mset_gmix

R Documentation

Generates Methods Settings for Gaussian Mixture Model-Based Clustering

Description

The function generates a software abstraction of a list of clustering models implemented through a set of tuned methods and algorithms. In particular, it generates a list of gmix-type functions, each combining model tuning parameters and other algorithmic settings. The generated functions are ready to be called on the data set.

Usage

mset_gmix(
   K = seq(10),
   init = "kmed",
   erc = c(1, 50, 1000),
   iter.max = 1000,
   tol = 1e-8,
   init.nstart = 25, 
   init.iter.max = 30,
   init.tol = tol)

Arguments

`K`	a vector/list, specifies the number of clusters.
`init`	a vector, contains the settings of the `init` parameter of `gmix`.
`erc`	a vector/list, contains the settings of the `erc` parameter of `gmix`.
`iter.max`	an integer vector, contains the settings of the `iter.max` parameter of `gmix`.
`tol`	a vector/list, contains the settings of the `tol` parameter of `gmix`.
`init.nstart`	an integer vector, contains the settings of the `init.nstart` parameter of `gmix`.
`init.iter.max`	an integer vector, contains the settings of the `init.iter.max` parameter of `gmix`.
`init.tol`	a vector/list, contains the settings of the `init.tol` parameter of `gmix`.

Details

The function produces functions for fitting competing clustering methods based on several Gaussian mixture model specifications. This is a specialized version of the more general function mset_user. In particular, it produces a list of gmix functions, each corresponding to a specific setup in terms of both model hyper-parameters (e.g. the number of clusters, the eigenvalue ratio constraint, etc.) and the algorithm's control parameters (e.g. the type of initialization, the maximum number of iterations, etc.). See gmix for a detailed description of the role of each argument and their data types.
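
The idea of turning a set of parameter values into a list of ready-to-call functions can be sketched in base R. The block below is only an illustration under stated assumptions: it is not the internal implementation of mset_gmix, the helper `make_setup`, the naming scheme, and the placeholder body of `fn` are all hypothetical, and the order in which combinations are enumerated here need not match the order used by the package.

```r
# Hypothetical illustration only: NOT the internals of mset_gmix.
# Enumerate every combination of two hyper-parameter vectors.
K_values   <- c(2, 3)
erc_values <- c(1, 10)
grid <- expand.grid(K = K_values, erc = erc_values)

# Build one setup per combination. Each setup carries a label, the
# arguments it was built from, and a closure that remembers them.
make_setup <- function(K, erc) {
  force(K); force(erc)  # fix the values captured by the closure
  list(
    fullname = sprintf("K=%s_erc=%s", K, erc),  # made-up naming scheme
    callargs = list(K = K, erc = erc),
    fn = function(data) {
      # Placeholder body: a real setup would fit a model to `data` here.
      list(K = K, erc = erc, n = nrow(data))
    }
  )
}

setups <- Map(make_setup, grid$K, grid$erc)
length(setups)  # one setup per (K, erc) combination
```

Each element of `setups` can then be applied to a data set by calling its `fn` component, which mirrors how the functions returned by mset_gmix are meant to be used.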

Value

An S3 object of class 'qcmethod'. Each element of the list represents a competing method and contains the following objects:

`fullname`	a string identifying the setup.
`callargs`	a list with `gmix` function arguments.
`fn`	the function implementing the specified setting. This `fn` function can be executed on the data set. It has two arguments: `data` and `only_params`. `data` is a data matrix or data.frame; `only_params` is logical. If `only_params==FALSE` (default), `fn` will return the object returned by `gmix`. If `only_params==TRUE`, `fn` will return only the cluster parameters (proportions, mean, and cov; see clust2params).

References

Coraggio, Luca, and Pietro Coretto (2023). Selecting the Number of Clusters, Clustering Models, and Algorithms. A Unifying Approach Based on the Quadratic Discriminant Score. Journal of Multivariate Analysis, Vol. 196(105181), pp. 1-20, doi:10.1016/j.jmva.2023.105181

Examples

# 'gmix' settings combining number of clusters K={2,3} and eigenvalue 
# ratio constraints {1,10} 
A <- mset_gmix(K = c(2,3), erc = c(1,10))
   
# select setup 1: K=2, erc = 1, init = "kmed"
ma1 <- A[[1]]
print(ma1)

# fit A[[1]] on banknote data
data("banknote")
dat  <- banknote[-1]
fit1 <- ma1$fn(dat)   
fit1

# if only cluster parameters are needed
fit1b <- ma1$fn(dat, only_params = TRUE)   
fit1b

   
# include a custom initialization, see also help('gmix')
compute_init <- function(data, K){
  cl  <- kmeans(data, K, nstart=1, iter.max=10)$cluster
  W   <- sapply(seq(K), function(x) as.numeric(cl==x))
  return(W)
}

# generate methods settings 
B <- mset_gmix(K = c(2,3), erc = c(1,10), init=c(compute_init, "kmed"))


# select setup 2: K=2, erc=10, init = compute_init
mb2  <- B[[2]]
fit2 <- mb2$fn(dat)   
fit2

qcluster documentation built on April 3, 2025, 6:16 p.m.