| mset_user | R Documentation |
The function generates a software abstraction of a list of clustering models implemented through a set of tuned methods and algorithms. The base clustering methodology is provided via a user-defined function. This prototype is expanded into a list of functions, each combining tuning parameters and other algorithmic settings. The generated functions are ready to be called on the data set.
mset_user(fname, .packages = NULL, .export = NULL, ...)
fname |
a function implementing a user-defined clustering method. It clusters
a data set and outputs cluster parameters. |
.packages |
character vector of packages that the tasks in |
.export |
character vector of variables to export that are needed by
|
... |
parameters passed to |
The function produces functions implementing competing clustering methods
based on a prototype methodology implemented by the user via
the input argument fname.
In particular, it builds a list of fname-type functions each
corresponding to a specific setup in terms of
hyper-parameters (e.g. the number of clusters) and algorithm's
control parameters (e.g. initialization).
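The expansion described above can be pictured with a small base-R sketch. It is an illustration under stated assumptions, not the package's implementation: expand_settings() and proto() are hypothetical names, and the real mset_user may order or label the setups differently.

```r
# Hypothetical prototype: any function taking `data` plus tuning parameters.
proto <- function(data, K, nstart = 1) {
  stats::kmeans(data, centers = K, nstart = nstart)
}

# Hypothetical helper, NOT the package's mset_user(): it builds one closure
# per combination of the supplied parameter values.
expand_settings <- function(fn, ...) {
  grid <- expand.grid(..., stringsAsFactors = FALSE)
  lapply(seq_len(nrow(grid)), function(i) {
    args <- lapply(grid, `[[`, i)   # named list with the i-th setup
    list(
      fullname = paste(names(args), unlist(args), sep = "=", collapse = ", "),
      callargs = args,
      fn = function(data) do.call(fn, c(list(data = data), args))
    )
  })
}

settings <- expand_settings(proto, K = c(2, 3), nstart = c(1, 5))
length(settings)   # 4 setups: 2 values of K times 2 values of nstart

# each generated function only needs the data
set.seed(1)
fit <- settings[[1]]$fn(iris[, 1:4])
```

Each element carries the parameter combination it was built from, which mirrors the fullname, callargs and fn components described in the Value section.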
Requirements for fname.
fname is a function implementing the base clustering method of
interest. It must have the following input argument
data:
a numeric vector, matrix, or data frame of observations. Rows
correspond to observations and columns correspond to
variables/features.
Categorical variables and NA values are not allowed.
Additionally, fname can have any other input parameter controlling
the underlying clustering model/method/algorithm. All these additional
parameters are passed to mset_user via ...
(see Arguments).
The output of fname must contain a list named params
with cluster parameters describing size, centrality and scatter.
Let P=number of variable/features and K=number of clusters.
The elements of params are as follows:
prop: a vector of clusters' proportions;
mean: a matrix of dimension (P x K) containing the clusters' mean
parameters;
cov: an array of size (P x P x K) containing the clusters'
covariance matrices.
Note that params can be easily obtained from a vector of cluster labels
using clust2params.
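To make the expected shape of params concrete, the following base-R sketch computes empirical proportions, means and covariances from a vector of hard labels. labels_to_params() is a hypothetical helper written for illustration; clust2params may differ in its details (for example in the covariance denominator or in how special labels are treated).

```r
# Hypothetical helper, not the package's clust2params().
# Assumes every cluster contains at least two observations.
labels_to_params <- function(data, cluster) {
  x <- as.matrix(data)
  ids <- sort(unique(cluster))
  P <- ncol(x)
  K <- length(ids)
  prop <- numeric(K)
  mu <- matrix(0, nrow = P, ncol = K)
  sigma <- array(0, dim = c(P, P, K))
  for (k in seq_len(K)) {
    xk <- x[cluster == ids[k], , drop = FALSE]
    prop[k] <- nrow(xk) / nrow(x)      # cluster size as a proportion
    mu[, k] <- colMeans(xk)            # k-th column = k-th cluster mean
    sigma[, , k] <- stats::cov(xk)     # sample covariance (denominator n_k - 1)
  }
  list(prop = prop, mean = mu, cov = sigma)
}

set.seed(1)
x <- iris[, 1:4]
cl <- stats::kmeans(x, centers = 3, nstart = 5)$cluster
params <- labels_to_params(x, cl)
dim(params$mean)   # 4 3     (P x K)
dim(params$cov)    # 4 4 3   (P x P x K)
```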
.packages and .export. The user does not
normally need to specify .packages and .export.
These arguments are not needed if the functions generated by mset_user
will be called from an environment containing all variables and
functions needed to execute fname.
Functions like bqs will call the functions generated
by mset_user within a parallel infrastructure
using foreach. If the user specifies
.packages and .export, they will be passed to the
.packages and .export arguments of
foreach.
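For readers unfamiliar with these two foreach arguments, here is a minimal sketch of how they are used in a plain foreach loop. It assumes the foreach package is installed and runs sequentially with %do%; the variable scale_factor is a made-up example, and how bqs actually sets up its loop is not shown here.

```r
library(foreach)

scale_factor <- 10   # a variable the loop body depends on

res <- foreach(i = 1:3,
               .packages = "stats",        # packages to load on each worker
               .export = "scale_factor"    # variables to ship to each worker
               ) %do% {
  stats::median(c(i, 2 * i, 3 * i)) * scale_factor
}
unlist(res)   # 20 40 60
```

With a registered parallel backend and %dopar%, the loop body runs on workers that may not share the calling session's loaded packages or variables, which is when the two arguments matter.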
Finally, note that the package already contains specialized versions of mset_user
that generate method settings for some popular algorithms
(see mset_gmix, mset_kmeans, mset_pam).
An S3 object of class 'qcmethod'. Each element of the list
represents a competing method containing the following objects
fullname |
a string identifying the setup. |
callargs |
a list with arguments that are passed to the base function. |
fn |
the function implementing the specified setting. This |
Coraggio, Luca, and Pietro Coretto (2023). Selecting the Number of Clusters, Clustering Models, and Algorithms. A Unifying Approach Based on the Quadratic Discriminant Score. Journal of Multivariate Analysis, Vol. 196(105181), pp. 1-20, doi:10.1016/j.jmva.2023.105181
clust2params, mset_gmix, mset_kmeans, mset_pam
# load data
data("banknote")
dat <- banknote[-1]
# EXAMPLE 1: generate Hierarchical Clustering settings
# ----------------------------------------------------
# wrapper for the popular stats::hclust() for Hierarchical Clustering
# Note the use of:
# * the optional arguments '...' passed to the underlying clustering function
# * 'clust2params' to add cluster parameters to the output
hc_wrapper <- function(data, K, ...){
  dm <- dist(data, method = "euclidean")
  ## ... = hc parameters
  hc <- hclust(dm, ...)
  cl <- cutree(hc, k = K)
  ## output with params
  res <- list()
  res$cluster <- cl
  res$params <- clust2params(data, cluster = cl)
  return(res)
}
# generate settings for Hierarchical Clustering with varying
# number of clusters K={2,3}, agglomeration method = {ward.D, complete}
# see ?stats::hclust
A <- mset_user(fname="hc_wrapper", K = c(2,3), method = c("ward.D", "complete"))
# get the setting with K=2 and method = "complete"
ma <- A[[4]]
ma
# cluster data with the selected setting
fit_a1 <- ma$fn(dat)
fit_a1
## if only cluster parameters are needed
fit_a2 <- ma$fn(dat, only_params = TRUE)
fit_a2
## Not run:
# EXAMPLE 2: generate 'mclust' model settings
# -------------------------------------------
# mclust is a popular package for model-based clustering based on
# Gaussian mixtures. Please visit
# https://cran.r-project.org/web/packages/mclust/vignettes/mclust.html
require(mclust)
# wrapper for mclust::Mclust() for model-based clustering
# Notes:
# * optional arguments '...' are passed to the underlying
#   'mclust' clustering function
# * 'mclust' fits Gaussian Mixture models so cluster parameters are
#   contained in the mclust object
mc_wrapper <- function(data, K, ...){
  y <- Mclust(data, G = K, ...)
  ## element names follow the required 'params' structure: prop, mean, cov
  y[["params"]] <- list(prop = y$parameters$pro,
                        mean = y$parameters$mean,
                        cov = y$parameters$variance$sigma)
  return(y)
}
# generate 'mclust' model settings by varying the number of clusters and
# covariance matrix models (see ?mclust::mclustModelNames)
B <- mset_user(fname = "mc_wrapper", K = c(2,3), modelNames = c("EEI", "VVV"))
# get the setting with K=3 and covariance model "EEI"
mb <- B[[2]]
mb
# cluster data with the selected setting
fit_b <- mb$fn(dat)
fit_b ## class(fit_b) = "Mclust"
# if needed one can make sure that the 'mclust' package is always available
# by setting the argument '.packages'
B <- mset_user(fname = "mc_wrapper", K = c(2,3), modelNames = c("EEI","VVV"),
               .packages = c("mclust"))
## End(Not run)
## Not run:
# EXAMPLE 3: generate 'dbscan' settings
# -------------------------------------
# DBSCAN is a popular nonparametric method for discovering clusters of
# arbitrary shapes with noise. The number of clusters is implicitly
# determined via two crucial tunings usually called 'eps' and 'minPts'
# See https://en.wikipedia.org/wiki/DBSCAN
require(dbscan)
# wrapper for dbscan::dbscan
db_wrap <- function(data, ...) {
  cl <- dbscan(data, borderPoints = TRUE, ...)$cluster
  ## the output must contain a list named 'params'
  return(list(params = clust2params(data, cl)))
}
D <- mset_user(fname = "db_wrap", eps = c(0.5, 1), minPts=c(5,10))
md <- D[[2]]
fit_d <- md$fn(dat)
fit_d
class(fit_d)
## End(Not run)