cluster.intern: Apply clustering method after multiple imputation
In clusterMI: Cluster Analysis with Missing Values by Multiple Imputation

cluster.intern

R Documentation

Apply clustering method after multiple imputation

Description

From a list of imputed data sets, cluster.intern performs cluster analysis on each imputed data set.

Usage

cluster.intern(
  res.imp,
  method.clustering = "kmeans",
  scaling = TRUE,
  nb.clust = NULL,
  method.hclust = "average",
  method.dist = "euclidean",
  modelNames = NULL,
  modelName.hc = "VVV",
  nstart.kmeans = 100,
  iter.max.kmeans = 10,
  m.cmeans = 2,
  samples.clara = 500,
  verbose = FALSE
)

Arguments

`res.imp`	a list of imputed data sets
`method.clustering`	a single string specifying the clustering algorithm used ("kmeans", "pam", "clara", "hclust", "mixture" or "cmeans")
`scaling`	boolean. If TRUE, variables are scaled. Default value is TRUE
`nb.clust`	an integer specifying the number of clusters
`method.hclust`	character string defining the clustering method for hierarchical clustering (required only if method.clustering = "hclust")
`method.dist`	character string defining the method used for computing dissimilarity matrices in hierarchical clustering (required only if method.clustering = "hclust")
`modelNames`	character string indicating the models to be fitted in the EM phase of clustering (required only if method.clustering = "mixture"). By default modelNames = NULL.
`modelName.hc`	a character string indicating the model to be used in model-based agglomerative hierarchical clustering (required only if method.clustering = "mixture"). By default modelName.hc = "VVV".
`nstart.kmeans`	number of random sets chosen for kmeans initialization (required only if method.clustering = "kmeans"). Default value is 100.
`iter.max.kmeans`	maximum number of iterations allowed for kmeans (required only if method.clustering = "kmeans"). Default value is 10.
`m.cmeans`	degree of fuzzification in cmeans clustering. By default m.cmeans = 2
`samples.clara`	number of samples to be drawn from the data set when performing clustering using the clara algorithm. Default value is 500.
`verbose`	logical. Default value is FALSE

Details

Performs cluster analysis with the method given by the method.clustering argument. The function takes as input the output of the imputedata function and applies the clustering method to each imputed data set.
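The idea of clustering each imputed data set separately can be sketched with base R only. This is an illustration, not the code used by cluster.intern: the toy "imputed" data sets are noisy copies of iris rather than output of imputedata, and cluster_one is a hypothetical helper whose scaling, nstart and iter.max settings simply mirror the defaults listed above.

```r
# Toy stand-in for imputed data sets: three complete, slightly perturbed
# copies of the numeric iris variables (NOT produced by imputedata).
set.seed(1)
xm <- as.matrix(iris[, 1:4])
imputed <- lapply(1:3, function(i) xm + rnorm(length(xm), sd = 0.05))

# Hypothetical helper: optionally scale the variables, then run k-means
# with many random starts, returning one partition per data set.
cluster_one <- function(dat, k, scaling = TRUE) {
  if (scaling) dat <- scale(dat)
  kmeans(dat, centers = k, nstart = 100, iter.max = 10)$cluster
}

# One partition (vector of cluster labels) per imputed data set
partitions <- lapply(imputed, cluster_one, k = 3)
lengths(partitions)
```

Each element of partitions is a vector of cluster labels for one data set. Labels are arbitrary from one data set to the next, so the partitions need a pooling step before they can be compared.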

The analysis can be tuned by specifying the cluster analysis method used (method.clustering argument). If method.clustering = "kmeans" or "pam", then the number of clusters can be specified through the nb.clust argument. By default, the same number as the one used for imputation is used. The number of random initializations can also be tuned through the nstart.kmeans argument. If method.clustering = "hclust" (hierarchical clustering), the agglomeration method can be specified through the method.hclust argument (see hclust). By default "average" is used. Furthermore, the number of clusters can be specified, but it can also be chosen automatically if nb.clust < 0. If method.clustering = "mixture" (model-based clustering using Gaussian mixture models), the model to be fitted can be tuned by modifying the modelNames argument (see Mclust). If method.clustering = "cmeans" (clustering using the fuzzy c-means algorithm), then the fuzziness parameter can be modified by tuning the m.cmeans argument. By default, m.cmeans = 2.
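For the hierarchical case, the following sketch shows what the default settings correspond to on a single complete data set. It assumes that method.dist and method.hclust play the role of the method arguments of stats::dist and stats::hclust; the internal code of cluster.intern may differ, and the automatic choice of the number of clusters (nb.clust < 0) is not reproduced here.

```r
# Single complete data set (toy example, not an imputed data set)
x <- scale(as.matrix(iris[, 1:4]))      # scaling = TRUE

d    <- dist(x, method = "euclidean")   # analogue of method.dist
tree <- hclust(d, method = "average")   # analogue of method.hclust
part <- cutree(tree, k = 3)             # analogue of nb.clust = 3

table(part)
```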

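The analysis can be performed in parallel by specifying the number of CPU cores. Note that nnodes does not appear among the arguments of cluster.intern listed above; in the example below it is passed to clusterMI.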
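Because each imputed data set is clustered independently, the work splits naturally across cores. The sketch below illustrates this with the parallel package and base k-means; it is an assumption about how such parallelism can be organised, not the implementation used by clusterMI, and the toy data sets are noisy copies of iris.

```r
library(parallel)

# Toy stand-in for imputed data sets (NOT produced by imputedata)
set.seed(1)
xm <- as.matrix(iris[, 1:4])
imputed <- lapply(1:4, function(i) xm + rnorm(length(xm), sd = 0.05))

nnodes <- 2                      # number of worker processes
cl <- makeCluster(nnodes)
clusterSetRNGStream(cl, 123)     # reproducible random starts on the workers

# Each worker scales and clusters the data sets it receives
partitions <- parLapply(cl, imputed, function(dat) {
  stats::kmeans(scale(dat), centers = 3, nstart = 20)$cluster
})
stopCluster(cl)

length(partitions)
```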

Value

A list with clustering results

Examples

data(wine, package = "clusterMI")

require(parallel)
set.seed(123456)
ref <- wine$cult
nb.clust <- 3
m <- 5 # number of imputed data sets. Should be larger in practice
wine.na <- wine
wine.na$cult <- NULL
wine.na <- prodna(wine.na)

#imputation
res.imp <- imputedata(data.na = wine.na, nb.clust = nb.clust, m = m)

#analysis by kmeans and pooling
nnodes <- 2 # parallel::detectCores()
res.pool <- clusterMI(res.imp, nnodes = nnodes)

res.pool$instability
table(ref, res.pool$part)

clusterMI documentation built on April 4, 2025, 12:55 a.m.