cluster.intern: Apply clustering method after multiple imputation

Description

Given a list of imputed data sets, this function performs a cluster analysis on each imputed data set.

Usage

cluster.intern(
  res.imp,
  method.clustering = "kmeans",
  scaling = TRUE,
  nb.clust = NULL,
  method.hclust = "average",
  method.dist = "euclidean",
  modelNames = NULL,
  modelName.hc = "VVV",
  nstart.kmeans = 100,
  iter.max.kmeans = 10,
  m.cmeans = 2,
  samples.clara = 500,
  verbose = FALSE
)

Arguments

res.imp

a list of imputed data sets

method.clustering

a single string specifying the clustering algorithm used ("kmeans", "pam", "clara", "hclust", "mixture" or "cmeans")

scaling

logical. If TRUE, variables are scaled. Default value is TRUE.

nb.clust

an integer specifying the number of clusters

method.hclust

character string defining the agglomeration method for hierarchical clustering (see hclust; used only if method.clustering = "hclust")

method.dist

character string defining the distance measure used to compute the dissimilarity matrix for hierarchical clustering (see dist; used only if method.clustering = "hclust")

modelNames

character string indicating the models to be fitted in the EM phase of clustering (used only if method.clustering = "mixture"). By default modelNames = NULL.

modelName.hc

a character string indicating the model to be used in model-based agglomerative hierarchical clustering (used only if method.clustering = "mixture"). By default modelName.hc = "VVV".

nstart.kmeans

number of random sets of initial centers used to initialize kmeans. Default value is 100 (used only if method.clustering = "kmeans").

iter.max.kmeans

maximum number of iterations allowed for kmeans. Default value is 10 (used only if method.clustering = "kmeans").

m.cmeans

degree of fuzzification in c-means clustering (used only if method.clustering = "cmeans"). By default m.cmeans = 2.

samples.clara

number of samples to be drawn from the data set when clustering with the clara algorithm (used only if method.clustering = "clara"). Default value is 500.

verbose

logical. If TRUE, additional messages are printed.
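The arguments above can be read as a mapping from options to standard clustering calls. The sketch below uses a hypothetical helper, cluster_one(), written for illustration only; it is not the package's code. It assumes that scaling = TRUE means standardizing each column as base R's scale() does, and that nstart.kmeans, iter.max.kmeans, samples.clara, method.hclust and method.dist are forwarded to stats::kmeans, cluster::clara, stats::hclust and stats::dist. The "mixture" and "cmeans" methods and the automatic choice for nb.clust < 0 are left out.

```r
library(cluster)  # recommended package shipped with R; provides pam() and clara()

# Hypothetical helper: clusters ONE data set the way the arguments suggest.
# Not the package's own implementation; "mixture" and "cmeans" are omitted
# because they need the non-base packages mclust and e1071.
cluster_one <- function(x, method.clustering = "kmeans", nb.clust,
                        scaling = TRUE, method.hclust = "average",
                        method.dist = "euclidean", nstart.kmeans = 100,
                        iter.max.kmeans = 10, samples.clara = 500) {
  if (scaling) x <- scale(x)  # center each column and divide by its SD
  switch(method.clustering,
    kmeans = kmeans(x, centers = nb.clust, nstart = nstart.kmeans,
                    iter.max = iter.max.kmeans)$cluster,
    pam    = pam(x, k = nb.clust)$clustering,
    clara  = clara(x, k = nb.clust, samples = samples.clara)$clustering,
    hclust = cutree(hclust(dist(x, method = method.dist),
                           method = method.hclust), k = nb.clust),
    stop("unsupported method.clustering: ", method.clustering)
  )
}

# Simulated data: two well-separated groups of 30 points each
set.seed(6)
x <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
           matrix(rnorm(60, mean = 6), ncol = 2))
res <- sapply(c("kmeans", "pam", "clara", "hclust"),
              function(meth) cluster_one(x, meth, nb.clust = 2))
dim(res)  # 60 rows (observations) x 4 columns (methods)
```

Each column of res is a partition of the same observations; the integer labels are arbitrary, so two methods can agree on the grouping while using different label numbers.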

Details

Performs cluster analysis according to the method.clustering argument. The function takes as input the output of the imputedata function and applies the chosen clustering method to each imputed data set.
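The core idea, one clustering per imputed data set, can be sketched with lapply. This is a simplified illustration: observed, missing.rows and imputed.list are hypothetical stand-ins, and the real structure of the imputedata output is not shown on this page.

```r
set.seed(5)
# Hypothetical stand-in for res.imp: m = 3 complete data sets that share the
# observed values and differ only in the entries that were "imputed".
observed <- data.frame(v1 = c(rnorm(30, 0), rnorm(30, 4)),
                       v2 = c(rnorm(30, 0), rnorm(30, 4)))
missing.rows <- 1:5                       # pretend v1 was missing in these rows
imputed.list <- lapply(1:3, function(i) {
  d <- observed
  d$v1[missing.rows] <- rnorm(length(missing.rows))  # a different draw each time
  d
})

# Apply the same clustering method (scaling + k-means here) to every data set
partitions <- lapply(imputed.list, function(d) {
  kmeans(scale(d), centers = 2, nstart = 10)$cluster
})
length(partitions)           # one partition per imputed data set
sapply(partitions, length)   # each covers all 60 observations
```

Cluster labels are arbitrary from one imputed data set to the next (cluster 1 in one partition may be cluster 2 in another), which is why the per-data-set partitions still have to be pooled, as clusterMI does in the example.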

The analysis can be tuned by specifying the cluster analysis method (method.clustering argument). If method.clustering = "kmeans" or "pam", the number of clusters can be specified through the nb.clust argument; by default, the number used for imputation is used. For "kmeans", the number of random initializations can also be tuned through the nstart.kmeans argument. If method.clustering = "hclust" (hierarchical clustering), the agglomeration method can be specified (see hclust); by default "average" is used. The number of clusters can be specified, or chosen automatically if nb.clust < 0. If method.clustering = "mixture" (model-based clustering using Gaussian mixture models), the model to be fitted can be tuned by modifying the modelNames argument (see Mclust). If method.clustering = "cmeans" (clustering using the fuzzy c-means algorithm), the fuzziness parameter can be modified through the m.cmeans argument. By default, m.cmeans = 2.
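For reference, the standard fuzzy c-means criterion is given below, assuming m.cmeans plays the role of the fuzziness exponent m; the cmeans function listed under See Also implements this family of algorithms, but its exact options are not reproduced here. Each observation x_i receives a membership u_{ik} in every cluster k, and the memberships of an observation sum to one:

```latex
J_m(U, C) = \sum_{i=1}^{n} \sum_{k=1}^{K} u_{ik}^{m} \, \lVert x_i - c_k \rVert^2,
\qquad \sum_{k=1}^{K} u_{ik} = 1, \quad m > 1

u_{ik} = \left[ \sum_{j=1}^{K} \left( \frac{\lVert x_i - c_k \rVert}{\lVert x_i - c_j \rVert} \right)^{2/(m-1)} \right]^{-1},
\qquad
c_k = \frac{\sum_{i=1}^{n} u_{ik}^{m} \, x_i}{\sum_{i=1}^{n} u_{ik}^{m}}
```

As m approaches 1, the memberships approach hard 0/1 assignments, as in k-means; larger values of m spread each observation's membership more evenly across clusters. With the default m = 2, the exponent 2/(m-1) equals 2, so memberships are driven by inverse squared distances.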

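cluster.intern itself has no nnodes argument (see Usage); parallel computation over the imputed data sets is available through the nnodes argument of clusterMI, which sets the number of CPU cores, as in the example below.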
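Because each imputed data set is clustered independently, the work parallelizes naturally across data sets. How clusterMI distributes work for a given nnodes is not shown on this page; the sketch below uses parallel::mclapply as a stand-in (an assumption, not necessarily the package's approach), and imputed.list is a hypothetical list of random data sets.

```r
library(parallel)

# Hypothetical list of m = 4 imputed data sets (random data for illustration)
set.seed(7)
imputed.list <- replicate(4, as.data.frame(matrix(rnorm(200), ncol = 2)),
                          simplify = FALSE)

# mclapply() forks worker processes on Unix-alikes; forking is not available
# on Windows, where mc.cores must be 1 (the call then runs sequentially).
n.cores <- if (.Platform$OS.type == "windows") 1L else 2L
partitions <- mclapply(imputed.list, function(d) {
  kmeans(scale(d), centers = 3, nstart = 10)$cluster
}, mc.cores = n.cores)
length(partitions)  # one partition per imputed data set
```

Forked workers draw from independent random streams, so results with random initializations may differ from a sequential run even with the same seed.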

Value

A list with clustering results

See Also

hclust, Mclust, imputedata, cmeans, dist

Examples

library(clusterMI)
data(wine)

library(parallel)
set.seed(123456)
ref <- wine$cult
nb.clust <- 3
m <- 5 # number of imputed data sets. Should be larger in practice
wine.na <- wine
wine.na$cult <- NULL
wine.na <- prodna(wine.na)

#imputation
res.imp <- imputedata(data.na = wine.na, nb.clust = nb.clust, m = m)

#analysis by kmeans and pooling
nnodes <- 2 # parallel::detectCores()
res.pool <- clusterMI(res.imp, nnodes = nnodes)

res.pool$instability
table(ref, res.pool$part)


clusterMI documentation built on Oct. 23, 2024, 5:07 p.m.