View source: R/cluster.intern.R
cluster.intern | R Documentation |
From a list of imputed data sets, clusterMI
performs cluster analysis on each imputed data set.
cluster.intern(
res.imp,
method.clustering = "kmeans",
scaling = TRUE,
nb.clust = NULL,
method.hclust = "average",
method.dist = "euclidean",
modelNames = NULL,
modelName.hc = "VVV",
nstart.kmeans = 100,
iter.max.kmeans = 10,
m.cmeans = 2,
samples.clara = 500,
verbose = FALSE
)
res.imp |
a list of imputed data sets |
method.clustering |
a single string specifying the clustering algorithm used ("kmeans", "pam", "clara", "hclust", "mixture" or "cmeans") |
scaling |
boolean. If TRUE, variables are scaled. Default value is TRUE |
nb.clust |
an integer specifying the number of clusters |
method.hclust |
character string defining the clustering method for hierarchical clustering (required only if method.clustering = "hclust") |
method.dist |
character string defining the method used for computing dissimilarity matrices in hierarchical clustering (required only if method.clustering = "hclust") |
modelNames |
character string indicating the models to be fitted in the EM phase of clustering (required only if method.clustering = "mixture"). By default modelNames = NULL. |
modelName.hc |
A character string indicating the model to be used in model-based agglomerative hierarchical clustering (required only if method.clustering = "mixture"). By default modelName.hc = "VVV". |
nstart.kmeans |
how many random sets should be chosen for kmeans initialization. Default value is 100 (required only if method.clustering = "kmeans") |
iter.max.kmeans |
maximum number of iterations allowed for kmeans. Default value is 10 (required only if method.clustering = "kmeans") |
m.cmeans |
degree of fuzzification in cmeans clustering. By default m.cmeans = 2 |
samples.clara |
number of samples to be drawn from the dataset when performing clustering using clara algorithm. Default value is 500. |
verbose |
logical |
Performs cluster analysis (according to the method.clustering argument). To do so, the function takes as input an output from the imputedata function and applies the cluster analysis method to each imputed data set.
The analysis can be tuned by specifying the cluster analysis method used (method.clustering argument).
If method.clustering = "kmeans" or "pam", then the number of clusters can be specified by tuning the nb.clust argument. By default, the same number as the one used for imputation is used. The number of random initializations can also be tuned through the nstart.kmeans argument.
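As an illustration of the kmeans case, the sketch below applies stats::kmeans to each element of a list of data sets, using the default values listed above (scaled variables, nstart = 100, iter.max = 10). It is not the code of cluster.intern: toy.imp is a hypothetical stand-in for a list of imputed data sets, built here from the iris data with a little added noise, and part.kmeans is an illustrative name.

```r
# Hypothetical stand-in for a list of m imputed data sets
set.seed(123456)
m <- 3
base.data <- as.matrix(iris[, 1:4])
toy.imp <- lapply(seq_len(m), function(i) {
  as.data.frame(base.data + rnorm(length(base.data), sd = 0.05))
})

nb.clust <- 3

# One k-means partition per data set, computed on scaled variables
part.kmeans <- lapply(toy.imp, function(d) {
  fit <- kmeans(scale(d), centers = nb.clust, nstart = 100, iter.max = 10)
  fit$cluster
})

# Cluster sizes for each data set (one column per data set)
sapply(part.kmeans, table)
```

Each element of part.kmeans is an integer vector giving the cluster of every observation in the corresponding data set.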
If method.clustering = "hclust" (hierarchical clustering), the method used can be specified (see hclust). By default "average" is used. Furthermore, the number of clusters can be specified, but it can also be automatically chosen if nb.clust < 0.
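For the hierarchical case, the following sketch shows how the method.dist and method.hclust settings map onto stats::dist and stats::hclust when a fixed number of clusters is requested. It is only an illustration under assumptions, not the internal code: toy.imp is a hypothetical list of data sets standing in for imputed data, and the automatic choice of the number of clusters is not reproduced.

```r
# Hypothetical stand-in for a list of m imputed data sets
set.seed(123456)
m <- 3
base.data <- as.matrix(iris[, 1:4])
toy.imp <- lapply(seq_len(m), function(i) {
  as.data.frame(base.data + rnorm(length(base.data), sd = 0.05))
})

nb.clust <- 3

# Euclidean distances on scaled variables, average linkage,
# then the tree is cut into nb.clust groups
part.hclust <- lapply(toy.imp, function(d) {
  dd <- dist(scale(d), method = "euclidean")
  tree <- hclust(dd, method = "average")
  cutree(tree, k = nb.clust)
})

# Cluster sizes for each data set (one column per data set)
sapply(part.hclust, table)
```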
If method.clustering = "mixture" (model-based clustering using Gaussian mixture models), the model to be fitted can be tuned by modifying the modelNames argument (see Mclust).
If method.clustering = "cmeans" (clustering using the fuzzy c-means algorithm), then the fuzziness parameter can be modified by tuning the m.cmeans argument. By default, m.cmeans = 2.
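To show what the fuzziness parameter controls, the sketch below runs fuzzy c-means on each element of a list of data sets with m = 2. It assumes that the e1071 package is installed and that its cmeans function is the one referred to here, which this page does not state explicitly; toy.imp is a hypothetical stand-in for a list of imputed data sets.

```r
library(e1071)  # assumption: provides cmeans()

# Hypothetical stand-in for a list of m imputed data sets
set.seed(123456)
m <- 3
base.data <- as.matrix(iris[, 1:4])
toy.imp <- lapply(seq_len(m), function(i) {
  as.data.frame(base.data + rnorm(length(base.data), sd = 0.05))
})

# Fuzzy c-means with 3 clusters and fuzziness m = 2 on scaled variables
fit.cmeans <- lapply(toy.imp, function(d) {
  cmeans(scale(d), centers = 3, m = 2)
})

# Membership degrees of the first observations in the first data set;
# each row sums to 1
head(round(fit.cmeans[[1]]$membership, 2))

# Hard partition obtained by assigning each observation to its
# highest-membership cluster
table(fit.cmeans[[1]]$cluster)
```

Larger values of m give softer memberships, while values close to 1 give memberships close to 0 or 1.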
When run through clusterMI, the analysis can be performed in parallel by specifying the number of CPU cores (nnodes argument).
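Because each data set is analysed independently, the work distributes naturally over several cores. The sketch below illustrates the idea with the parallel package and two worker processes. How the package itself dispatches the work is not described on this page, so this is only an assumption-laden illustration; toy.imp is a hypothetical stand-in for a list of imputed data sets.

```r
library(parallel)

# Hypothetical stand-in for a list of m imputed data sets
set.seed(123456)
m <- 4
base.data <- as.matrix(iris[, 1:4])
toy.imp <- lapply(seq_len(m), function(i) {
  as.data.frame(base.data + rnorm(length(base.data), sd = 0.05))
})

nnodes <- 2                    # number of worker processes
cl <- makeCluster(nnodes)
clusterSetRNGStream(cl, 123)   # reproducible random starts on the workers

# Each worker runs k-means on the data sets it receives
part.par <- parLapply(cl, toy.imp, function(d) {
  kmeans(scale(d), centers = 3, nstart = 20)$cluster
})

stopCluster(cl)

# One partition per data set
length(part.par)
```

The result has the same shape as with lapply, so the choice of nnodes affects only the running time.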
A list with clustering results
hclust, Mclust, imputedata, cmeans, dist
library(clusterMI) # provides wine, prodna, imputedata and clusterMI
data(wine)
require(parallel)
set.seed(123456)
ref <- wine$cult
nb.clust <- 3
m <- 5 # number of imputed data sets. Should be larger in practice
wine.na <- wine
wine.na$cult <- NULL
wine.na <- prodna(wine.na)
#imputation
res.imp <- imputedata(data.na = wine.na, nb.clust = nb.clust, m = m)
#analysis by kmeans and pooling
nnodes <- 2 # parallel::detectCores()
res.pool <- clusterMI(res.imp, nnodes = nnodes)
res.pool$instability
table(ref, res.pool$part)