evalclust | R Documentation
The function evaluates clustering results by a set of evaluation criteria (cluster validity indices).
evalclust(data, clusters, diss = NULL)
data: A data.frame or a matrix with cases in rows and variables in columns.
clusters: A data.frame or a list of cluster memberships obtained from the dataset given in the parameter data.
diss: An optional parameter. A matrix or a dist object containing dissimilarities calculated from the dataset given in the parameter data.
Given the original dataset and the cluster membership variables, the function calculates up to 13 evaluation criteria described by Sulc et al. (2018) and Corter and Gluck (1992) and determines the optimal number of clusters according to these criteria. It is primarily intended for evaluating hierarchical clustering results obtained with similarity measures other than those available in the nomclust package. It can therefore be used to compare various similarity measures for categorical data.
The function returns a list with three components.
The eval component contains up to 13 evaluation criteria as vectors in a list: the Within-cluster mutability coefficient (WCM), the Within-cluster entropy coefficient (WCE), the Pseudo F Indices based on the mutability (PSFM) and the entropy (PSFE), the Bayesian (BIC) and Akaike (AIC) information criteria for categorical data, the BK index, Category Utility (CU), Category Information (CI), Hartigan Mutability (HM), Hartigan Entropy (HE) and, if a dissimilarity matrix is supplied in diss, the silhouette index (SI) and the Dunn index (DI).
The opt component is present in the output together with the eval component. It displays the optimal number of clusters for the evaluation criteria from the eval component, except for WCM and WCE, where the optimal number of clusters is based on the elbow method.
The call component contains the function call.
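To give a feel for what a criterion such as WCM measures, the sketch below computes a within-cluster mutability value in base R. It assumes the normalized Gini form, in which each variable's mutability within a cluster is scaled by K/(K-1) for K categories, averaged over variables, and weighted by cluster size. The function name wcm_sketch and the toy data are hypothetical, and the exact computation inside evalclust may differ.

```r
# Hypothetical helper, not part of nomclust: within-cluster mutability
# under an assumed normalized Gini form.
wcm_sketch <- function(data, membership) {
  data <- as.data.frame(data)
  data <- as.data.frame(lapply(data, factor))  # treat every variable as categorical
  n <- nrow(data)
  total <- 0
  for (g in unique(membership)) {
    sub <- data[membership == g, , drop = FALSE]
    n_g <- nrow(sub)
    # normalized Gini coefficient of each variable inside cluster g
    gini <- sapply(sub, function(v) {
      K <- nlevels(v)               # categories observed in the whole dataset
      if (K < 2) return(0)
      p <- table(v) / n_g           # relative frequencies within the cluster
      (K / (K - 1)) * (1 - sum(p^2))
    })
    # weight the cluster's average mutability by its relative size
    total <- total + (n_g / n) * mean(gini)
  }
  total
}

# Toy data (hypothetical): two variables, four cases
toy <- data.frame(a = c("x", "x", "y", "y"), b = c("p", "p", "q", "q"))
pure  <- wcm_sketch(toy, c(1, 1, 2, 2))  # homogeneous clusters: 0
mixed <- wcm_sketch(toy, c(1, 2, 1, 2))  # maximally mixed clusters: 1
```

Under this assumed form, values near 0 indicate homogeneous clusters and values near 1 indicate clusters that are as mixed as possible.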
Zdenek Sulc.
Contact: zdenek.sulc@vse.cz
Corter J.E., Gluck M.A. (1992). Explaining basic categories: Feature predictability and information. Psychological Bulletin 111(2), p. 291–303.
Sulc Z., Cibulkova J., Prochazka J., Rezankova H. (2018). Internal Evaluation Criteria for Categorical Data in Hierarchical Clustering: Optimal Number of Clusters Determination. Metodoloski Zveski 15(2), p. 1–20.
nomclust, nomprox, eval.plot.
# sample data
data(data20)
# creating an object with results of hierarchical clustering
hca.object <- nomclust(data20, measure = "iof", method = "average", clu.high = 7)
# the cluster memberships
data20.clu <- hca.object$mem
# obtaining evaluation criteria for the provided dataset and cluster memberships
data20.eval <- evalclust(data20, clusters = data20.clu)
# visualization of the evaluation criteria
eval.plot(data20.eval)
# silhouette index can be calculated if the dissimilarity matrix is provided
data20.eval <- evalclust(data20, clusters = data20.clu, diss = hca.object$prox)
eval.plot(data20.eval, criteria = "SI")
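Since the function is mainly meant for memberships produced outside nomclust, the following sketch may help. It builds a simple matching dissimilarity by hand, clusters with base R's hclust, and passes the memberships to evalclust. The toy data and the column names clu_2 to clu_5 are hypothetical, and the sketch assumes that clusters should hold consecutive solutions starting from two clusters, as in the mem output of nomclust.

```r
# Toy categorical data (hypothetical, for illustration only)
set.seed(1)
toy <- data.frame(
  a = sample(c("x", "y", "z"), 30, replace = TRUE),
  b = sample(c("p", "q"), 30, replace = TRUE),
  c = sample(c("u", "v", "w"), 30, replace = TRUE),
  stringsAsFactors = TRUE
)

# Simple matching dissimilarity: share of variables on which two cases differ
m <- as.matrix(toy)
n <- nrow(m)
sm <- matrix(0, n, n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    sm[i, j] <- mean(m[i, ] != m[j, ])
  }
}

# Average-linkage hierarchical clustering with base R
hc <- hclust(as.dist(sm), method = "average")

# Memberships for the 2- to 5-cluster solutions, one column per solution
mem <- as.data.frame(cutree(hc, k = 2:5))
names(mem) <- paste0("clu_", 2:5)

# Evaluate the externally obtained memberships (needs nomclust installed)
if (requireNamespace("nomclust", quietly = TRUE)) {
  toy.eval <- nomclust::evalclust(toy, clusters = mem, diss = sm)
  str(toy.eval$eval)   # criteria values for each number of clusters
  toy.eval$opt         # optimal numbers of clusters
}
```

Repeating these steps with a different dissimilarity matrix and comparing the resulting eval components is one way to compare similarity measures on the same dataset.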