nomprox R Documentation

Hierarchical Clustering of Nominal Data Based on a Proximity Matrix

Description

The function performs hierarchical cluster analysis based on a dissimilarity matrix.

Usage

nomprox(
  diss,
  data = NULL,
  method = "average",
  clu.high = 6,
  eval = TRUE,
  prox = 100
)

Arguments

diss

A proximity (dissimilarity) matrix or a dist object calculated from the dataset supplied in the data argument.

data

A data.frame or a matrix with cases in rows and variables in columns.

method

A character string defining the clustering method. The following methods can be used: "average", "complete", "single".
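The three linkage options differ only in how the distance between two clusters is derived from the pairwise dissimilarities. A minimal sketch using base R's hclust on a hypothetical four-case matrix; hclust is used here for illustration only, since this page does not state which routine nomprox calls internally:

```r
# Hypothetical dissimilarities among four cases (not from nomclust)
d <- as.dist(matrix(c(
  0.0, 0.2, 0.8, 0.9,
  0.2, 0.0, 0.7, 0.8,
  0.8, 0.7, 0.0, 0.1,
  0.9, 0.8, 0.1, 0.0
), nrow = 4, byrow = TRUE))

# Build one tree per linkage method
linkages <- c("average", "complete", "single")
trees <- lapply(linkages, function(m) hclust(d, method = m))
names(trees) <- linkages

# Height of the final merge, joining {1, 2} with {3, 4}
final.heights <- sapply(trees, function(tr) max(tr$height))
print(final.heights)
# average: 0.8 (mean of 0.8, 0.9, 0.7, 0.8)
# complete: 0.9 (largest), single: 0.7 (smallest)
```

All three trees merge the same pairs first; only the height at which the two pairs are joined changes with the linkage.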

clu.high

A numeric value that expresses the maximal number of clusters for which the cluster membership variables are produced.

eval

A logical value; if TRUE, evaluation of clustering results is performed.

prox

A logical value or a numeric value. The logical value TRUE indicates that the proximity matrix is part of the output. A numeric (integer) value gives the maximal number of cases in a dataset for which the proximity matrix is included in the output.
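One way to read the two forms of this argument is sketched below. The helper include_prox and its argument n.cases are hypothetical names introduced only for illustration; they are not part of nomclust, and the actual internal check may differ:

```r
# Hypothetical helper: decides whether the proximity matrix would be kept,
# following the description of the prox argument above
include_prox <- function(prox, n.cases) {
  if (is.logical(prox)) {
    return(isTRUE(prox))          # TRUE keeps the matrix, FALSE drops it
  }
  if (is.numeric(prox)) {
    return(n.cases <= prox)       # keep only for datasets up to prox cases
  }
  stop("prox must be a logical or a numeric value")
}

include_prox(TRUE, n.cases = 500)   # TRUE
include_prox(100, n.cases = 20)     # TRUE
include_prox(100, n.cases = 101)    # FALSE
```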

Details

The function performs hierarchical cluster analysis in situations when the proximity (dissimilarity) matrix was calculated externally, for instance in a different R package, in a user-defined function, or in other software. It offers three linkage methods that can be used for categorical data. The obtained clusters can be evaluated by up to 13 evaluation criteria (Sulc et al., 2018; Corter and Gluck, 1992).
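As an illustration of an externally calculated matrix, the sketch below computes a simple matching dissimilarity in base R. The toy data frame and the simple_matching function are hypothetical and are not part of nomclust:

```r
# Toy nominal data (hypothetical)
toy <- data.frame(
  colour = c("red", "red", "blue", "blue", "green"),
  shape  = c("round", "round", "square", "square", "round"),
  size   = c("small", "large", "small", "small", "large"),
  stringsAsFactors = TRUE
)

# Simple matching dissimilarity: the share of variables
# on which two cases take different categories
simple_matching <- function(df) {
  x <- as.matrix(df)
  n <- nrow(x)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- mean(x[i, ] != x[j, ])
    }
  }
  as.dist(d)
}

diss.toy <- simple_matching(toy)
print(diss.toy)
```

The resulting dist object is the kind of input that could be supplied as the diss argument, with the original data frame passed as data.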

Value

The function returns a list with up to six components:

The mem component contains cluster membership partitions for the selected numbers of clusters in the form of a list.
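A list of membership partitions of this kind can be sketched with base R's hclust and cutree. This assumes partitions for 2 up to clu.high clusters; the element names clu_2, clu_3 are hypothetical, and the code is not taken from nomprox itself:

```r
# Toy dissimilarities between six cases lying on a line (hypothetical data)
d <- dist(matrix(c(1, 2, 6, 7, 15, 16), ncol = 1))
tree <- hclust(d, method = "average")

# One membership vector per number of clusters, from 2 up to clu.high
clu.high <- 3
mem <- lapply(2:clu.high, function(k) cutree(tree, k = k))
names(mem) <- paste0("clu_", 2:clu.high)

mem$clu_2   # 1 1 1 1 2 2
mem$clu_3   # 1 1 2 2 3 3
```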

The eval component contains up to 13 evaluation criteria as vectors in a list: the within-cluster mutability coefficient (WCM), the within-cluster entropy coefficient (WCE), the pseudo F indices based on the mutability (PSFM) and the entropy (PSFE), the Bayesian (BIC) and Akaike (AIC) information criteria for categorical data, the BK index, Category Utility (CU), Category Information (CI), Hartigan Mutability (HM), Hartigan Entropy (HE) and, if the prox component is present, the silhouette index (SI) and the Dunn index (DI).
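The silhouette index depends on the dissimilarity matrix, which is why it is tied to the prox component. A minimal base R sketch of the textbook average silhouette width follows; the function name silhouette_index is hypothetical and the package's own implementation may differ in details such as the handling of singleton clusters:

```r
# Average silhouette width from a dist object and a membership vector
silhouette_index <- function(d, membership) {
  m <- as.matrix(d)
  n <- nrow(m)
  s <- numeric(n)
  for (i in seq_len(n)) {
    own <- membership == membership[i]
    own[i] <- FALSE
    if (!any(own)) next                 # singleton cluster: width stays 0
    a <- mean(m[i, own])                # mean dissimilarity to own cluster
    others <- setdiff(unique(membership), membership[i])
    # smallest mean dissimilarity to any other cluster
    b <- min(sapply(others, function(g) mean(m[i, membership == g])))
    s[i] <- (b - a) / max(a, b)
  }
  mean(s)
}

# Hypothetical dissimilarities among four cases forming two tight pairs
d <- as.dist(matrix(c(
  0.0, 0.2, 0.8, 0.9,
  0.2, 0.0, 0.7, 0.8,
  0.8, 0.7, 0.0, 0.1,
  0.9, 0.8, 0.1, 0.0
), nrow = 4, byrow = TRUE))

si <- silhouette_index(d, membership = c(1, 1, 2, 2))
print(si)
```

Values close to 1 indicate compact, well-separated clusters; the two tight pairs in this toy matrix give a value of roughly 0.81.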

The opt component is present in the output together with the eval component. It displays the optimal number of clusters for the evaluation criteria from the eval component, except for WCM and WCE, where the optimal number of clusters is based on the elbow method.

The dend component can be found in the output only together with the prox component. It contains all the necessary information for dendrogram creation.

The prox component contains the dissimilarity matrix as a "dist" object.

The call component contains the function call.

Author(s)

Zdenek Sulc.
Contact: zdenek.sulc@vse.cz

References

Corter J.E., Gluck M.A. (1992). Explaining basic categories: Feature predictability and information. Psychological Bulletin, 111(2), p. 291–303.

Sulc Z., Cibulkova J., Prochazka J., Rezankova H. (2018). Internal Evaluation Criteria for Categorical Data in Hierarchical Clustering: Optimal Number of Clusters Determination. Metodoloski Zveski, 15(2), p. 1–20.

See Also

nomclust, evalclust, eval.plot.

Examples

# load the package and the sample data
library(nomclust)
data(data20)

# computation of a dissimilarity matrix using the iof similarity measure
diss.matrix <- iof(data20)

# creating an object with results of hierarchical clustering 
hca.object <- nomprox(diss = diss.matrix, data = data20, method = "complete",
 clu.high = 5, eval = TRUE, prox = FALSE)
 
# quick clustering summary
summary(hca.object)

# quick cluster quality evaluation
print(hca.object)

# visualization of the evaluation criteria
eval.plot(hca.object)

# a dendrogram can be displayed if the object contains the prox component
hca.object <- nomprox(diss = diss.matrix, data = data20, method = "complete",
 clu.high = 5, eval = TRUE, prox = TRUE)

# a quick dendrogram
plot(hca.object)

# a dendrogram with three designated clusters
dend.plot(hca.object, clusters = 3)



nomclust documentation built on Aug. 18, 2023, 5:06 p.m.
