hcut: Computes Hierarchical Clustering and Cut the Tree

View source: R/hcut.R

hcut R Documentation

Description

Computes hierarchical clustering (hclust, agnes, diana) and cuts the tree into k clusters. It also accepts correlation-based distance measures such as "pearson", "spearman" and "kendall". Direct calls require k >= 2; the one-cluster case is handled by callers such as eclust() and fviz_nbclust().
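
The two steps named above, clustering and then cutting the tree, can be sketched with base R alone. This is a minimal illustration using stats::dist(), stats::hclust() and stats::cutree(), not the source of hcut(); the variable names are chosen only for this example.

```r
# Base-R sketch of "cluster, then cut"; an illustration, not hcut()'s source.
data(USArrests)

d   <- dist(USArrests, method = "euclidean")  # pairwise dissimilarities
hc  <- hclust(d, method = "ward.D2")          # build the hierarchical tree
grp <- cutree(hc, k = 2)                      # cut the tree into k = 2 clusters

table(grp)                                    # number of observations per cluster
```

The values passed to dist() and hclust() here mirror the defaults shown for hc_metric and hc_method in the Usage section.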

Usage

hcut(
  x,
  k = 2,
  isdiss = inherits(x, "dist"),
  hc_func = c("hclust", "agnes", "diana"),
  hc_method = "ward.D2",
  hc_metric = "euclidean",
  stand = FALSE,
  graph = FALSE,
  ...
)

Arguments

x

a numeric matrix, numeric data frame or a dissimilarity matrix.

k

a single integer specifying the number of clusters to be generated. Must be at least 2 and smaller than the number of observations.

isdiss

logical value specifying whether x is already a dissimilarity matrix. If TRUE, x must inherit from class "dist" and contain only finite values.

hc_func

the hierarchical clustering function to be used. Default value is "hclust". Possible values are "hclust", "agnes" and "diana". Abbreviations are allowed.

hc_method

the agglomeration method to be used with hclust() and agnes() (see ?hclust): "ward.D", "ward.D2", "single", "complete", "average", ...

hc_metric

character string specifying the metric to be used for calculating dissimilarities between observations. Allowed values are those accepted by dist() [including "euclidean", "manhattan", "maximum", "canberra", "binary", "minkowski"] and the correlation-based distance measures ["pearson", "spearman" or "kendall"].

stand

logical value; default is FALSE. If TRUE, then the data will be standardized using the function scale(). Measurements are standardized for each variable (column), by subtracting the variable's mean value and dividing by the variable's standard deviation. If scaling produces NA values, hcut() stops with a package-level error.

graph

logical value. If TRUE, the dendrogram is displayed.

...

not used.
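
As a rough illustration of what stand = TRUE and a correlation-based hc_metric describe, the base-R sketch below standardizes the columns with scale() and builds a dissimilarity object from row correlations. Taking the distance as 1 minus the correlation is an assumption made for this example; the exact transformation used inside hcut() is not stated on this page.

```r
# Sketch of column standardization plus a correlation-based dissimilarity.
data(USArrests)

# Standardize each variable: subtract the column mean, divide by the column sd
x <- scale(USArrests)

# Assumption: correlation distance taken as 1 - Pearson correlation between rows
d_cor <- as.dist(1 - cor(t(x), method = "pearson"))

hc  <- hclust(d_cor, method = "average")  # agglomerate on the "dist" object
grp <- cutree(hc, k = 3)                  # cut into 3 clusters
```

With factoextra loaded, the documented arguments for a comparable request would be hcut(USArrests, k = 3, hc_metric = "pearson", hc_method = "average", stand = TRUE). Because d_cor inherits from class "dist", an object like it could also be passed as x, in which case isdiss defaults to TRUE.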

Value

an object of class "hcut" containing the result of the standard function used (read the documentation of hclust, agnes, diana).

It also includes:

  • cluster: the cluster assignment of observations after cutting the tree

  • nbclust: the number of clusters

  • silinfo: the silhouette information of observations (available when k > 1)

  • size: the size of clusters

  • data: a matrix containing the original or the standardized data (if stand = TRUE)
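
The extra components listed above can be approximated by hand, which may help when reading an "hcut" object. The sketch below uses base R and cluster::silhouette(); the local variable names only mirror the component names for readability, and the exact layout of silinfo in the returned object is not reproduced here.

```r
# Hand-built analogues of the cluster, nbclust, size and silhouette components.
library(cluster)  # provides silhouette()
data(USArrests)

x  <- scale(USArrests)               # standardized data, as with stand = TRUE
d  <- dist(x)                        # Euclidean dissimilarities
hc <- hclust(d, method = "ward.D2")

cluster <- cutree(hc, k = 4)            # cluster assignment per observation
nbclust <- length(unique(cluster))      # number of clusters
size    <- as.vector(table(cluster))    # observations per cluster

sil       <- silhouette(cluster, d)     # per-observation silhouette widths
avg_width <- mean(sil[, "sil_width"])   # average silhouette width
```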

See Also

fviz_dend, hkmeans, eclust

Examples


library(factoextra)
data(USArrests)

# Compute hierarchical clustering and cut into 4 clusters
res <- hcut(USArrests, k = 4, stand = TRUE)

# Cluster assignments of observations
res$cluster
# Size of clusters
res$size

# Visualize the dendrogram
fviz_dend(res, rect = TRUE)

# Visualize the silhouette
fviz_silhouette(res)

# Visualize clusters as scatter plots
fviz_cluster(res)



factoextra documentation built on June 26, 2026, 5:10 p.m.