WH_hclust: Hierarchical clustering of histogram data

View source: R/unsuperv_classification.R

WH_hclust    R Documentation

Hierarchical clustering of histogram data

Description

The function implements hierarchical clustering for a set of histogram-valued data, based on the L2 Wasserstein distance. It extends the hclust function of the stats package.
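For reference, the squared L2 Wasserstein distance between two one-dimensional distributions can be written in terms of their quantile functions. The notation below (f, g for the distributions and F^{-1}, G^{-1} for their quantile functions) is introduced here for illustration and is not taken from the package documentation:

```latex
d_W^2(f, g) = \int_0^1 \left( F^{-1}(t) - G^{-1}(t) \right)^2 \, dt
```

This is the standard one-dimensional form. How the function combines the distances across several histogram variables is not stated on this page.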

Usage

WH_hclust(
  x,
  simplify = FALSE,
  qua = 10,
  standardize = FALSE,
  distance = "WDIST",
  method = "complete"
)

Arguments

x

A MatH object (a matrix of distributionH).

simplify

A logical value (default is FALSE). If TRUE, histograms are recomputed in order to speed up the algorithm.

qua

An integer (default is 10). If simplify=TRUE, it is the number of quantiles used to recode the histograms.

standardize

A logical value (default is FALSE). If TRUE, histogram-valued data are standardized, variable by variable, using the Wasserstein-based standard deviation. Use it when each variable should have a standard deviation equal to one.

distance

A string (default is "WDIST", the L2 Wasserstein distance). Other distances will be implemented.

method

A string (default is "complete"), the agglomeration method to be used. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC).

Value

An object of class hclust which describes the tree produced by the clustering process.
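Because the result is a standard hclust object, it can be inspected with the usual base R tools. The sketch below does not call WH_hclust. It uses only base R and simulated samples (the names samples, p, Q, d, tree and groups are illustrative) to build a comparable tree from a quantile-based approximation of the L2 Wasserstein distance. It is an assumption-laden illustration, not the package's implementation.

```r
# Toy data: six samples drawn from two well-separated groups
# (purely illustrative; not part of HistDAWass)
set.seed(1)
samples <- c(
  lapply(1:3, function(i) rnorm(200, mean = 0, sd = 1)),
  lapply(1:3, function(i) rnorm(200, mean = 5, sd = 1))
)
names(samples) <- paste0("obs", 1:6)

# Evaluate each empirical quantile function on a regular grid of probabilities
p <- (seq_len(100) - 0.5) / 100
Q <- t(sapply(samples, quantile, probs = p, names = FALSE))

# The mean squared difference between quantiles approximates the squared
# L2 Wasserstein distance; dist() gives the Euclidean distance, so rescale it
d <- dist(Q) / sqrt(length(p))

# Build the tree and inspect the components of the hclust object
tree <- hclust(d, method = "complete")
class(tree)   # the class of the returned object
tree$merge    # which observations or clusters are joined at each step
tree$height   # the distance at which each merge happens
groups <- cutree(tree, k = 2)  # cluster labels for a 2-cluster partition
```

The merge and height components describe the tree step by step, and cutree turns the tree into a flat partition.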

References

Irpino A., Verde R. (2006). A new Wasserstein based distance for the hierarchical clustering of histogram symbolic data. In: Batagelj et al. (eds.), Data Science and Classification, IFCS 2006, pp. 185-192. Berlin: Springer. ISBN: 3-540-34415-2

See Also

hclust in the stats package for further details.

Examples

library(HistDAWass) # provides WH_hclust and the BLOOD dataset
results <- WH_hclust(x = BLOOD, simplify = TRUE, method = "complete")
plot(results) # plot the dendrogram
cutree(results, k = 5) # return the cluster labels for a 5-cluster partition

HistDAWass documentation built on Sept. 26, 2022, 5:06 p.m.