greenclust: Row Clustering Using Greenacre's Method


greenclust    R Documentation


Description

Iteratively collapses the rows of a table (typically a contingency table), at each step merging the pair of rows whose combination produces the smallest loss of chi-squared.
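As an illustration of the selection rule described above, the following is a minimal sketch of a single merge step in base R. It is not the package's implementation: the helper names chisq_stat and best_merge and the toy table are hypothetical, and the sketch assumes that no row or column of the table sums to zero.

```r
# Hypothetical helper: Pearson chi-squared statistic of a table,
# computed without any continuity correction.
chisq_stat <- function(m) {
  expected <- outer(rowSums(m), colSums(m)) / sum(m)
  sum((m - expected)^2 / expected)
}

# Hypothetical helper: try every pair of rows, merge them by summing,
# and report the pair whose merge loses the least chi-squared.
best_merge <- function(m) {
  base <- chisq_stat(m)
  pairs <- combn(nrow(m), 2)  # each column is one candidate pair
  loss <- apply(pairs, 2, function(p) {
    merged <- rbind(m[-p, , drop = FALSE],
                    colSums(m[p, , drop = FALSE]))
    base - chisq_stat(merged)
  })
  list(rows = pairs[, which.min(loss)], loss = min(loss))
}

# Toy table (made up for illustration): rows "a" and "b" have
# similar profiles, row "c" is very different.
toy <- matrix(c(10, 20,
                11, 19,
                30,  5),
              ncol = 2, byrow = TRUE,
              dimnames = list(c("a", "b", "c"), c("yes", "no")))

step1 <- best_merge(toy)
rownames(toy)[step1$rows]  # names of the pair selected for merging
step1$loss                 # chi-squared lost by that merge
```

Repeating this step on the merged table until a single row remains would give the full sequence of merges. For the toy table, the similar rows "a" and "b" are the pair selected first.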

Usage

greenclust(x, correct = FALSE, verbose = FALSE)

Arguments

x

a numeric matrix or data frame

correct

a logical indicating whether to apply a continuity correction if and when the clustered table reaches a 2x2 dimension.

verbose

a logical; if TRUE, prints the clustered table, along with its r-squared and p-value, at each step.
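To show what the correct argument affects, here is a small sketch using base R's chisq.test() on a 2x2 table. The counts are made up for illustration, and it is an assumption that greenclust applies the correction in the same way as chisq.test(); the sketch only demonstrates the general effect of a continuity correction.

```r
# Made-up 2x2 table of counts
two_by_two <- matrix(c(12, 5,
                        7, 9),
                     nrow = 2, byrow = TRUE)

# Chi-squared test without and with Yates' continuity correction
uncorrected <- chisq.test(two_by_two, correct = FALSE)
corrected   <- chisq.test(two_by_two, correct = TRUE)

# Compare the test statistics and p-values side by side
c(uncorrected = unname(uncorrected$statistic),
  corrected   = unname(corrected$statistic))
c(uncorrected = uncorrected$p.value,
  corrected   = corrected$p.value)
```

The corrected statistic is smaller and its p-value larger, so the correction makes the test more conservative for 2x2 tables.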

Value

An object of class greenclust, which is compatible with most hclust object functions, such as plot() and rect.hclust(). The height vector represents the proportion of chi-squared, relative to the original table, seen at each clustering step. The greenclust object also includes a vector of chi-squared test p-values, one for each step, and a logical vector indicating whether each step had a tie for "winner".

References

Greenacre, M.J. (1988) "Clustering the Rows and Columns of a Contingency Table," Journal of Classification 5, 39-51. doi:10.1007/BF01901670

See Also

greencut, greenplot, assign.cluster

Examples

# Combine Titanic passenger attributes into a single category
tab <- t(as.data.frame(apply(Titanic, 4:1, FUN=sum)))
# Remove rows with all zeros
tab <- tab[apply(tab, 1, sum) > 0, ]

# Perform clustering on contingency table
grc <- greenclust(tab)

# Plot r-squared and p-values for each potential cut point
greenplot(grc)

# Get clusters at suggested cut point
clusters <- greencut(grc)

# Plot dendrogram with clusters marked
plot(grc)
rect.hclust(grc, max(clusters))


JeffJetton/greenclust documentation built on Sept. 21, 2023, 12:14 p.m.