greencut: Cut a Greenclust Tree into Optimal Groups

greencut    R Documentation

Description

Cuts a greenclust tree at an automatically determined number of groups.

Usage

greencut(g, k = NULL, h = NULL)

Arguments

g

a tree as produced by greenclust

k

an integer scalar with the desired number of groups

h

a numeric scalar with the height at which the tree should be cut

Details

The cut point is calculated by finding the number of groups/clusters that yields the collapsed contingency table with the most significant (lowest p-value) chi-squared test. If there are ties, the smallest number of groups wins.
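
The tie-breaking rule can be illustrated in base R. The sketch below is not the package's implementation: the p-values are made-up numbers chosen to include a tie, and `p_by_k` and `best_k` are hypothetical names.

```r
# Hypothetical p-values for candidate cuts of k = 2..5 groups
# (illustrative numbers only, not output from greenclust)
p_by_k <- c("2" = 0.04, "3" = 0.01, "4" = 0.01, "5" = 0.03)

# which.min() returns the index of the FIRST minimum, so with the
# candidates ordered by increasing k, a tie goes to the smallest k
best_k <- as.integer(names(p_by_k)[which.min(p_by_k)])
best_k
```

Here k = 3 and k = 4 share the lowest p-value, and the smaller of the two is selected.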

If a certain number of groups is required or a specific r-squared (1 - height) threshold is targeted, values for either k or h may be provided. (While the regular cutree function could also be used in this circumstance, it may still be useful to have the additional attributes that greencut() provides.)

As with cutree(), k overrides h if both are given.
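
The precedence of k over h can be checked with base R's cutree() on an ordinary hclust tree. This is a minimal sketch that uses the built-in USArrests data as a stand-in tree. It does not call greencut(), which is documented above as following the same rule.

```r
# Ordinary hierarchical clustering on a built-in dataset (stand-in tree)
hc <- hclust(dist(USArrests))

by_k    <- cutree(hc, k = 3)
by_both <- cutree(hc, k = 3, h = 50)  # h is ignored because k is given

identical(by_k, by_both)   # the two cuts are the same
length(unique(by_both))    # number of groups follows k
```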

Value

greencut returns a vector of group memberships, with the resulting r-squared value and p-value as object attributes, accessible via attr.
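
One way to inspect the extra attributes without assuming their names is to list everything attached to the result. This is a minimal sketch that assumes the greenclust package is installed (the code is skipped otherwise). `groups` is a hypothetical variable name.

```r
if (requireNamespace("greenclust", quietly = TRUE)) {
  # Same Titanic table as in the Examples section
  tab <- t(as.data.frame(apply(Titanic, 4:1, FUN = sum)))
  tab <- tab[apply(tab, 1, sum) > 0, ]

  groups <- greenclust::greencut(greenclust::greenclust(tab))

  # List the names of every attribute attached to the membership vector;
  # pass one of them to attr(groups, "<name>") to extract its value
  print(names(attributes(groups)))

  # Drop the attributes to get a plain membership vector
  print(as.vector(groups))
}
```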

References

Greenacre, M.J. (1988) "Clustering the Rows and Columns of a Contingency Table," Journal of Classification 5, 39-51. doi:10.1007/BF01901670

See Also

greenclust, greenplot, assign.cluster

Examples

# Combine Titanic passenger attributes into a single category
# and create a contingency table for the non-zero levels
tab <- t(as.data.frame(apply(Titanic, 4:1, FUN=sum)))
tab <- tab[apply(tab, 1, sum) > 0, ]

grc <- greenclust(tab)
grp <- greencut(grc)  # store the result so the tree is cut only once
grp

plot(grc)
# Outline the selected groups on the dendrogram
rect.hclust(grc, max(grp),
            border=unique(grp)+1)

greenclust documentation built on Sept. 20, 2023, 1:07 a.m.