Cuts a greenclust tree at an automatically determined number of groups.
a tree as produced by greenclust
an integer scalar with the desired number of groups
numeric scalar with the desired height where the tree should be cut
The cut point is calculated by finding the number of groups/clusters that results in a collapsed contingency table with the most-significant (lowest p-value) chi-squared test. If there are ties, the smallest number of groups wins.
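The selection rule described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the tree is a stand-in built with Ward clustering of row profiles (not Greenacre's method), the table comes from the built-in HairEyeColor data, and the names tab, tree, ks, p_values, best_k and best_groups are hypothetical.

```r
# Example contingency table from base R: hair colour (rows) by eye colour (columns)
tab <- unclass(margin.table(HairEyeColor, c(1, 2)))

# Stand-in tree (NOT Greenacre's method): Ward clustering of the row profiles
profiles <- prop.table(tab, 1)
tree <- hclust(dist(profiles), method = "ward.D2")

# For each candidate number of groups, collapse the rows and test the result
ks <- 2:nrow(tab)
p_values <- sapply(ks, function(k) {
  groups <- cutree(tree, k = k)
  collapsed <- rowsum(tab, groups)   # sum the rows that share a group
  suppressWarnings(chisq.test(collapsed)$p.value)
})

# which.min() returns the first minimum, so ties go to the smallest k
best_k <- ks[which.min(p_values)]
best_groups <- cutree(tree, k = best_k)
```

Because ks is in increasing order, taking the first minimum reproduces the tie-breaking rule stated above: among equally significant cuts, the one with the fewest groups is kept.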
If a certain number of groups is required or a specific r-squared
(1 - height) threshold is targeted, a value for either the number of
groups or the height may be provided. (While the regular
cutree function could
also be used in this circumstance, it may still be useful to have the
additional attributes that greencut returns.) As in cutree, the number
of groups takes precedence over the height h if both are given.
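As a rough illustration of the difference, the sketch below cuts the same tree with cutree and with greencut at a fixed number of groups. It assumes the greenclust package is installed and that the group-count argument of greencut is named k, as in cutree; that argument name is an assumption, not something stated on this page.

```r
library(greenclust)  # assumes the greenclust package is installed

# Same contingency table as in the package example
tab <- t(as.data.frame(apply(Titanic, 4:1, FUN = sum)))
tab <- tab[apply(tab, 1, sum) > 0, ]
grc <- greenclust(tab)

# Plain cutree(): group memberships only
plain <- cutree(grc, k = 3)

# greencut() with a fixed number of groups (argument name k is assumed)
cut3 <- greencut(grc, k = 3)

plain
cut3
```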
greencut returns a vector of group memberships, with the
resulting r-squared value and p-value as object attributes.
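A minimal sketch of inspecting the returned object, assuming the greenclust package is installed. The attribute names are not given on this page, so the example lists whatever attributes are attached with attributes() instead of naming them; groups is a hypothetical variable name.

```r
library(greenclust)  # assumes the greenclust package is installed

# Same contingency table as in the package example
tab <- t(as.data.frame(apply(Titanic, 4:1, FUN = sum)))
tab <- tab[apply(tab, 1, sum) > 0, ]
groups <- greencut(greenclust(tab))

# List every attribute attached to the result without assuming their names
str(attributes(groups))

# Drop all attributes (including names) to get a plain membership vector
as.vector(groups)

# Count how many rows of the table fall into each group
table(as.vector(groups))
```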
Greenacre, M.J. (1988) "Clustering the Rows and Columns of a Contingency Table," Journal of Classification 5, 39-51. https://doi.org/10.1007/BF01901670
# Combine Titanic passenger attributes into a single category
# and create a contingency table for the non-zero levels
tab <- t(as.data.frame(apply(Titanic, 4:1, FUN=sum)))
tab <- tab[apply(tab, 1, sum) > 0, ]

grc <- greenclust(tab)
greencut(grc)

plot(grc)
rect.hclust(grc, max(greencut(grc)),
            border=unique(greencut(grc))+1)