protocut: Cut a Minimax Linkage Tree To Get a Clustering

protocut R Documentation

Description

Cuts a minimax linkage tree to get one of n - 1 clusterings. Works like cutree, except that it also returns the prototypes of the resulting clustering.

Usage

protocut(hc, k = NULL, h = NULL)

Arguments

hc

an object returned by protoclust

k

the number of clusters desired

h

the height at which to cut the tree

Details

Given a minimax linkage hierarchical clustering, this function cuts the tree either at a given height or so that a specified number of clusters is created. It returns both the indices of the prototypes and their locations in the tree. The latter information is useful for plotting a dendrogram with prototypes (see plotwithprototypes). As with cutree, if both k and h are given, h is ignored. Unlike cutree, the current version does not accept vectors for k or h.
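The two ways of cutting can be compared in a short sketch. It assumes the protoclust package is installed; the simulated data and the names cut_k and cut_h are illustrative only. Assuming there are no tied merge heights, cutting at the height of merge n - k should give the same memberships as asking for k clusters directly.

```r
library(protoclust)

# Illustrative data: 50 points in the plane (not part of the package).
set.seed(1)
n <- 50
x <- matrix(rnorm(n * 2), n, 2)
hc <- protoclust(dist(x))

# Cut by number of clusters.
k <- 5
cut_k <- protocut(hc, k = k)

# Cut by height: after n - k merges, k clusters remain, so the
# height of merge n - k is used as the cut height here.
cut_h <- protocut(hc, h = hc$height[n - k])

# Number of prototypes, and whether the two cuts agree.
n_protos <- length(cut_k$protos)
same_cut <- all(cut_k$cl == cut_h$cl)
print(n_protos)
print(same_cut)
```

Because k and h must be scalars here, several cuts would need one protocut call each, for example via lapply over the desired values of k.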

Value

A list describing the clustering obtained by cutting the tree:

cl

vector of cluster memberships

protos

vector of prototype indices corresponding to the k clusters created. protos[i] gives the index of the prototype for all elements with cl==i

imerge

vector describing the nodes where prototypes occur. We use the naming convention of the merge matrix in hclust: if imerge[i] is positive, it is the index of the interior node (counting from the bottom) corresponding to the cluster with elements which(cl==i); if imerge[i] is negative, cluster i is a singleton whose only leaf is its prototype.
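As a rough illustration of how the three components fit together, the sketch below tabulates them for a small simulated data set. It assumes the protoclust package is installed; the data and the names res, sizes, and summary_tab are hypothetical and chosen only for this example.

```r
library(protoclust)

# Illustrative data: 50 points in the plane (not part of the package).
set.seed(1)
n <- 50
x <- matrix(rnorm(n * 2), n, 2)
hc <- protoclust(dist(x))
res <- protocut(hc, k = 5)

# Cluster sizes, ordered by cluster label 1, ..., k.
sizes <- table(res$cl)

# protos[i] is the prototype of cluster i, so looking up the
# membership of each prototype should return its own cluster label.
proto_in_own_cluster <- res$cl[res$protos] == seq_along(res$protos)

# Clusters marked as singletons by a negative imerge entry.
singletons <- which(res$imerge < 0)

# One row per cluster: label, prototype index, tree node, size.
summary_tab <- data.frame(
  cluster   = seq_along(res$protos),
  prototype = res$protos,
  imerge    = res$imerge,
  size      = as.integer(sizes),
  row.names = NULL
)
print(summary_tab)
```

The prototype assigned to every observation is then res$protos[res$cl], which is the lookup used in the Examples section.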

Author(s)

Jacob Bien and Rob Tibshirani

References

Bien, J., and Tibshirani, R. (2011), "Hierarchical Clustering with Prototypes via Minimax Linkage," Journal of the American Statistical Association, 106(495), 1075-1084.

See Also

protoclust, cutree, plotwithprototypes

Examples


# generate some data:
set.seed(1)
n <- 100
p <- 2
x <- matrix(rnorm(n * p), n, p)
rownames(x) <- paste("A", 1:n, sep="")
d <- dist(x)

# perform minimax linkage clustering:
hc <- protoclust(d)

# cut the tree to yield a 10-cluster clustering:
k <- 10 # number of clusters
cut <- protocut(hc, k=k)
h <- hc$height[n - k]

# plot dendrogram (and show cut):
plotwithprototypes(hc, imerge=cut$imerge, col=2)
abline(h=h, lty=2)

# get the prototype assigned to each point:
pr <- cut$protos[cut$cl]

# find point farthest from its prototype:
dmat <- as.matrix(d)
ifar <- which.max(dmat[cbind(1:n, pr[1:n])])

# note that this distance equals h (compared up to floating-point tolerance):
stopifnot(isTRUE(all.equal(dmat[ifar, pr[ifar]], h)))

# since this is a 2d example, make 2d display:
plot(x, type="n")
points(x, pch=20, col="lightblue")
lines(rbind(x[ifar, ], x[pr[ifar], ]), col=3)
points(x[cut$protos, ], pch=20, col="red")
text(x[cut$protos, ], labels=hc$labels[cut$protos], pch=19)
tt <- seq(0, 2 * pi, length=100)
for (i in cut$protos) {
  lines(x[i, 1] + h * cos(tt), x[i, 2] + h * sin(tt))
}


protoclust documentation built on April 1, 2022, 9:06 a.m.