knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README_figs/"
)

protoclust

This R package implements hierarchical clustering with minimax linkage (a.k.a., prototype clustering). See Bien, J. and Tibshirani, R. (2011) Hierarchical Clustering with Prototypes via Minimax Linkage for more details.

Since protoclust is on CRAN, it can be easily installed using

install.packages("protoclust")

in R. Occasionally, this github version of the package will be more up-to-date. The easiest way to install this version is by using the devtools R package (if not already installed, open R and type install.packages("devtools")). To install protoclust, type

devtools::install_github("jacobbien/protoclust")

in R.

Simple example of prototype clustering

Suppose we have some data.

set.seed(1)
n <- 100
p <- 2
x <- matrix(rnorm(n * p), n, p)
rownames(x) <- paste("A", 1:n, sep = "")
d <- dist(x)
par(pty = "s")
plot(x, type = "n", xlab = "", ylab = "")
points(x, pch = 20, col = "lightblue")

After computing an n-by-n matrix of dissimilarities d, we can apply prototype clustering.

# perform minimax linkage clustering:
library(protoclust)
hc <- protoclust(d)

# cut the tree at height 1:
h <- 1
cut <- protocut(hc, h = h)
# plot dendrogram (and show cut):
plot(hc, imerge = cut$imerge, col=2)
abline(h = h, lty = 2)

This gives us r length(cut$protos) prototypes with the guarantee that each of the n points is no more than r h unit from one of the prototypes.

# get the prototype assigned to each point:
pr <- cut$protos[cut$cl]
# find point farthest from its prototype:
dmat <- as.matrix(d)
ifar <- which.max(dmat[cbind(1:n, pr[1:n])])

# since this is a 2d example, make 2d display:
par(pty = "s")
plot(x, type="n", xlab = "", ylab = "")
points(x, pch=20, col="lightblue")
lines(rbind(x[ifar, ], x[pr[ifar], ]), col=3)
points(x[cut$protos, ], pch=20, col="red")
text(x[cut$protos, ], labels=hc$labels[cut$protos], pch=19)
tt <- seq(0, 2 * pi, length=100)
for (i in cut$protos) {
 lines(x[i, 1] + h * cos(tt), x[i, 2] + h * sin(tt))
}


jacobbien/protoclust documentation built on April 8, 2022, 8:09 p.m.