View source: R/agglomerative_clustering.R
agglomerative_clustering | R Documentation |
Perform a hierarchical agglomerative cluster analysis on a set of observations
agglomerative_clustering(
  data,
  proximity = "single",
  details = FALSE,
  waiting = TRUE,
  ...
)
data |
a set of observations, presented as a matrix-like object where every row is a new observation. |
proximity |
the proximity definition to be used. This should be one of "single", "complete" or "average". |
details |
a Boolean determining whether intermediate logs explaining how the algorithm works should be printed or not. |
waiting |
a Boolean determining whether the intermediate logs should be printed in chunks, pausing for user input before each new chunk is printed. |
... |
additional arguments passed to |
This function performs a hierarchical cluster analysis for the $n$ objects being clustered. The definition of a set of clusters using this method follows an $n$-step process, which repeats until a single cluster remains:
1. Initially, each object is assigned to its own cluster. The matrix of distances between clusters is computed.
2. The two clusters with closest proximity will be joined together and the proximity matrix updated. This is done according to the specified proximity. This step is repeated until a single cluster remains.
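The two steps above can be sketched in a few lines of base R. This is only an illustration, not the package's implementation: `naive_agglomerate` and its `linkage` argument are hypothetical names, Euclidean distance from `stats::dist()` is assumed, and the cluster proximities are recomputed from the object distances at every iteration instead of updating a proximity matrix.

```r
# Naive agglomerative clustering sketch (illustrative only).
# `naive_agglomerate` and `linkage` are hypothetical names, not part of clustlearn.
naive_agglomerate <- function(x, linkage = min) {
  d <- as.matrix(stats::dist(x))         # distances between objects (Euclidean assumed)
  clusters <- as.list(seq_len(nrow(d)))  # step 1: each object is its own cluster
  heights <- numeric(0)
  while (length(clusters) > 1) {
    best <- c(1, 2)
    best_prox <- Inf
    # step 2: look for the pair of clusters with the closest proximity
    for (i in seq_len(length(clusters) - 1)) {
      for (j in (i + 1):length(clusters)) {
        prox <- linkage(d[clusters[[i]], clusters[[j]]])
        if (prox < best_prox) {
          best_prox <- prox
          best <- c(i, j)
        }
      }
    }
    # join the two clusters and record the proximity at which they merged
    clusters[[best[1]]] <- c(clusters[[best[1]]], clusters[[best[2]]])
    clusters[[best[2]]] <- NULL
    heights <- c(heights, best_prox)
  }
  heights
}

x <- matrix(c(0, 1, 5, 7, 20), ncol = 1)  # five objects on a line
naive_agglomerate(x, min)   # single-link merge heights
naive_agglomerate(x, max)   # complete-link merge heights
naive_agglomerate(x, mean)  # average-link merge heights
```

Passing `min`, `max` or `mean` as `linkage` corresponds to the "single", "complete" and "average" proximities. The nested search makes this sketch far too slow for real data; it is meant only to make the loop structure visible.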
The definitions of proximity considered by this function are:
single: $\min\left\{d(x,y):x\in A,y\in B\right\}$. Defines the proximity between two clusters as the distance between the closest objects among the two clusters. It produces clusters where each object is closest to at least one other object in the same cluster. It is known as SLINK, single-link and minimum-link.
complete: $\max\left\{d(x,y):x\in A,y\in B\right\}$. Defines the proximity between two clusters as the distance between the furthest objects among the two clusters. It is known as CLINK, complete-link and maximum-link.
average: $\frac{1}{\left|A\right|\cdot\left|B\right|}\sum_{x\in A}\sum_{y\in B} d(x,y)$. Defines the proximity between two clusters as the average distance between every pair of objects, one from each cluster. It is also known as UPGMA or average-link.
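As a small worked example of the three definitions, the snippet below computes each proximity for two made-up clusters. The clusters `A` and `B` and the variable names are hypothetical, and Euclidean distance is assumed for $d(x,y)$.

```r
# Two small hypothetical clusters of 2-D points (one object per row).
A <- rbind(c(0, 0), c(1, 0))
B <- rbind(c(3, 0), c(5, 0))

# d(x, y) for every x in A (rows) and every y in B (columns).
# Euclidean distance is an assumption made for this illustration.
d_AB <- outer(
  seq_len(nrow(A)), seq_len(nrow(B)),
  Vectorize(function(i, j) sqrt(sum((A[i, ] - B[j, ])^2)))
)

single_prox   <- min(d_AB)                        # closest pair:  2
complete_prox <- max(d_AB)                        # furthest pair: 5
average_prox  <- sum(d_AB) / (nrow(A) * nrow(B))  # mean over all pairs: 3.5

c(single = single_prox, complete = complete_prox, average = average_prox)
```

The four pairwise distances are 3, 5, 2 and 4, so the single-link proximity is 2, the complete-link proximity is 5, and the average-link proximity is 14 / 4 = 3.5.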
A stats::hclust() object which describes the tree produced by the clustering process.
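To give an idea of what such an object contains, the sketch below builds one with `stats::hclust()` itself, used here only as a stand-in that returns the same class, and inspects its main components. The toy data `x` is made up for the illustration.

```r
# Five hypothetical one-dimensional observations.
x <- matrix(c(0, 1, 5, 7, 20), ncol = 1)

# Stand-in: build an hclust object with base R's own single-link clustering.
cl <- stats::hclust(stats::dist(x), method = "single")

class(cl)   # "hclust"
cl$merge    # one row per merge; negative entries are single observations
cl$height   # proximity at which each merge happened
cl$order    # ordering of the observations used when plotting the tree

# Cut the tree into 3 clusters and get one cluster label per observation.
groups <- stats::cutree(cl, k = 3)
groups
```

Any function that accepts an `hclust` object, such as `plot()`, `cutree()` or `as.dendrogram()`, should work the same way on the value returned by this function.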
Eduardo Ruiz Sabajanes, eduardo.ruizs@edu.uah.es
### !! This algorithm is very slow, so we'll only test it on some datasets !!
### Helper function
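test <- function(db, k, prox) {
  # Cluster the data and print the resulting hclust object
  print(cl <- clustlearn::agglomerative_clustering(db, prox))
  # Draw two plots side by side; the old settings are restored at the end
  oldpar <- par(mfrow = c(1, 2))
  # Left: the observations, colored by cluster after cutting the tree into k groups
  plot(db, col = cutree(cl, k), asp = 1, pch = 20)
  # Right: only the top of the dendrogram, cut at the 50th largest merge height
  # (this assumes the tree has at least 50 merges)
  h <- rev(cl$height)[50]
  clu <- as.hclust(cut(as.dendrogram(cl), h = h)$upper)
  # Border colors in dendrogram order, so they match the scatter plot
  ctr <- unique(cutree(cl, k)[cl$order])
  plot(clu, labels = FALSE, hang = -1, xlab = "Cluster", sub = "", main = "")
  rect.hclust(clu, k = k, border = ctr)
  par(oldpar)
}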
### Example 1
test(clustlearn::db1, 2, "single")
### Example 2
# test(clustlearn::db2, 2, "sing") # same as "single"
### Example 3
test(clustlearn::db3, 4, "a") # same as "average"
### Example 4
test(clustlearn::db4, 6, "s") # same as "single"
### Example 5
test(clustlearn::db5, 3, "complete")
### Example 6
# test(clustlearn::db6, 3, "c") # same as "complete"
### Example 7 (with explanations, no plots)
cl <- clustlearn::agglomerative_clustering(
  clustlearn::db5[1:6, ],
  'single',
  details = TRUE,
  waiting = FALSE
)