View source: R/optimal_bioregion.R
optimal_phyloregion    R Documentation
This function cuts a hierarchical clustering dendrogram into meaningful clusters ("phyloregions"). The number of clusters is chosen at the ‘elbow’ or ‘knee’ of an evaluation graph (explained variance plotted against the number of clusters), i.e. the point of maximum curvature.
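There are several ways to locate the ‘elbow’. As a sketch only, the R function below implements the L-method of Salvador & Chan (2004). It fits two straight lines to the left and right parts of the evaluation graph and picks the split with the smallest size-weighted root-mean-square error. l_method_knee() and the toy data are hypothetical illustrations, not part of phyloregion, and we do not assume that optimal_phyloregion uses this exact rule internally.

```r
# Sketch of the L-method (Salvador & Chan 2004) for finding the knee of an
# evaluation graph. l_method_knee() is a hypothetical helper, not part of
# phyloregion; x must be sorted in increasing order.
l_method_knee <- function(x, y) {
  n <- length(x)
  stopifnot(n >= 4, length(y) == n)
  # Root-mean-square error of a least-squares straight line through (xs, ys)
  rmse_fit <- function(xs, ys) sqrt(mean(residuals(lm(ys ~ xs))^2))
  # Candidate splits: each fitted line keeps at least two points
  splits <- 2:(n - 2)
  err <- vapply(splits, function(i) {
    left <- seq_len(i)
    right <- (i + 1):n
    # Weight each line's error by the share of points it covers
    (i / n) * rmse_fit(x[left], y[left]) +
      ((n - i) / n) * rmse_fit(x[right], y[right])
  }, numeric(1))
  # The knee is the last point of the left-hand line at the best split
  x[splits[which.min(err)]]
}

# Toy evaluation graph: steep gain up to 4 clusters, then a plateau
k <- 1:10
ev <- ifelse(k <= 4, 0.2 * k, 0.9 + 0.01 * (k - 5))
l_method_knee(k, ev)   # returns 4
```

Two-line fitting avoids choosing an arbitrary threshold on the gain in explained variance, which is the main motivation Salvador & Chan give for the method.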
optimal_phyloregion(x, method = "average", k = 20)
x: a numeric matrix, data frame or “dist” object.
method: the agglomeration method to be used. This should be (an unambiguous abbreviation of) one of “ward.D”, “ward.D2”, “single”, “complete”, “average” (= UPGMA), “mcquitty” (= WPGMA), “median” (= WPGMC) or “centroid” (= UPGMC).
k: numeric, the upper bound of the number of clusters to compute. Default: 20, or the number of observations if there are fewer than 20.
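To make the role of method and k concrete, the sketch below builds a dendrogram with base R's hclust() and cuts it into every number of clusters from 1 up to the cap described for k. We assume the candidate partitions are formed this way; the random data and the names m, d, hc, k_max and memb are hypothetical.

```r
# Sketch (assumed workflow, not phyloregion's source): build the dendrogram
# and cut it into 1..k_max clusters, one partition per candidate number.
set.seed(1)
m <- matrix(rnorm(30), nrow = 10)      # 10 hypothetical observations
d <- dist(m)                           # Euclidean "dist" object
hc <- hclust(d, method = "average")    # UPGMA, the documented default method
k_max <- min(20, attr(d, "Size"))      # 20, or n when there are fewer than 20
memb <- cutree(hc, k = seq_len(k_max)) # n x k_max matrix of group labels
dim(memb)                              # 10 rows (observations) by 10 partitions
```

Each column of memb is one candidate partition; scoring every column gives one point of the evaluation graph per number of clusters.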
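A list, as returned from the GMD package (Zhao et al. 2011). In the example below, d$df holds one row per evaluated number of clusters and d$optimal holds the selected solution; the reported quantities are: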
k: optimal number of clusters (bioregions)
totbss: total between-cluster sum of squares
tss: total sum of squares of the data
ev: explained variance for a given k (totbss / tss)
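The sketch below shows one common way to obtain tss, totbss and ev directly from a “dist” object for a single partition. It relies on the identity that, for Euclidean distances, a group's sum of squares about its centroid equals the sum of its squared pairwise distances divided by the group size. ss_from_dist() is a hypothetical helper; for non-Euclidean dissimilarities (such as beta-diversity indices) the same formula gives an analogous pseudo sum of squares, and we do not claim it reproduces GMD's code exactly.

```r
# Sketch: tss, totbss and ev from a "dist" object for one partition.
# ss_from_dist() is a hypothetical helper, not part of phyloregion or GMD.
ss_from_dist <- function(d, groups) {
  D2 <- as.matrix(d)^2
  n <- nrow(D2)
  # Full matrix counts every pair twice, hence the division by 2 * n
  tss <- sum(D2) / (2 * n)
  # Within-group sums of squares, one per cluster
  wss <- sum(vapply(split(seq_len(n), groups), function(idx) {
    sum(D2[idx, idx]) / (2 * length(idx))
  }, numeric(1)))
  totbss <- tss - wss
  c(tss = tss, totbss = totbss, ev = totbss / tss)
}

# Check against kmeans(), which reports the same quantities for its clusters
set.seed(42)
m <- matrix(rnorm(60), nrow = 20)
km <- kmeans(m, centers = 3, nstart = 5)
ss_from_dist(dist(m), km$cluster)   # tss and totbss match km$totss, km$betweenss
```

Working from squared distances means the raw data matrix is never needed, which matters when x is supplied only as a “dist” object.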
Salvador, S. & Chan, P. (2004) Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms. Proceedings of the Sixteenth IEEE International Conference on Tools with Artificial Intelligence, pp. 576–584. Institute of Electrical and Electronics Engineers, Piscataway, New Jersey, USA.
Zhao, X., Valen, E., Parker, B.J. & Sandelin, A. (2011) Systematic clustering of transcription start site landscapes. PLoS ONE 6: e23409.
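library(phyloregion)
data(africa)
# Beta-diversity dissimilarities among sites; the first element is used below
bc <- beta_diss(africa$comm)
# Evaluate up to 15 clusters and locate the elbow
(d <- optimal_phyloregion(bc[[1]], k = 15))
# Plot explained variance against number of clusters
plot(d$df$k, d$df$ev, ylab = "Explained variance",
     xlab = "Number of clusters")
lines(d$df$k[order(d$df$k)], d$df$ev[order(d$df$k)])
# Highlight the optimal number of clusters with a large point and a drop line
points(d$optimal$k, d$optimal$ev, pch = 21, bg = "red", cex = 3)
points(d$optimal$k, d$optimal$ev, pch = 21, bg = "red", type = "h")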