# elbow: The "Elbow" Method for Clustering Evaluation

In GMD: Generalized Minimum Distance of distributions

## Description

Determining the number of clusters in a data set by the "elbow" rule.

## Usage

```r
## find a good k given thresholds of EV and its increment.
elbow(x,inc.thres,ev.thres,precision=3,print.warning=TRUE)

## a wrapper of `elbow' testing multiple thresholds.
elbow.batch(x,inc.thres=c(0.01,0.05,0.1),
            ev.thres=c(0.95,0.9,0.8,0.75,0.67,0.5,0.33),precision=3)

## S3 method for class 'elbow'
plot(x,elbow.obj=NULL,main,xlab="k",
     ylab="Explained_Variance",type="b",pch=20,col.abline="red",
     lty.abline=3,if.plot.new=TRUE,print.info=TRUE,
     mar=c(4,5,3,3),omi=c(0.75,0,0,0),...)
```

## Arguments

| Argument | Description |
| --- | --- |
| `x` | a ‘css.multi’ object, generated by `css.hclust` |
| `inc.thres` | numeric with value(s) from 0 to 1, the threshold of the increment of EV. A single value is used in `elbow` while a vector of values in `elbow.batch`. |
| `ev.thres` | numeric with value(s) from 0 to 1, the threshold of EV. A single value is used in `elbow` while a vector of values in `elbow.batch`. |
| `precision` | integer, the number of digits to round for numerical comparison. |
| `print.warning` | logical, whether to print warning messages. |
| `elbow.obj` | an ‘elbow’ object, generated by `elbow` or `elbow.batch` |
| `main` | an overall title for the plot. |
| `ylab` | a title for the y axis. |
| `xlab` | a title for the x axis. |
| `type` | what type of plot should be drawn. See `help("plot", package="graphics")`. |
| `pch` | Either an integer specifying a symbol or a single character to be used as the default in plotting points (see `par`). |
| `col.abline` | color for straight lines through the current plot (see option `col` in `par`). |
| `lty.abline` | line type for straight lines through the current plot (see option `lty` in `par`). |
| `if.plot.new` | logical, whether to start a new plot device or not. |
| `print.info` | logical, whether to print the information of ‘elbow.obj’. |
| `mar` | A numerical vector of the form 'c(bottom, left, top, right)' which gives the number of lines of margin to be specified on the four sides of the plot (see option `mar` in `par`). The default is 'c(4, 5, 3, 3) + 0.1'. |
| `omi` | A vector of the form 'c(bottom, left, top, right)' giving the size of the outer margins in inches (see option `omi` in `par`). |
| `...` | arguments to be passed to method `plot.elbow`, such as graphical parameters (see `par`). |

## Details

The number of clusters is chosen by the "elbow" rule, using thresholds on the explained variance (EV) and on the increment of EV.
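To make the idea of thresholding EV and its increment concrete, here is a minimal base-R sketch. It is not the GMD implementation: the helper names `explained_variance` and `pick_k`, the toy data, and the exact rule (take the smallest k whose EV reaches the EV threshold and whose next increment falls below the increment threshold) are assumptions made for illustration, and EV is assumed to be one minus the ratio of within-cluster to total sum of squares.

```r
## Base-R sketch of an elbow-style rule; NOT the GMD implementation.
set.seed(1)

## Toy data (assumed for illustration): three well-separated 2-D blobs.
centers <- rbind(c(0, 0), c(5, 5), c(0, 5))
pts <- do.call(rbind, lapply(seq_len(nrow(centers)), function(i) {
  cbind(rnorm(20, centers[i, 1], 0.2), rnorm(20, centers[i, 2], 0.2))
}))

hc <- hclust(dist(pts))

## Total sum of squares around the grand mean.
tss <- sum(scale(pts, center = TRUE, scale = FALSE)^2)

## Assumed definition of EV for k clusters: 1 - WSS / TSS.
explained_variance <- function(k) {
  cl <- cutree(hc, k = k)
  wss <- sum(vapply(split(as.data.frame(pts), cl), function(g) {
    sum(scale(as.matrix(g), center = TRUE, scale = FALSE)^2)
  }, numeric(1)))
  1 - wss / tss
}

ks  <- 1:10
ev  <- vapply(ks, explained_variance, numeric(1))
inc <- c(diff(ev), NA)  # gain in EV when going from k to k + 1

## Hypothetical rule: smallest k with EV >= ev_thres and next gain < inc_thres.
pick_k <- function(ks, ev, inc, ev_thres, inc_thres, precision = 3) {
  ok <- round(ev, precision) >= ev_thres & round(inc, precision) < inc_thres
  if (!any(ok, na.rm = TRUE)) return(NA_integer_)
  ks[which(ok)[1]]
}

k_good <- pick_k(ks, ev, inc, ev_thres = 0.95, inc_thres = 0.01)
```

Rounding before comparison mirrors the role of the `precision` argument described above; the way `elbow` itself combines the two thresholds may differ from this sketch.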

## Value

Both `elbow` and `elbow.batch` return an ‘elbow’ object (if a "good" `k` exists), which is a list containing the following components:

| Component | Description |
| --- | --- |
| `k` | number of clusters |
| `ev` | explained variance given `k` |
| `inc.thres` | the threshold of the increment in EV |
| `ev.thres` | the threshold of the EV |

The object also carries an attribute ‘meta’ that contains:

| Component | Description |
| --- | --- |
| `description` | A description about the "good" `k` |
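As a rough illustration of how such an object might be read, the sketch below builds a hypothetical stand-in with the documented components and the ‘meta’ attribute, then accesses them. The values are made up, and the assumption that ‘meta’ is a list with a `description` element is inferred from the description above rather than taken from the package source.

```r
## Hypothetical stand-in for an 'elbow' object; all values are invented.
obj <- structure(
  list(k = 3L, ev = 0.99, inc.thres = 0.01, ev.thres = 0.95),
  meta = list(description = "illustrative description only"),
  class = "elbow"
)

## List components are read with `$`.
k_chosen  <- obj$k
ev_chosen <- obj$ev

## Attributes are read with attr(); `meta` is assumed to be a list here.
meta_desc <- attr(obj, "meta")$description
```

With a real result from `elbow` or `elbow.batch`, the same `$` and `attr()` accessors would presumably apply, but the exact contents of ‘meta’ should be checked against the installed package.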

## See Also

`css` and `css.hclust` for computing Clustering Sum-of-Squares.
## Examples

```r
## load library
require("GMD")

## simulate data around 12 points in Euclidean space
pointv <- data.frame(x=c(1,2,2,4,4,5,5,6,7,8,9,9),
                     y=c(1,2,8,2,4,4,5,9,9,8,1,9))
set.seed(2012)
mydata <- c()
for (i in 1:nrow(pointv)){
  mydata <- rbind(mydata,cbind(rnorm(10,pointv[i,1],0.1),
                               rnorm(10,pointv[i,2],0.1)))
}
mydata <- data.frame(mydata); colnames(mydata) <- c("x","y")
plot(mydata,type="p",pch=21, main="Simulated data")

## determine a "good" k using elbow
dist.obj <- dist(mydata[,1:2])
hclust.obj <- hclust(dist.obj)
css.obj <- css.hclust(dist.obj,hclust.obj)
elbow.obj <- elbow.batch(css.obj)
print(elbow.obj)

## make partition given the "good" k
k <- elbow.obj$k; cutree.obj <- cutree(hclust.obj,k=k)
mydata$cluster <- cutree.obj

## draw an elbow plot and label the data
dev.new(width=12, height=6)
par(mfcol=c(1,2),mar=c(4,5,3,3),omi=c(0.75,0,0,0))
plot(mydata$x,mydata$y,pch=as.character(mydata$cluster),
     col=mydata$cluster,cex=0.75,main="Clusters of simulated data")
plot(css.obj,elbow.obj,if.plot.new=FALSE)

## clustering with more relaxed thresholds (resulting in a smaller "good" k)
elbow.obj2 <- elbow.batch(css.obj,ev.thres=0.90,inc.thres=0.05)
mydata$cluster2 <- cutree(hclust.obj,k=elbow.obj2$k)

dev.new(width=12, height=6)
par(mfcol=c(1,2), mar=c(4,5,3,3),omi=c(0.75,0,0,0))
plot(mydata$x,mydata$y,pch=as.character(mydata$cluster2),
     col=mydata$cluster2,cex=0.75,main="Clusters of simulated data")
plot(css.obj,elbow.obj2,if.plot.new=FALSE)
```