Tibshirani's gap statistic for determining the number of clusters. It computes the within-cluster dispersion of the partition and compares it with the within-cluster dispersion of generated reference datasets whose statistics are similar to those of the original data. The within-cluster dispersion is the sum, over clusters, of the normalized sum of the pairwise distances between the observations in each cluster.
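For reference, the quantities described above can be written as in Tibshirani et al. (2001). This is the notation of the paper, not necessarily the exact normalization used by this implementation; the paper takes d as the squared Euclidean distance, and B is the number of reference datasets.

```latex
% C_r: set of observations in cluster r, n_r = |C_r|
% d_{ii'}: distance between observations i and i'
D_r = \sum_{i, i' \in C_r} d_{ii'}
\qquad
W_k = \sum_{r=1}^{k} \frac{1}{2 n_r} D_r

% W_{kb}^{*}: dispersion of the b-th reference dataset clustered into k groups
\mathrm{Gap}(k) = \frac{1}{B} \sum_{b=1}^{B} \log W_{kb}^{*} - \log W_k
```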
X |
data matrix or data frame of size n x d, n observations and d features |
maxK |
maximum number of clusters to evaluate. |
clusterAlg |
clustering algorithm. Its output must be a list with a component "cluster" containing the cluster assignment of each observation.
For more details, check the formatting of function |
B |
number of reference datasets to generate |
null_distrib |
type of the null hypothesis. Can be either "gaussian", "uniform" or "uniformity". "gaussian" draws observations from a multidimensional normal distribution with the same mean and variance as the original dataset for each feature. "uniform" draws observations uniformly within the range of each feature. "uniformity" draws observations from a uniform distribution as in the gap statistic (Tibshirani et al. 2001) |
verbose |
logical, if TRUE, plots the evolution of the algorithm |
... |
additional parameters for the clustering algorithm |
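As an illustration of the "gaussian" and "uniform" options of null_distrib, here is a minimal sketch of how one reference dataset could be drawn. The helper draw_reference is hypothetical and not part of the package; it assumes features are drawn independently, which the description above suggests but does not state. The "uniformity" option is not covered.

```r
# Hypothetical helper (not part of the package): draw one reference dataset
# with the same dimensions as X under a feature-wise null distribution.
draw_reference <- function(X, null_distrib = c("gaussian", "uniform")) {
  null_distrib <- match.arg(null_distrib)
  X <- as.matrix(X)
  n <- nrow(X)
  # Each feature (column) is sampled independently of the others.
  apply(X, 2, function(col) {
    if (null_distrib == "gaussian") {
      # Same mean and standard deviation as the original feature
      rnorm(n, mean = mean(col), sd = sd(col))
    } else {
      # Uniform over the observed range of the feature
      runif(n, min = min(col), max = max(col))
    }
  })
}
```

With B reference datasets, one could write refs <- replicate(B, draw_reference(X, "gaussian"), simplify = FALSE) and cluster each element in turn.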
list of 3 components
kopt
optimal number of clusters
gap
vector of values for the gap statistic
s
empirical standard deviation of the gap statistic
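How kopt is derived from gap and s is not stated here. Tibshirani et al. (2001) propose taking the smallest k such that Gap(k) >= Gap(k+1) - s(k+1); the sketch below applies that rule to the returned vectors. The helper select_k is hypothetical, the fallback to the largest k is an assumption, and whether s already includes the sqrt(1 + 1/B) correction from the paper is not documented.

```r
# Hypothetical helper (not part of the package): selection rule from
# Tibshirani et al. (2001) applied to the "gap" and "s" vectors.
select_k <- function(gap, s) {
  K <- length(gap)
  if (K < 2) return(K)
  # Indices k in 1..(K-1) where Gap(k) >= Gap(k+1) - s(k+1)
  ok <- which(gap[-K] >= gap[-1] - s[-1])
  # Assumption: fall back to the largest k when no k satisfies the rule
  if (length(ok) > 0) ok[1] else K
}
```

Given a result res returned by the function, select_k(res$gap, res$s) can be compared with res$kopt to check whether the implementation follows the same rule.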
Tibshirani, R., Walther, G., and Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society, Series B, 63:411-423.