This method measures the similarity of two clusterings computed on two non-overlapping subsamples of the dataset. The resulting similarity measures are then compared with the same measures computed on generated reference datasets. These reference datasets are generated as in the gap statistic method (Tibshirani et al., 2001).
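The two resampling ingredients described above can be sketched in base R. This is only an illustration under stated assumptions: `uniformReference` and `splitIndices` are hypothetical helper names, not functions of this package; the reference data are drawn uniformly over the observed range of each feature (the simpler of the two schemes in Tibshirani et al., 2001; this page does not say which one is used); and `rho` is assumed to be the fraction of observations assigned to the first subsample.

```r
# Hypothetical helper, not part of the documented function.
# Draws one reference dataset: each feature is uniform over its observed range.
uniformReference <- function(X) {
  X <- as.matrix(X)
  apply(X, 2, function(col) stats::runif(length(col), min(col), max(col)))
}

# Hypothetical helper: split row indices into two non-overlapping subsamples.
# Assumption: rho is the fraction of rows placed in the first subsample.
splitIndices <- function(n, rho) {
  first <- sample.int(n, size = floor(rho * n))
  list(first = first, second = setdiff(seq_len(n), first))
}

set.seed(42)
X <- matrix(rnorm(200), ncol = 4)          # 50 observations, 4 features
ref <- uniformReference(X)                 # same shape as X
idx <- splitIndices(nrow(X), rho = 0.5)    # two disjoint index sets
dim(ref)
length(intersect(idx$first, idx$second))
```

Clustering both subsamples of `X` and of each reference dataset, then comparing the similarity scores, is what produces the reported p- and d-values.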
X |
data matrix or data frame of size n x d, n observations and d features |
maxK |
maximum number of clusters to evaluate. |
clusterAlg |
clustering algorithm. Its output must be a list with a component "cluster" containing the cluster assignment of each observation.
For more details, check the formatting of function |
similarity |
function measuring the similarity between two partitions. |
pmax |
threshold for the p-value |
dmin |
threshold for the d-value |
B |
number of resampling iterations |
B0 |
number of reference datasets to generate |
rho |
proportion of the train/test set |
verbose |
logical, if TRUE, plots the evolution of the algorithm |
... |
additional parameters for the clustering algorithm |
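As an illustration of the clusterAlg and similarity arguments, here is a minimal sketch in base R. The names `kmeansAlg` and `randIndex` are hypothetical, and the calling convention `clusterAlg(X, K, ...)` is an assumption that this page does not confirm. The only requirement stated above is that the output is a list with a "cluster" component, which `stats::kmeans` already provides.

```r
# Hypothetical wrapper; the signature (X, K, ...) is an assumption.
# Returns a list whose "cluster" component holds one label per observation.
kmeansAlg <- function(X, K, ...) {
  fit <- stats::kmeans(X, centers = K, ...)
  list(cluster = fit$cluster)
}

# Hypothetical similarity function: the Rand index, i.e. the share of
# observation pairs on which two partitions agree (both together or both apart).
randIndex <- function(a, b) {
  stopifnot(length(a) == length(b), length(a) >= 2)
  sameA <- outer(a, a, "==")
  sameB <- outer(b, b, "==")
  upper <- upper.tri(sameA)            # count each pair once
  mean(sameA[upper] == sameB[upper])
}

set.seed(1)
X <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))
res <- kmeansAlg(X, 2)
table(res$cluster)
randIndex(res$cluster, rep(1:2, each = 20))
```

The Rand index does not depend on how the clusters are numbered, which matters here because two independent clusterings rarely use the same label order.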
list of 3 attributes:
p
vector of p-values
d
vector of d-values
kopt
optimal number of clusters
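The relation between p, d, pmax, dmin and kopt can be sketched as follows. This follows the selection rule described by Dudoit and Fridlyand (2002) and is an assumption about this function rather than something the page states: among the candidate numbers of clusters with p at most pmax and d at least dmin, take the one with the largest d, and fall back to a single cluster if none qualifies. The helper `selectK` and all numbers below are hypothetical.

```r
# Hypothetical helper illustrating the rule of Dudoit and Fridlyand (2002);
# whether the documented function applies exactly this rule is an assumption.
selectK <- function(K, p, d, pmax, dmin) {
  keep <- which(p <= pmax & d >= dmin)
  if (length(keep) == 0) return(1L)    # no candidate passes: one cluster
  as.integer(K[keep[which.max(d[keep])]])
}

# Made-up values, for illustration only
K <- 2:5
p <- c(0.30, 0.01, 0.02, 0.40)
d <- c(0.01, 0.12, 0.08, 0.00)
selectK(K, p, d, pmax = 0.05, dmin = 0.05)
```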
Dudoit, S. and Fridlyand, J. (2002). A prediction-based resampling method for estimating the number of clusters in a dataset. Genome Biology, 3(7):research0036.1. https://doi.org/10.1186/gb-2002-3-7-research0036
Tibshirani, R., Walther, G., and Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society Series B, 63:411-423.