View source: R/statistical_test.R
Statistical test for checking whether clustering a given dataset is relevant. Several datasets are generated under a null hypothesis, and their distributions of nearest-neighbour distances are compared with that of the original dataset.
statistical_test(X, s, null_distrib = "gaussian")
X | data matrix or data frame of size n x d, with n observations and d features
s | number of reference datasets to generate
null_distrib | type of the null hypothesis. One of "gaussian", "uniform" or "uniformity". "gaussian" draws observations from a multidimensional normal distribution with the same mean and variance as the original dataset for each feature. "uniform" draws observations uniformly within the range of each feature. "uniformity" draws observations from a uniform distribution as in the gap statistic (Tibshirani et al. 2001).
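To make the "gaussian" and "uniform" options concrete, here is a minimal sketch of how one reference dataset could be drawn. The helper `draw_reference()` is hypothetical and is not part of the package. It assumes features are drawn independently of one another, which the description above does not confirm, and it leaves out "uniformity" because the exact construction is not given here.

```r
# Hypothetical helper for illustration only; not the package's implementation.
draw_reference <- function(X, null_distrib = c("gaussian", "uniform")) {
  null_distrib <- match.arg(null_distrib)
  X <- as.matrix(X)
  n <- nrow(X)
  d <- ncol(X)

  if (null_distrib == "gaussian") {
    # Per-feature mean and standard deviation of the original data.
    mu <- colMeans(X)
    sigma <- apply(X, 2, sd)
    # Assumption: each feature is drawn independently (no covariance).
    ref <- sapply(seq_len(d), function(j) rnorm(n, mean = mu[j], sd = sigma[j]))
  } else {
    # Per-feature range of the original data.
    lo <- apply(X, 2, min)
    hi <- apply(X, 2, max)
    ref <- sapply(seq_len(d), function(j) runif(n, min = lo[j], max = hi[j]))
  }

  # Force an n x d matrix even when n is 1.
  matrix(ref, nrow = n, ncol = d, dimnames = list(NULL, colnames(X)))
}

set.seed(1)
X <- matrix(rnorm(40), nrow = 20, ncol = 2)
ref_gaussian <- draw_reference(X, "gaussian")
ref_uniform <- draw_reference(X, "uniform")
```

Repeating such a draw s times would give the s reference datasets that the test compares with the observed data.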
The function plots the empirical distribution function of the nearest-neighbour distances of the observed data against the empirical distribution function obtained under the null hypothesis. It also plots the identity line, which represents the case where both distributions are in perfect agreement. If the first curve rises quickly above the identity line, the clustering is likely to be relevant. A returned pvalue below 0.03 is a further hint that the dataset is likely to contain clusters.
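The comparison described above can be approximated with base R. This is only a sketch under stated assumptions: `nn_distances()` is a hypothetical helper, Euclidean distance is assumed, and a single uniform reference dataset stands in for the s generated ones.

```r
# Hypothetical helper: distance from each observation to its nearest neighbour.
nn_distances <- function(X) {
  D <- as.matrix(dist(X))  # pairwise Euclidean distances (assumed metric)
  diag(D) <- Inf           # ignore the zero distance of a point to itself
  apply(D, 1, min)
}

set.seed(42)
# Toy data with two well-separated groups of 50 points each.
clustered <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.2), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.2), ncol = 2)
)

# One reference dataset drawn uniformly within the range of each feature.
lo <- apply(clustered, 2, min)
hi <- apply(clustered, 2, max)
reference <- cbind(runif(100, lo[1], hi[1]), runif(100, lo[2], hi[2]))

# Empirical distribution functions of the nearest-neighbour distances.
F_obs <- ecdf(nn_distances(clustered))
F_null <- ecdf(nn_distances(reference))

# Evaluate both on a common grid to compare one against the other.
grid <- sort(c(nn_distances(clustered), nn_distances(reference)))
pp <- data.frame(null = F_null(grid), observed = F_obs(grid))
```

Calling `plot(pp$null, pp$observed, type = "s")` followed by `abline(0, 1)` would draw the curve together with the identity line.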
list of 2 components
U
vector containing the discrepancy measures. The first value is the measure for the observed data; the remaining s values are the measures for the generated datasets.
pvalue
proportion of the discrepancy measures of the generated datasets that are at least as large as the discrepancy measure of the original dataset.
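As an illustration of how pvalue relates to U, the short sketch below recomputes the proportion from a made-up vector. The numbers are invented for the example and do not come from any real run of the function.

```r
# Made-up discrepancy measures: U[1] is the observed data, U[-1] are s = 5 generated datasets.
U <- c(0.9, 0.2, 0.95, 0.4, 0.1, 0.3)

# Proportion of generated measures at least as large as the observed one.
pvalue <- mean(U[-1] >= U[1])
pvalue  # 0.2, since only one of the five generated values (0.95) reaches 0.9
```

With so few reference datasets the proportion can only take a handful of values, so a larger s gives a finer-grained pvalue.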
McShane, L. M., Radmacher, M. D., Freidlin, B., Yu, R., Li, M.-C., and Simon, R. (2002). Methods for assessing reproducibility of clustering patterns observed in analyses of microarray data. Bioinformatics, 18(11):1462-1469. https://doi.org/10.1093/bioinformatics/18.11.1462
Tibshirani, R., Walther, G., and Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society Series B, 63:411-423.