statistical_test: Statistical test for clustering relevance
In mattmail/clusterAnalysis: Several tools for determining the number of clusters in a dataset


View source: R/statistical_test.R

Statistical test for checking whether the clustering of a specified dataset is relevant. Several datasets are generated under a null hypothesis, and their distributions of nearest-neighbour distances are compared with that of the original dataset.

statistical_test(X, s, null_distrib = "gaussian")

`X`	data matrix or data frame of size n x d, with n observations and d features
`s`	number of reference datasets to generate
`null_distrib`	type of null hypothesis. Can be "gaussian", "uniform" or "uniformity". "gaussian" draws observations from a multidimensional normal distribution with, for each feature, the same mean and variance as in the original dataset. "uniform" draws observations uniformly within the range of each feature. "uniformity" draws observations from a uniform distribution as in the gap statistic (Tibshirani et al. 2001).
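The "gaussian" and "uniform" options described above can be illustrated with a minimal base R sketch. The helper `generate_reference` is hypothetical and not part of clusterAnalysis; it assumes features are drawn independently of each other, which the description suggests but does not confirm. The "uniformity" option is left out because its exact construction is not specified here.

```r
# Hypothetical helper (not part of clusterAnalysis): draw one reference
# dataset with the same shape as X under a chosen null hypothesis.
generate_reference <- function(X, null_distrib = c("gaussian", "uniform")) {
  null_distrib <- match.arg(null_distrib)
  X <- as.matrix(X)
  n <- nrow(X)
  d <- ncol(X)
  ref <- matrix(NA_real_, nrow = n, ncol = d)
  for (j in seq_len(d)) {
    if (null_distrib == "gaussian") {
      # Assumption: each feature is drawn independently, matching the
      # per-feature mean and standard deviation of the original data.
      ref[, j] <- rnorm(n, mean = mean(X[, j]), sd = sd(X[, j]))
    } else {
      # Uniform draws within the observed range of the feature.
      ref[, j] <- runif(n, min = min(X[, j]), max = max(X[, j]))
    }
  }
  ref
}

set.seed(1)
X <- cbind(rnorm(50), rnorm(50, mean = 3, sd = 2))
ref_gauss <- generate_reference(X, "gaussian")
ref_unif <- generate_reference(X, "uniform")
```

Repeating such a draw `s` times would give the `s` reference datasets that the test compares against the original data.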

The function plots the empirical distribution function of the nearest-neighbour distances of the observed data against the empirical distribution function obtained under the null hypothesis. It also plots the identity line, which represents the case where both distributions agree perfectly. If the curve rises quickly above the identity line, the clustering is likely to be relevant. A returned p-value below 0.03 is a further hint that the dataset is likely to contain clusters.
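To make the comparison concrete, the following base R sketch computes nearest-neighbour distances for a clustered toy dataset and for one uniform reference dataset, then evaluates both empirical distribution functions. It is an illustration only: `nn_distances` is a hypothetical helper, Euclidean distance is an assumption, and the package's own plot may be built differently.

```r
# Hypothetical helper: distance from each observation to its nearest neighbour.
nn_distances <- function(X) {
  D <- as.matrix(dist(as.matrix(X)))  # pairwise Euclidean distances (assumed metric)
  diag(D) <- Inf                      # ignore the distance of a point to itself
  apply(D, 1, min)
}

set.seed(42)
# Toy data: two tight, well-separated groups of 30 points in 2 dimensions.
clustered <- rbind(matrix(rnorm(60, mean = 0, sd = 0.2), ncol = 2),
                   matrix(rnorm(60, mean = 5, sd = 0.2), ncol = 2))
# One reference dataset drawn uniformly within the range of each feature.
reference <- apply(clustered, 2,
                   function(col) runif(length(col), min(col), max(col)))

nn_obs <- nn_distances(clustered)
nn_ref <- nn_distances(reference)
F_obs <- ecdf(nn_obs)   # empirical distribution function, observed data
F_ref <- ecdf(nn_ref)   # empirical distribution function, null hypothesis

# Evaluate both functions on a common grid of distances.
grid <- sort(c(nn_obs, nn_ref))
if (interactive()) {
  plot(F_ref(grid), F_obs(grid), type = "s",
       xlab = "null hypothesis", ylab = "observed data")
  abline(0, 1, lty = 2)  # identity line: both distributions agree
}
```

In clustered data the nearest neighbours tend to be closer than under the null hypothesis, so the observed curve climbs above the identity line early on.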

A list with 2 components:

U: vector containing the discrepancy measures. The first value is the measure for the observed data; the remaining s values are the measures for the generated datasets.
pvalue: proportion of the generated datasets whose discrepancy measure is at least as large as that of the original dataset.
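The relation between `U` and `pvalue` described above can be sketched in one line of base R. The values in `U` below are invented for illustration and do not come from the package.

```r
# Made-up discrepancy measures: U[1] is the observed data,
# U[-1] are the s = 5 generated datasets.
U <- c(0.42, 0.10, 0.55, 0.31, 0.08, 0.47)

# Proportion of generated datasets whose measure is at least as large
# as the measure of the observed data.
pvalue <- mean(U[-1] >= U[1])
pvalue  # 0.4, since 2 of the 5 generated values (0.55 and 0.47) reach 0.42
```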

McShane, L. M., Radmacher, M. D., Freidlin, B., Yu, R., Li, M.-C., and Simon, R. (2002). Methods for assessing reproducibility of clustering patterns observed in analyses of microarray data. Bioinformatics, 18(11):1462-1469. https://doi.org/10.1093/bioinformatics/18.11.1462
Tibshirani, R., Walther, G., and Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society Series B, 63:411-423.

mattmail/clusterAnalysis documentation built on Nov. 4, 2019, 6:18 p.m.
