Homogeneity test based on the statistic Bn. The test assesses whether there exists a data partition
for which group separation is statistically significant according to the U-test. The null hypothesis
is overall sample homogeneity, and a sample is considered homogeneous if it cannot be divided into
two statistically significant subgroups.
md: Matrix of squared Euclidean distances between all data points.
data: Data matrix. Each row represents an observation.
rep: Number of times to repeat optimization procedure. Important for problems with multiple optima.
This is the homogeneity test of Cybis et al. (2017), extended to account for groups of size 1. The test is performed in two steps: an optimization procedure that finds the data partition maximizing the standardized Bn, and a test on the resulting maximal partition. It should be used in high-dimensional, small-sample-size settings.
Either data or md should be provided.
If data are entered directly, Bn will be computed using the squared Euclidean distance.
If a distance matrix is entered, it must consist of squared Euclidean distances; otherwise the test results are invalid.
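The requirement on md can be checked before running the test. The following is a minimal base R sketch, assuming rows are observations; `x_demo`, `sq_dist`, and `manual` are illustrative names and not part of the package.

```r
# Illustrative data: 5 observations in 3 dimensions (names are hypothetical).
set.seed(1)
x_demo <- matrix(rnorm(15), nrow = 5)

# dist() returns Euclidean distances; square them before converting to a matrix.
sq_dist <- as.matrix(dist(x_demo)^2)

# Sanity check against a direct computation for one pair of rows.
manual <- sum((x_demo[1, ] - x_demo[2, ])^2)
stopifnot(isTRUE(all.equal(sq_dist[1, 2], manual)))

# A squared-distance matrix is symmetric with a zero diagonal.
stopifnot(isSymmetric(sq_dist), all(diag(sq_dist) == 0))
```

A matrix built this way is what the md argument expects; passing unsquared output of `dist()` would not satisfy the requirement above.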
The variance of Bn is estimated through resampling, so p-values may vary slightly between runs.
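The run-to-run variation of a resampling estimate can be illustrated with a generic base R sketch. This is not the package's resampling scheme: `resample_var` is a hypothetical helper that bootstraps the variance of a sample mean. It also assumes the random draws come from R's own generator, in which case calling `set.seed()` before a run makes that run repeatable.

```r
# Hypothetical helper: bootstrap estimate of the variance of a sample mean.
resample_var <- function(v, n_boot = 200) {
  boot_means <- replicate(n_boot, mean(sample(v, replace = TRUE)))
  var(boot_means)
}

set.seed(42)
v <- rnorm(30)

# Two consecutive runs use different random draws, so estimates generally differ slightly.
est_a <- resample_var(v)
est_b <- resample_var(v)

# Fixing the seed before each run reproduces the same estimate.
set.seed(123); est_c <- resample_var(v)
set.seed(123); est_d <- resample_var(v)
stopifnot(identical(est_c, est_d))
```

If reproducible p-values are needed, the same idea would apply to a call to the test, under the assumption stated above.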
For more detail see Cybis, Gabriela B., Marcio Valk, and Sílvia RC Lopes. "Clustering and classification problems in genetics through U-statistics." Journal of Statistical Computation and Simulation 88.10 (2018) and Valk, Marcio, and Gabriela Bettella Cybis. "U-statistical inference for hierarchical clustering." arXiv preprint arXiv:1805.12179 (2018).
Returns a list with the following elements:
Test statistic. Minimum of the objective function for optimization (-stdBn).
Elements in group 1 in the maximal partition. (Note: this is not the best partition for the data; see uclust.)
Elements in group 2 in the maximal partition.
P-value for the homogeneity test.
Values of the minimum objective function on all rep optimization runs.
Resampling variance estimate for partitions with groups of size n/2 (or (n-1)/2 and (n+1)/2 if n is odd).
Resampling variance estimate for partitions with one group of size 1.
library(uclust)  # assumes the uclust package, which provides is_homo, is installed
x = matrix(rnorm(500000), nrow = 50)  # homogeneous Gaussian dataset
res = is_homo(data = x)
x[1:30, ] = x[1:30, ] + 0.15  # heterogeneous dataset (first 30 samples have a different mean)
res = is_homo(data = x)
md = as.matrix(dist(x)^2)  # squared Euclidean distances for the same data
res = is_homo(md = md)
# Multidimensional scaling plot of the distance matrix
fit <- cmdscale(md, eig = TRUE, k = 2)
mds1 <- fit$points[, 1]  # separate names so the data matrix x is not overwritten
mds2 <- fit$points[, 2]
plot(mds1, mds2, main = paste("Homogeneity test: p-value =", res$p.MaxTest))