uclust3: U-statistic based significance clustering for three way...


View source: R/uclust3.R

Description

Partitions data into three groups only when the partition is statistically significant. If no significant partition exists, the sample is reported as homogeneous.

Usage

uclust3(md = NULL, data = NULL, alpha = 0.05, rep = 15)

Arguments

md

Matrix of distances between all data points.

data

Data matrix. Each row represents an observation.

alpha

Significance level.

rep

Number of times to repeat optimization procedures. Important for problems with multiple optima.

Details

This is the significance clustering procedure of Bello et al. (2021). The method first performs a homogeneity test to verify whether the data can be significantly partitioned. If the hypothesis of homogeneity is rejected, the method then searches, among all significant partitions, for the one that best separates the data, as measured by the largest Bn statistic. This function is intended for high-dimension, low-sample-size settings.

Either data or md should be provided. If data are entered directly, Bn will be computed considering the squared Euclidean distance.
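As a rough illustration of the md argument, the sketch below builds a squared Euclidean distance matrix with base R and checks one entry against the definition. The object names (x, md, d12) are chosen here for illustration. Treating this md as interchangeable with passing data directly is an assumption based on the sentence above, not something verified against the package source, and the guarded call only runs if the uclust package is installed.

```r
# Minimal sketch: squared Euclidean distances between rows, using base R only.
set.seed(1)
x <- matrix(rnorm(7 * 50), nrow = 7)   # 7 observations (rows), 50 variables

# dist() computes Euclidean distances between rows by default;
# squaring elementwise gives squared Euclidean distances.
md <- as.matrix(dist(x))^2

# Check one entry against the definition: sum of squared coordinate differences.
d12 <- sum((x[1, ] - x[2, ])^2)
print(all.equal(unname(md[1, 2]), d12))

# Assumption: md built this way is a valid input for the md argument.
if (requireNamespace("uclust", quietly = TRUE)) {
  res_md <- uclust::uclust3(md = md)
  print(res_md$ishomo)
}
```

Supplying md is mainly useful when a different dissimilarity is wanted, since only the distance matrix then needs to change.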

The variance of Bn is estimated through resampling, so p-values may vary slightly between runs.

For more details, see Bello, Debora Zava, Marcio Valk and Gabriela Bettella Cybis. "Clustering inference in multiple groups." arXiv preprint arXiv:2106.09115 (2021). See also is_homo3, uclust.

Value

Returns a list with the following elements:

groups

List with the elements of each of the three final groups.

p.value

P-value for the test that renders the final partition, if heterogeneous. Homogeneity test p-value, if homogeneous.

alpha_corrected

Bonferroni corrected significance level for the test that renders the final partition, if heterogeneous. Homogeneity test significance level, if homogeneous.

ishomo

Logical, returns TRUE when the sample is homogeneous.

Bn

Value of Bn statistic for the final partition, if heterogeneous. Value of Bn statistic for the maximal homogeneity test partition, if homogeneous.

varBn

Variance estimate for final partition, if heterogeneous. Variance estimate for the maximal homogeneity test partition, if homogeneous.
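The returned list could be inspected along the following lines. This is only a sketch: describe_uclust3 is a hypothetical helper, not part of the package, and the mock object uses made-up values purely to mirror the element names listed above. It also assumes that groups is a list of three index vectors, which the description suggests but does not state explicitly.

```r
# Hypothetical helper (not part of uclust): one-line summary of a uclust3() result.
describe_uclust3 <- function(res) {
  if (isTRUE(res$ishomo)) {
    # Homogeneous sample: p.value and alpha_corrected refer to the homogeneity test.
    sprintf("homogeneous (p = %.3g, alpha = %.3g)",
            res$p.value, res$alpha_corrected)
  } else {
    # Heterogeneous sample: report the size of each group.
    sizes <- paste(lengths(res$groups), collapse = ", ")
    sprintf("three groups of sizes %s (p = %.3g, corrected alpha = %.3g)",
            sizes, res$p.value, res$alpha_corrected)
  }
}

# Mock result with made-up values, used only to illustrate the structure.
mock <- list(groups = list(1:6, 7:12, 13:15),
             p.value = 1e-4, alpha_corrected = 0.001,
             ishomo = FALSE, Bn = 2.5, varBn = 0.1)
msg <- describe_uclust3(mock)
print(msg)
```

Checking ishomo first matters because p.value, alpha_corrected, Bn and varBn refer to different tests depending on whether the sample was found homogeneous.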

Examples

set.seed(123)
x = matrix(rnorm(70000),nrow=7)  #creating homogeneous Gaussian dataset
res = uclust3(data=x)
res

# uncomment to run
# x = matrix(rnorm(15000),nrow=15)
# x[1:6,] = x[1:6,]+1.5  # heterogeneous dataset: rows 1-6 have mean shifted by 1.5
# x[7:12,] = x[7:12,]+3  # rows 7-12 have mean shifted by 3; rows 13-15 are unshifted
# res = uclust3(data=x)
# res$groups

uclust documentation built on June 19, 2021, 1:06 a.m.