
highDmean

The highDmean package implements the high-dimensional two-sample test proposed by Zhang and Wang (2020), “Result consistency of high dimensional two-sample tests applied to gene ontology terms with gene sets”. Testing the equality of two multivariate means has a classical solution: Hotelling’s T-square test. When the dimension exceeds the sample sizes, Hotelling’s test fails because the sample covariance matrix is singular. In that case, the test of Zhang and Wang (2020), available in this package as zwl_test(), remains applicable and provides a reliable and powerful test. The package also implements the test proposed by Srivastava, Katayama, and Kano (2013), “A two sample test in high dimensional data”, as SKK_test().
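The singularity mentioned above can be checked with base R alone. The sketch below is illustrative and not part of the package: it draws two Gaussian samples with the same sizes as the example in this README, forms the pooled sample covariance that Hotelling's statistic would need to invert, and compares its numerical rank with the dimension. The variable names and the seed are arbitrary choices.

```r
set.seed(1)
n <- 45; m <- 60; p <- 300

# Rows are observations, columns are variables
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
Y <- matrix(rnorm(m * p), nrow = m, ncol = p)

# Pooled sample covariance (p x p), the matrix Hotelling's T-square inverts
S_pooled <- ((n - 1) * cov(X) + (m - 1) * cov(Y)) / (n + m - 2)

# The rank cannot exceed n + m - 2, which is smaller than p here,
# so S_pooled is singular and has no inverse
rank_S <- qr(S_pooled)$rank
rank_S <= n + m - 2
rank_S < p
```

Because the rank is bounded by n + m - 2, the matrix cannot be inverted whenever p is larger than that bound, whatever the data look like.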

You can install the released version of highDmean from CRAN with:

install.packages("highDmean")

The following basic example simulates two samples with equal mean vectors and applies zwl_test() to them:

library(highDmean)
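# Simulate one data set (S = 1): samples of size n = 45 and m = 60
# in p = 300 dimensions, both with zero mean vectors
data <- buildData(n = 45, m = 60, p = 300,
                  muX = rep(0, 300), muY = rep(0, 300),
                  dep = 'IND', S = 1, innov = rnorm)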
zwl_test(data[[1]]$X, data[[1]]$Y, order = 2)
#> $statistic
#> [1] 0.7534648
#> 
#> $pvalue
#> [1] 0.4511707
#> 
#> $Tn
#> [1] 1.08859
#> 
#> $var
#> [1] 0.007897337

The functions zwl_test() and SKK_test() accept n by p and m by p data matrices with sample data from the first and second populations and return test statistics and p-values for the null hypothesis of equal means.
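As a rough illustration of the input format, the sketch below builds the two matrices directly with base R instead of buildData() and passes them to both tests. The call form SKK_test(X, Y) is an assumption that mirrors the zwl_test() call shown earlier, and the mean shift of 0.3 is an arbitrary value chosen for illustration. Since the names of the elements returned by SKK_test() are not shown in this README, the result is inspected with str() rather than indexed by name.

```r
library(highDmean)

set.seed(2)
n <- 45; m <- 60; p <- 300

# Rows are observations, columns are variables
X <- matrix(rnorm(n * p), nrow = n, ncol = p)              # first sample, mean 0
Y <- matrix(rnorm(m * p, mean = 0.3), nrow = m, ncol = p)  # second sample, shifted mean

res_zwl <- zwl_test(X, Y, order = 2)
res_skk <- SKK_test(X, Y)  # assumed to take the same two matrices

res_zwl$pvalue  # p-value element, as in the example output above
str(res_skk)    # inspect the structure of the SKK result
```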

The buildData() function simulates high-dimensional data in the two-population setting with specified sample sizes, numbers of components, covariance structure, etc., and the functions zwl_sim() and SKK_sim() return test statistic values and p-values for lists of simulated data sets generated by buildData().
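A small simulation can also be run by hand using only the calls shown in the example above. The sketch below assumes that S sets the number of simulated data sets and that buildData() returns a list with one element per data set, each holding X and Y (consistent with the data[[1]]$X access used earlier). It then estimates the rejection rate of zwl_test() under the null hypothesis at the 5% level. The functions zwl_sim() and SKK_sim() are intended for this kind of task; their arguments are not shown in this README, so they are not called here.

```r
library(highDmean)

set.seed(3)
# 20 simulated data sets with equal (zero) mean vectors
sims <- buildData(n = 45, m = 60, p = 300,
                  muX = rep(0, 300), muY = rep(0, 300),
                  dep = 'IND', S = 20, innov = rnorm)

# Apply zwl_test() to every data set and collect the p-values
pvals <- vapply(sims,
                function(d) zwl_test(d$X, d$Y, order = 2)$pvalue,
                numeric(1))

# Proportion of rejections at the 5% level (empirical size)
mean(pvals < 0.05)
```

With only 20 replications the estimate is noisy; a larger S gives a more stable value at the cost of run time.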