# epval_Chen2010: Empirical Permutation- or Resampling-Based p-value of the... In highmean: Two-Sample Tests for High-Dimensional Mean Vectors

## Description

Calculates the p-value of the two-sample test for equality of high-dimensional mean vectors proposed by Chen and Qin (2010), using permutation or parametric bootstrap resampling.

## Usage

```r
epval_Chen2010(sam1, sam2, eq.cov = TRUE, n.iter = 1000, cov1.est, cov2.est,
               bandwidth1, bandwidth2, cv.fold = 5, norm = "F", seeds)
```

## Arguments

`sam1` an n1 by p matrix from sample population 1. Each row represents a p-dimensional sample.

`sam2` an n2 by p matrix from sample population 2. Each row represents a p-dimensional sample.

`eq.cov` a logical value. The default is `TRUE`, indicating that the two sample populations have the same covariance; otherwise, the covariances are assumed to be different. If `eq.cov` is `TRUE`, the permutation method is used to calculate p-values; otherwise, parametric bootstrap resampling is used.

`n.iter` an integer giving the number of permutation/resampling iterations. The default is 1,000.

`cov1.est` a consistent estimate of the covariance matrix of sample population 1 when `eq.cov` is `FALSE`. This and the following arguments are only effective when `eq.cov = FALSE` and parametric bootstrap resampling is used to calculate p-values. The estimate can be obtained from various approaches (e.g., banding, tapering, and thresholding; see Pourahmadi 2013). If not specified, this function uses the banding approach proposed by Bickel and Levina (2008) to estimate the covariance matrix.

`cov2.est` a consistent estimate of the covariance matrix of sample population 2 when `eq.cov` is `FALSE`. It is analogous to the argument `cov1.est`.

`bandwidth1` a vector of nonnegative integers giving the candidate bandwidths to be used in the banding approach (Bickel and Levina, 2008) for estimating the covariance of sample population 1 when `eq.cov` is `FALSE`. This argument is effective only when `cov1.est` is not provided. The default is a vector containing 50 candidate bandwidths chosen from {0, 1, 2, ..., p}.

`bandwidth2` analogous to the argument `bandwidth1`; it specifies candidate bandwidths for estimating the covariance of sample population 2 when `eq.cov` is `FALSE`.

`cv.fold` an integer greater than or equal to 2 giving the number of cross-validation folds. The default is 5. See page 211 in Bickel and Levina (2008).

`norm` a character string indicating the type of matrix norm used to calculate the risk function in cross-validation. This argument is passed to the `norm` function. The default is the Frobenius norm (`"F"`).

`seeds` an optional vector of seeds, one for each permutation or parametric bootstrap resampling iteration.
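To make the roles of `bandwidth1`, `cv.fold`, and `norm` more concrete, here is a minimal base-R sketch of banding a sample covariance matrix and choosing a bandwidth by cross-validation. The helper names `band_cov` and `cv_bandwidth` are hypothetical and not part of the package, and the plain K-fold split used here is an assumption that may differ from the splitting scheme in Bickel and Levina (2008) and from what `epval_Chen2010` does internally.

```r
# Hypothetical helper: keep entries within k of the diagonal, zero the rest.
band_cov <- function(S, k) {
  p <- nrow(S)
  dist_from_diag <- abs(outer(seq_len(p), seq_len(p), "-"))
  S * (dist_from_diag <= k)
}

# Hypothetical helper: pick the candidate bandwidth with the smallest
# average cross-validated risk. The K-fold scheme is an assumption.
cv_bandwidth <- function(x, bandwidths, cv.fold = 5, norm.type = "F") {
  n <- nrow(x)
  fold <- sample(rep(seq_len(cv.fold), length.out = n))  # random fold labels
  risk <- sapply(bandwidths, function(k) {
    mean(sapply(seq_len(cv.fold), function(f) {
      S_train <- cov(x[fold != f, , drop = FALSE])  # covariance on training rows
      S_test  <- cov(x[fold == f, , drop = FALSE])  # covariance on held-out rows
      norm(band_cov(S_train, k) - S_test, type = norm.type)
    }))
  })
  bandwidths[which.min(risk)]
}

set.seed(1)
x <- matrix(rnorm(40 * 6), nrow = 40)   # 40 samples, p = 6
S <- cov(x)
S_banded <- band_cov(S, k = 1)          # tridiagonal estimate
opt_bw <- cv_bandwidth(x, bandwidths = 0:5)
```

A bandwidth of 0 keeps only the diagonal, while a bandwidth of p - 1 or more returns the sample covariance unchanged.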

## Details

See the details in `apval_Chen2010`.
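For orientation, the permutation branch (`eq.cov = TRUE`) can be sketched in base R. The code below computes the unstandardized Chen and Qin (2010) statistic, which sums the off-diagonal inner products within each sample and subtracts twice the average cross-sample inner product, and then compares it with its values under random relabelling of the pooled rows. The function names `chen_qin_stat` and `perm_pval` are hypothetical, and the `+ 1` correction in the p-value is one common convention assumed here; the package may standardize the statistic or count differently.

```r
# Hypothetical helper: unstandardized Chen-Qin statistic, samples stored row-wise.
chen_qin_stat <- function(x, y) {
  n1 <- nrow(x)
  n2 <- nrow(y)
  gx  <- tcrossprod(x)      # n1 x n1 inner products within sample 1
  gy  <- tcrossprod(y)      # n2 x n2 inner products within sample 2
  gxy <- tcrossprod(x, y)   # n1 x n2 inner products across samples
  (sum(gx) - sum(diag(gx))) / (n1 * (n1 - 1)) +
    (sum(gy) - sum(diag(gy))) / (n2 * (n2 - 1)) -
    2 * sum(gxy) / (n1 * n2)
}

# Hypothetical helper: permutation p-value (large statistics count against H0).
perm_pval <- function(x, y, n.iter = 200) {
  obs <- chen_qin_stat(x, y)
  pooled <- rbind(x, y)
  n1 <- nrow(x)
  n <- nrow(pooled)
  perm <- replicate(n.iter, {
    idx <- sample(n)  # random relabelling of the pooled rows
    chen_qin_stat(pooled[idx[1:n1], , drop = FALSE],
                  pooled[idx[(n1 + 1):n], , drop = FALSE])
  })
  (sum(perm >= obs) + 1) / (n.iter + 1)  # assumed convention, never exactly 0
}

set.seed(1234)
x <- matrix(rnorm(20 * 30), nrow = 20)             # mean 0 in every coordinate
y_null  <- matrix(rnorm(20 * 30), nrow = 20)       # same mean as x
y_shift <- matrix(rnorm(20 * 30, mean = 2), nrow = 20)  # large mean shift
p_null  <- perm_pval(x, y_null)
p_shift <- perm_pval(x, y_shift)
```

As with `n.iter` in `epval_Chen2010`, more iterations reduce the Monte Carlo error of the estimated p-value at the cost of run time.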

## Value

A list including the following elements:

`sam.info` the basic information about the two groups of samples, including the sample sizes and dimension.

`opt.bw1` the optimal bandwidth determined by cross-validation when `eq.cov` was `FALSE` and `cov1.est` was not specified.

`opt.bw2` the optimal bandwidth determined by cross-validation when `eq.cov` was `FALSE` and `cov2.est` was not specified.

`cov.assumption` the equality assumption on the covariances of the two sample populations; this was specified by the argument `eq.cov`.

`method` a reminder of whether the p-value was obtained using permutation or parametric bootstrap resampling.

`pval` the p-value of the test proposed by Chen and Qin (2010).

## References

Bickel PJ and Levina E (2008). "Regularized estimation of large covariance matrices." The Annals of Statistics, 36(1), 199–227.

Chen SX and Qin YL (2010). "A two-sample test for high-dimensional data with applications to gene-set testing." The Annals of Statistics, 38(2), 808–835.

Pourahmadi M (2013). High-Dimensional Covariance Estimation. John Wiley & Sons, Hoboken, NJ.

## See Also

`apval_Chen2010`
## Examples

```r
#library(MASS)
#set.seed(1234)
#n1 <- n2 <- 50
#p <- 200
#mu1 <- rep(0, p)
#mu2 <- mu1
#mu2[1:10] <- 0.2
#true.cov <- 0.4^(abs(outer(1:p, 1:p, "-"))) # AR1 covariance
#sam1 <- mvrnorm(n = n1, mu = mu1, Sigma = true.cov)
#sam2 <- mvrnorm(n = n2, mu = mu2, Sigma = true.cov)
# increase n.iter to reduce Monte Carlo error.
#epval_Chen2010(sam1, sam2, n.iter = 10)

# the two sample populations have different covariances
#true.cov1 <- 0.2^(abs(outer(1:p, 1:p, "-")))
#true.cov2 <- 0.6^(abs(outer(1:p, 1:p, "-")))
#sam1 <- mvrnorm(n = n1, mu = mu1, Sigma = true.cov1)
#sam2 <- mvrnorm(n = n2, mu = mu2, Sigma = true.cov2)
# increase n.iter to reduce Monte Carlo error
#epval_Chen2010(sam1, sam2, eq.cov = FALSE, n.iter = 10,
#  bandwidth1 = 10, bandwidth2 = 10)
```