# apval_aSPU: Asymptotics-Based p-values of the SPU and aSPU Tests

In highmean: Two-Sample Tests for High-Dimensional Mean Vectors

## Description

Calculates p-values of the sum-of-powers (SPU) and adaptive SPU (aSPU) tests based on the asymptotic distributions of the test statistics (Xu et al., 2016).

## Usage

    apval_aSPU(sam1, sam2, pow = c(1:6, Inf), eq.cov = TRUE, cov.est,
               cov1.est, cov2.est, bandwidth, bandwidth1, bandwidth2,
               cv.fold = 5, norm = "F")

## Arguments

| Argument | Description |
| --- | --- |
| `sam1` | an n1 by p matrix from sample population 1. Each row represents a p-dimensional sample. |
| `sam2` | an n2 by p matrix from sample population 2. Each row represents a p-dimensional sample. |
| `pow` | a numeric vector indicating the candidate powers γ in the SPU tests. It should contain `Inf` and both odd and even integers. The default is `c(1:6, Inf)`. |
| `eq.cov` | a logical value. The default is `TRUE`, indicating that the two sample populations have the same covariance; otherwise, the covariances are assumed to be different. |
| `cov.est` | a consistent estimate of the common covariance matrix when `eq.cov` is `TRUE`. This can be obtained from various approaches (e.g., banding, tapering, and thresholding; see Pourahmadi 2013). If not specified, this function uses the banding approach proposed by Bickel and Levina (2008) to estimate the covariance matrix. |
| `cov1.est` | a consistent estimate of the covariance matrix of sample population 1 when `eq.cov` is `FALSE`. It is similar to the argument `cov.est`. |
| `cov2.est` | a consistent estimate of the covariance matrix of sample population 2 when `eq.cov` is `FALSE`. It is similar to the argument `cov.est`. |
| `bandwidth` | a vector of nonnegative integers indicating the candidate bandwidths to be used in the banding approach (Bickel and Levina, 2008) for estimating the common covariance when `eq.cov` is `TRUE`. This argument is effective only if `cov.est` is not provided. The default is a vector containing 50 candidate bandwidths chosen from {0, 1, 2, ..., p}. |
| `bandwidth1` | similar to the argument `bandwidth`; it is used to specify candidate bandwidths for estimating the covariance of sample population 1 when `eq.cov` is `FALSE`. |
| `bandwidth2` | similar to the argument `bandwidth`; it is used to specify candidate bandwidths for estimating the covariance of sample population 2 when `eq.cov` is `FALSE`. |
| `cv.fold` | an integer greater than or equal to 2 indicating the fold of cross-validation. The default is 5. See page 211 in Bickel and Levina (2008). |
| `norm` | a character string indicating the type of matrix norm for the calculation of the risk function in cross-validation. This argument will be passed to the `norm` function. The default is the Frobenius norm (`"F"`). |
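The banding approach mentioned for `cov.est` and `bandwidth` keeps the entries of the sample covariance matrix that lie within a given bandwidth of the diagonal and sets the rest to zero. The following is a minimal sketch of that operator in base R, not the package's internal code: `band_cov` is a hypothetical helper name, and the cross-validated choice of bandwidth is left out.

```r
# Hypothetical helper (not part of highmean): band a covariance matrix by
# keeping entries with |i - j| <= k and zeroing all others.
band_cov <- function(S, k) {
  p <- nrow(S)
  lag <- abs(outer(seq_len(p), seq_len(p), "-"))  # |i - j| for every entry
  S * (lag <= k)                                  # logical mask is coerced to 0/1
}

set.seed(1)
x <- matrix(rnorm(20 * 6), nrow = 20, ncol = 6)  # 20 samples, p = 6
S <- cov(x)                                      # sample covariance matrix
S_band <- band_cov(S, k = 1)                     # tridiagonal estimate
S_diag <- band_cov(S, k = 0)                     # diagonal only
```

A bandwidth of 0 keeps only the variances, while a bandwidth of p - 1 or larger returns the sample covariance matrix unchanged.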

## Details

Suppose that the two groups of p-dimensional independent and identically distributed samples \{X_{1i}\}_{i=1}^{n_1} and \{X_{2j}\}_{j=1}^{n_2} are observed; we consider high-dimensional data with p \gg n := n_1 + n_2 - 2. Assume that the covariances of the two sample populations are Σ_1 = (σ_{1, ij}) and Σ_2 = (σ_{2, ij}). The primary objective is to test H_{0}: μ_1 = μ_2 versus H_{A}: μ_1 \neq μ_2. Let \bar{X}_{k} be the sample mean for group k = 1, 2. For a vector v, we denote its ith element by v^{(i)}.

For any 1 ≤ γ < ∞, the sum-of-powers (SPU) test statistic is defined as:

L(γ) = ∑_{i = 1}^{p} (\bar{X}_1^{(i)} - \bar{X}_2^{(i)})^γ.

For γ = ∞,

L (∞) = \max_{i = 1, …, p} (\bar{X}_1^{(i)} - \bar{X}_2^{(i)})^2/(σ_{1,ii}/n_1 + σ_{2,ii}/n_2).
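As a rough illustration of the two definitions above, the statistics can be computed directly from the column means. This is a sketch under stated assumptions rather than the package's implementation: `spu_stat` is a hypothetical helper, and the unknown variances σ_{1,ii} and σ_{2,ii} are replaced by sample variances, which may differ from the estimate `apval_aSPU` uses internally.

```r
# Hypothetical helper (not part of highmean): observed SPU statistics.
spu_stat <- function(sam1, sam2, pow = c(1:6, Inf)) {
  d <- colMeans(sam1) - colMeans(sam2)  # coordinate-wise mean differences
  # Assumption: plug in sample variances for sigma_{1,ii} and sigma_{2,ii}.
  v <- apply(sam1, 2, var) / nrow(sam1) + apply(sam2, 2, var) / nrow(sam2)
  stat <- sapply(pow, function(g) {
    if (is.finite(g)) sum(d^g) else max(d^2 / v)  # L(gamma) or L(Inf)
  })
  names(stat) <- paste0("L(", pow, ")")
  stat
}

set.seed(1234)
sam1 <- matrix(rnorm(50 * 200), nrow = 50)              # n1 = 50, p = 200
sam2 <- matrix(rnorm(50 * 200, mean = 0.2), nrow = 50)  # shifted means
stat <- spu_stat(sam1, sam2)
```

Odd powers keep the sign of each difference, so they favor alternatives in which the differences share a direction; even powers and γ = ∞ are nonnegative by construction.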

The adaptive SPU (aSPU) test combines the SPU tests and improves the test power:

T_{aSPU} = \min_{γ \in Γ} P_{SPU(γ)},

where P_{SPU(γ)} is the p-value of the SPU(γ) test, and Γ is a candidate set of γ's. Note that T_{aSPU} is no longer a genuine p-value. The asymptotic properties of the SPU and aSPU tests are studied in Xu et al. (2016).
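The remark that T_{aSPU} is not a genuine p-value can be checked with a small simulation. The sketch below assumes, purely for illustration, that the individual SPU p-values are independent Uniform(0, 1) under the null hypothesis; in practice they are correlated, so the exact numbers differ, but the minimum is still smaller than a valid p-value would be.

```r
set.seed(1)
n_sim <- 10000  # number of simulated null data sets
k <- 7          # number of candidate powers, as in c(1:6, Inf)

# Assumption for illustration only: independent Uniform(0, 1) p-values.
p_spu <- matrix(runif(n_sim * k), nrow = n_sim, ncol = k)
t_aspu <- apply(p_spu, 1, min)        # T_aSPU = min over gamma of P_SPU(gamma)

reject_rate <- mean(t_aspu <= 0.05)   # rejection rate if T_aSPU were used as a p-value
exact_rate <- 1 - 0.95^k              # exact rate under independence
```

Under this independence assumption, comparing T_{aSPU} with 0.05 rejects about 30% of the time instead of 5%, which is why a separate p-value has to be computed for the aSPU test.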

## Value

A list including the following elements:

| Element | Description |
| --- | --- |
| `sam.info` | the basic information about the two groups of samples, including the sample sizes and dimension. |
| `pow` | the powers γ used for the SPU tests. |
| `opt.bw` | the optimal bandwidth determined by the cross-validation when `eq.cov` was `TRUE` and `cov.est` was not specified. |
| `opt.bw1` | the optimal bandwidth determined by the cross-validation when `eq.cov` was `FALSE` and `cov1.est` was not specified. |
| `opt.bw2` | the optimal bandwidth determined by the cross-validation when `eq.cov` was `FALSE` and `cov2.est` was not specified. |
| `spu.stat` | the observed SPU test statistics. |
| `spu.e` | the asymptotic means of SPU test statistics with finite γ under the null hypothesis. |
| `spu.var` | the asymptotic variances of SPU test statistics with finite γ under the null hypothesis. |
| `spu.corr.odd` | the asymptotic correlations between SPU test statistics with odd γ. |
| `spu.corr.even` | the asymptotic correlations between SPU test statistics with even γ. |
| `cov.assumption` | the equality assumption on the covariances of the two sample populations; this was specified by the argument `eq.cov`. |
| `method` | this output reminds users that the p-values are obtained using the asymptotic distributions of test statistics. |
| `pval` | the p-values of the SPU tests and the aSPU test. |

## References

Bickel PJ and Levina E (2008). "Regularized estimation of large covariance matrices." The Annals of Statistics, 36(1), 199–227.

Pan W, Kim J, Zhang Y, Shen X, and Wei P (2014). "A powerful and adaptive association test for rare variants." Genetics, 197(4), 1081–1095.

Pourahmadi M (2013). High-Dimensional Covariance Estimation. John Wiley & Sons, Hoboken, NJ.

Xu G, Lin L, Wei P, and Pan W (2016). "An adaptive two-sample test for high-dimensional means." Biometrika, 103(3), 609–624.

## See Also

cpval_aSPU, epval_aSPU

## Examples

    library(MASS)      # provides mvrnorm
    library(highmean)  # provides apval_aSPU
    set.seed(1234)
    n1 <- n2 <- 50
    p <- 200
    mu1 <- rep(0, p)
    mu2 <- mu1
    mu2[1:10] <- 0.2
    true.cov <- 0.4^(abs(outer(1:p, 1:p, "-"))) # AR1 covariance
    sam1 <- mvrnorm(n = n1, mu = mu1, Sigma = true.cov)
    sam2 <- mvrnorm(n = n2, mu = mu2, Sigma = true.cov)
    # use true covariance matrix
    apval_aSPU(sam1, sam2, cov.est = true.cov)
    # fix bandwidth as 10
    apval_aSPU(sam1, sam2, bandwidth = 10)
    # use the optimal bandwidth from a candidate set
    #apval_aSPU(sam1, sam2, bandwidth = 0:20)

    # the two sample populations have different covariances
    #true.cov1 <- 0.2^(abs(outer(1:p, 1:p, "-")))
    #true.cov2 <- 0.6^(abs(outer(1:p, 1:p, "-")))
    #sam1 <- mvrnorm(n = n1, mu = mu1, Sigma = true.cov1)
    #sam2 <- mvrnorm(n = n2, mu = mu2, Sigma = true.cov2)
    #apval_aSPU(sam1, sam2, eq.cov = FALSE,
    #           bandwidth1 = 10, bandwidth2 = 10)