sLED: The sparse leading eigenvalue driven (sLED) test


Description

The sLED test for two-sample high-dimensional covariance and relationship matrices. Suppose X and Y are p-dimensional random vectors drawn independently from two populations. Let D be the differential matrix given by

D = A(Y) - A(X)

sLED tests the following hypothesis:

H_0: D=0 versus H_1: D != 0

where A() represents some p-by-p relationship matrix among features, including covariance matrices, correlation matrices, or the weighted adjacency matrices defined as

A_{ij} = |corr(i, j)|^b

for some constant b > 0 and 1 <= i, j <= p. By convention, A is the ordinary correlation matrix when b=0, and the covariance matrix when b<0.
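The convention for A() can be sketched in base R. This is only an illustration of the definitions above, not the package's internal code; the helper name relationship_matrix is hypothetical.

```r
# Hypothetical helper (not part of sLED): build the relationship matrix A()
# from a samples-by-features matrix, following the convention described above.
relationship_matrix <- function(X, b) {
  if (b < 0) {
    cov(X)            # b < 0: covariance matrix
  } else if (b == 0) {
    cor(X)            # b = 0: Pearson correlation matrix
  } else {
    abs(cor(X))^b     # b > 0: weighted adjacency |corr(i, j)|^b
  }
}

set.seed(1)
X <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)
Y <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)

# Differential matrix D = A(Y) - A(X) for the weighted adjacency with b = 2
D <- relationship_matrix(Y, b = 2) - relationship_matrix(X, b = 2)
dim(D)
isSymmetric(D)
```

For b >= 0 the diagonal of D is zero, because both matrices have ones on the diagonal.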

Usage

sLED(X, Y, adj.beta = -1, rho = 1000, sumabs.seq = 0.2, npermute = 100,
  useMC = FALSE, mc.cores = 1, seeds = NULL, verbose = TRUE,
  niter = 20, trace = FALSE)

Arguments

X

n1-by-p matrix for samples from the first population. Rows are samples/observations, while columns are the features.

Y

n2-by-p matrix for samples from the second population. Rows are samples/observations, while columns are the features.

adj.beta

a number that selects the relationship matrix. When adj.beta>0, it is the power used to transform correlation matrices into weighted adjacency matrices by A_{ij} = |r_ij|^adj.beta, where r_ij represents the Pearson correlation. When adj.beta=0, the correlation matrix is used. When adj.beta<0, the covariance matrix is used. The default value is adj.beta=-1.

rho

a large positive constant such that A(Y)-A(X)+diag(rep(rho, p)) is positive definite. The default is 1000.

sumabs.seq

a numeric vector specifying the sequence of sparsity parameters to use, each between 1/sqrt(p) and 1.

npermute

the number of permutations to use. The default is 100.

useMC

logical, whether to use multi-core version

mc.cores

a number indicating how many cores to use in parallelization

seeds

a numeric vector of length npermute, where seeds[i] specifies the seed for the i-th permutation. Set to NULL to leave the seeds unspecified.

verbose

whether to print the progress during permutation tests

niter

the number of iterations to use in the PMD algorithm (see symmPMD())

trace

logical, whether to trace the progress of PMD algorithm (see symmPMD())
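The condition on rho and the admissible range for sumabs.seq can be checked by hand. The following is a minimal sketch in base R, assuming the covariance case (adj.beta<0); the helper is_pd_after_shift is hypothetical and is not part of sLED.

```r
# Hypothetical check (not part of sLED): is D + rho * I positive definite?
is_pd_after_shift <- function(D, rho) {
  p <- nrow(D)
  shifted <- D + diag(rep(rho, p))
  # For a symmetric matrix, positive definite means the smallest eigenvalue is > 0
  min(eigen(shifted, symmetric = TRUE, only.values = TRUE)$values) > 0
}

set.seed(1)
p <- 5
X <- matrix(rnorm(20 * p), nrow = 20)
Y <- matrix(rnorm(20 * p), nrow = 20)
D <- cov(Y) - cov(X)

# Check both D and -D with the default rho
is_pd_after_shift(D, rho = 1000)
is_pd_after_shift(-D, rho = 1000)

# Admissible range for each entry of sumabs.seq
c(lower = 1 / sqrt(p), upper = 1)
```

Without the shift (rho = 0), D and -D cannot both be positive definite, which is why a positive rho is needed.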

Details

For large data sets, the multi-core version is recommended: useMC=TRUE and mc.cores=n, where n is the number of cores to use.
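To make the roles of npermute, seeds and mc.cores concrete, here is a rough sketch of a seeded permutation test that can be spread across cores. It rests on several assumptions that do not come from the package: the statistic is the ordinary (non-sparse) leading eigenvalue, used as a stand-in for the sparse one; the p-value is the fraction of permuted statistics at least as large as the observed one; and the names stat_fun and perm_test are hypothetical.

```r
# Stand-in statistic (assumption): the ordinary leading eigenvalue of D and -D,
# NOT the sparse leading eigenvalue that sLED computes via symmPMD().
stat_fun <- function(X, Y) {
  D <- cov(Y) - cov(X)
  ev <- eigen(D, symmetric = TRUE, only.values = TRUE)$values  # decreasing order
  max(ev[1], -ev[length(ev)])  # larger of the leading eigenvalues of D and -D
}

perm_test <- function(X, Y, npermute = 100, seeds = NULL, mc.cores = 1) {
  Z <- rbind(X, Y)
  n1 <- nrow(X)
  one_perm <- function(i) {
    if (!is.null(seeds)) set.seed(seeds[i])  # seeds[i] fixes the i-th permutation
    idx <- sample(nrow(Z))                   # shuffle the pooled sample labels
    stat_fun(Z[idx[1:n1], , drop = FALSE], Z[idx[-(1:n1)], , drop = FALSE])
  }
  # mc.cores > 1 forks worker processes, which is not supported on Windows
  Tn.perm <- unlist(parallel::mclapply(seq_len(npermute), one_perm,
                                       mc.cores = mc.cores))
  Tn <- stat_fun(X, Y)
  # Assumed p-value: share of permuted statistics at least as large as Tn
  list(Tn = Tn, Tn.perm = Tn.perm, pVal = mean(Tn.perm >= Tn))
}

set.seed(99)
X <- matrix(rnorm(50 * 10), nrow = 50)
Y <- matrix(rnorm(50 * 10), nrow = 50)
res <- perm_test(X, Y, npermute = 20, seeds = 1:20)
res$pVal
```

Because each permutation sets its own seed, rerunning with the same seeds gives the same permuted statistics regardless of how the work is divided among cores.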

Value

A list containing the following components:

Tn

the test statistic

Tn.perm

the test statistic for permuted samples

Tn.perm.sign

the sign for permuted samples: "pos" if the permuted test statistic is given by sEig(D), and "neg" if it is given by sEig(-D), where sEig denotes the sparse leading eigenvalue.

pVal

the p-value of sLED test

sumabs.seq

a numeric vector for a sequence of sparsity parameters. Default is 0.2. The numbers must be between 1/sqrt(p) and 1.

rho

a positive constant to augment the diagonal of the differential matrix D such that D + rho*I becomes positive definite.

stats

a numeric vector of test statistics when using different sparsity parameters (corresponding to sumabs.seq).

sign

a vector of signs when using different sparsity parameters (corresponding to sumabs.seq). Sign is "pos" if the test statistic is given by sEig(D), and "neg" if it is given by sEig(-D), where sEig denotes the sparse leading eigenvalue.

v

the sequence of sparse leading eigenvectors. Each row corresponds to one sparsity parameter given by sumabs.seq.

leverage

the leverage of genes (defined as v^2 element-wise) using different sparsity parameters. Each row corresponds to one sparsity parameter given by sumabs.seq.
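The relation between v and leverage can be illustrated with a small made-up matrix. The values below are hypothetical and are not output from sLED; they only show the element-wise squaring and how non-zero leverage picks out features.

```r
# Hypothetical sparse eigenvectors: 2 sparsity parameters (rows) x 5 features (columns)
v <- rbind(c(0.8, 0.6, 0,   0,   0),
           c(0.5, 0.5, 0.5, 0.5, 0))

# Leverage is v^2 element-wise, one row per sparsity parameter
leverage <- v^2

# In this made-up example each row of v has unit length, so each row of leverage sums to 1
rowSums(leverage)

# Features with non-zero leverage under the first sparsity parameter
which(leverage[1, ] != 0)
```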

References

Zhu, Lei, Devlin and Roeder (2016), "Testing High Dimensional Covariance Matrices, with Application to Detecting Schizophrenia Risk Genes", arXiv:1606.00252.

See Also

symmPMD().

Examples

library(sLED)

# Run sLED on a synthetic dataset under the null hypothesis
# where cov(X) = cov(Y)
n <- 50
p <- 100
set.seed(99)
X <- matrix(rnorm(n*p, mean=0, sd=1), nrow=n, ncol=p)
set.seed(42)
Y <- matrix(rnorm(n*p, mean=0, sd=1), nrow=n, ncol=p)

# run sLED and check the p-value
result <- sLED(X=X, Y=Y, npermute=50)
result$pVal


# Run sLED on a synthetic dataset under the alternative hypothesis
# where cov(X) != cov(Y), and the difference occurs in the first 10 coordinates
n <- 50
p <- 100
set.seed(99)
X <- matrix(rnorm(n*p, mean=0, sd=1), nrow=n, ncol=p)
s <- 10 ## signals
sigma.2 <- diag(p)
sigma.2[1:s, 1:s] <- sigma.2[1:s, 1:s] + 0.2
set.seed(42)
Y2 <- MASS::mvrnorm(n, mu=rep(0, p), Sigma=sigma.2)

# run sLED and check the p-value
result <- sLED(X=X, Y=Y2, sumabs.seq=0.25, npermute=100, seeds = c(1:100))
result$pVal

# the signalling coordinates detected by sLED
which(result$leverage != 0)

lingxuez/sLED documentation built on May 7, 2019, 2:55 a.m.