# sLED: The sparse leading eigenvalue driven (sLED) test

## Description

The sLED test for two-sample high-dimensional covariance and relationship matrices. Suppose X, Y are p-dimensional random vectors independently coming from two populations. Let D be the differential matrix given by

D = A(Y) - A(X)

sLED tests the following hypothesis:

H_0: D=0 versus H_1: D != 0

where A() represents some p-by-p relationship matrix among features, including covariance matrices, correlation matrices, or the weighted adjacency matrices defined as

A_{ij} = |corr(i, j)|^b

for some constant b > 0 and 1 <= i, j <= p. By convention, setting b = 0 makes A the regular correlation matrix, and setting b < 0 makes A the covariance matrix.
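The convention for b can be illustrated with a minimal base-R sketch. The helper `relationship_matrix()` is hypothetical and is not part of the sLED package; it only mirrors the definitions given above, and the data are simulated for illustration.

```r
# Hypothetical helper (not part of sLED): build the relationship matrix A
# from an n-by-p data matrix, following the convention for b described above.
relationship_matrix <- function(data, b) {
  if (b < 0) {
    cov(data)            # b < 0: covariance matrix
  } else if (b == 0) {
    cor(data)            # b = 0: regular (signed) correlation matrix
  } else {
    abs(cor(data))^b     # b > 0: weighted adjacency matrix |corr(i, j)|^b
  }
}

set.seed(1)
X <- matrix(rnorm(30 * 5), nrow = 30, ncol = 5)  # n1 = 30 samples, p = 5 features
Y <- matrix(rnorm(40 * 5), nrow = 40, ncol = 5)  # n2 = 40 samples, p = 5 features

# Differential matrix D = A(Y) - A(X), here with b = 2
D <- relationship_matrix(Y, b = 2) - relationship_matrix(X, b = 2)
dim(D)          # p-by-p
isSymmetric(D)  # D is symmetric because both A(X) and A(Y) are
```

Note that the two samples may have different sizes; only the number of features p has to match.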

## Usage

```r
sLED(X, Y, adj.beta = -1, rho = 1000, sumabs.seq = 0.2, npermute = 100,
     useMC = FALSE, mc.cores = 1, seeds = NULL, verbose = TRUE, niter = 20,
     trace = FALSE)
```

## Arguments

- `X`: n1-by-p matrix of samples from the first population. Rows are samples/observations and columns are features.
- `Y`: n2-by-p matrix of samples from the second population. Rows are samples/observations and columns are features.
- `adj.beta`: the power used to transform correlation matrices into weighted adjacency matrices by A_{ij} = |r_ij|^adj.beta, where r_ij is the Pearson correlation. This transform applies when `adj.beta` is positive. When `adj.beta=0`, the correlation matrix is used. When `adj.beta<0`, the covariance matrix is used. The default value is `adj.beta=-1`.
- `rho`: a large positive constant such that A(X)-A(Y)+diag(rep(rho, p)) is positive definite.
- `sumabs.seq`: a numeric vector specifying the sequence of sparsity parameters to use, each between 1/sqrt(p) and 1.
- `npermute`: number of permutations to use. The default is 100.
- `useMC`: logical, whether to use the multi-core version.
- `mc.cores`: a number indicating how many cores to use in parallelization.
- `seeds`: a numeric vector of length `npermute`, where `seeds[i]` specifies the seed for the i-th permutation. Set to `NULL` to leave the seeds unspecified.
- `verbose`: whether to print progress during the permutation tests.
- `niter`: the number of iterations to use in the PMD algorithm (see `symmPMD()`).
- `trace`: logical, whether to trace the progress of the PMD algorithm (see `symmPMD()`).
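The constraints on `sumabs.seq` and `rho` can be checked before running the test. The following is a minimal base-R sketch on simulated data; the candidate values and the use of the covariance difference as D are assumptions made for illustration, and sLED itself may perform its own checks.

```r
p <- 100

# Sparsity parameters must lie between 1/sqrt(p) and 1
lower <- 1 / sqrt(p)                      # 0.1 when p = 100
sumabs.seq <- c(0.15, 0.2, 0.25, 0.3)     # illustrative candidate values
stopifnot(all(sumabs.seq >= lower), all(sumabs.seq <= 1))

# Check that a candidate rho makes D + rho * I positive definite,
# using the covariance difference as the differential matrix D.
set.seed(1)
X <- matrix(rnorm(50 * p), nrow = 50)
Y <- matrix(rnorm(50 * p), nrow = 50)
D <- cov(Y) - cov(X)
rho <- 1000
eig <- eigen(D + diag(rep(rho, p)), symmetric = TRUE, only.values = TRUE)$values
min_eig <- min(eig)
min_eig > 0   # TRUE means the augmented matrix is positive definite
```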

## Details

For large data sets, the multi-core version is recommended: `useMC=TRUE` and `mc.cores=n`, where `n` is the number of cores to use.
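A multi-core call might look like the sketch below. It assumes the sLED package is installed, along with whatever parallel backend it relies on, and it skips the call otherwise. Leaving one core free is a common habit, not a requirement of sLED.

```r
# Pick a core count, leaving one core free; detectCores() can return NA
n_detected <- parallel::detectCores()
n_cores <- if (is.na(n_detected)) 1L else max(1L, n_detected - 1L)

# Only run the test if sLED is available (assumption: it may not be installed)
if (requireNamespace("sLED", quietly = TRUE)) {
  set.seed(1)
  X <- matrix(rnorm(50 * 100), nrow = 50)
  Y <- matrix(rnorm(50 * 100), nrow = 50)
  result <- sLED::sLED(X = X, Y = Y, npermute = 50,
                       useMC = TRUE, mc.cores = n_cores, verbose = FALSE)
  result$pVal
}
```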

## Value

A list containing the following components:

- `Tn`: the test statistic.
- `Tn.perm`: the test statistics for the permuted samples.
- `Tn.perm.sign`: the signs for the permuted samples: "pos" if the permuted test statistic is given by sEig(D), and "neg" if it is given by sEig(-D), where `sEig` denotes the sparse leading eigenvalue.
- `pVal`: the p-value of the sLED test.
- `sumabs.seq`: a numeric vector for a sequence of sparsity parameters. The default is 0.2. The numbers must be between 1/sqrt(p) and 1.
- `rho`: a positive constant to augment the diagonal of the differential matrix D such that D + rho*I becomes positive definite.
- `stats`: a numeric vector of test statistics when using different sparsity parameters (corresponding to `sumabs.seq`).
- `sign`: a vector of signs when using different sparsity parameters (corresponding to `sumabs.seq`). The sign is "pos" if the test statistic is given by sEig(D), and "neg" if it is given by sEig(-D), where `sEig` denotes the sparse leading eigenvalue.
- `v`: the sequence of sparse leading eigenvectors. Each row corresponds to one sparsity parameter given by `sumabs.seq`.
- `leverage`: the leverage of genes (defined as v^2 element-wise) using different sparsity parameters. Each row corresponds to one sparsity parameter given by `sumabs.seq`.
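To make the relationship between these components concrete, here is a small base-R sketch on made-up numbers. The p-value formula (the share of permuted statistics strictly larger than `Tn`) is an assumption about a typical permutation test and may differ from what sLED computes internally. The leverage line follows the element-wise v^2 definition given above.

```r
# Toy values standing in for the output of sLED(); all numbers are made up.
Tn <- 2.5
Tn.perm <- c(1.0, 3.0, 2.0, 1.5, 0.5)

# Assumed convention: share of permuted statistics strictly larger than Tn
pVal <- mean(Tn.perm > Tn)

# One row of v per sparsity parameter, one column per feature
v <- rbind(c(0.6, -0.8, 0.0, 0.0),
           c(0.5, -0.5, 0.5, 0.5))
leverage <- v^2                 # element-wise square, as described above

which(leverage[1, ] != 0)       # features with non-zero leverage in row 1
rowSums(leverage)               # each row sums to 1 when v has unit length
```

Features with non-zero leverage are the ones that contribute to the sparse leading eigenvector for that sparsity parameter.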

## References

Zhu, Lei, Devlin and Roeder (2016), "Testing High Dimensional Covariance Matrices, with Application to Detecting Schizophrenia Risk Genes", arXiv:1606.00252.

## See Also

`symmPMD()`.

## Examples

```r
library(sLED)

# Run sLED on a synthetic dataset under the null hypothesis
# where cov(X) = cov(Y)
n <- 50
p <- 100
set.seed(99)
X <- matrix(rnorm(n*p, mean=0, sd=1), nrow=n, ncol=p)
set.seed(42)
Y <- matrix(rnorm(n*p, mean=0, sd=1), nrow=n, ncol=p)

# run sLED and check the p-value
result <- sLED(X=X, Y=Y, npermute=50)
result$pVal

# Run sLED on a synthetic dataset under the alternative hypothesis
# where cov(X) != cov(Y), and the difference occurs at the first 10 coordinates
n <- 50
p <- 100
set.seed(99)
X <- matrix(rnorm(n*p, mean=0, sd=1), nrow=n, ncol=p)
s <- 10 ## signals
sigma.2 <- diag(p)
sigma.2[1:s, 1:s] <- sigma.2[1:s, 1:s] + 0.2
set.seed(42)
Y2 <- MASS::mvrnorm(n, mu=rep(0, p), Sigma=sigma.2)

# run sLED and check the p-value
result <- sLED(X=X, Y=Y2, sumabs.seq=0.25, npermute=100, seeds = c(1:100))
result$pVal

# the signalling coordinates detected by sLED
which(result$leverage != 0)
```