# testTwoBMN: Two-sample testing of binary Markov networks

*From the R package jingmafdu/TestBMN: Testing for binary Markov networks.*

## Description

`testTwoBMN` tests whether two binary Markov networks are the same, i.e. whether the corresponding partial correlation matrices satisfy Θ_1 = Θ_2.

## Usage

```
testTwoBMN(dat, lambda, alpha.global = 0.05, multiTest = TRUE, alpha.multi = 0.1)
```

## Arguments

- `dat`: A list of length 2, consisting of the two data matrices generated from the two populations. The data matrix from the k-th population should be of dimension n_k x p, for k=1,2.
- `lambda`: A list of length 2, consisting of the two 1 x p vectors of tuning parameters to be used in `BMN.logistic` for estimating the initial partial correlation matrix in each population.
- `alpha.global`: The significance level for global testing. Default is 0.05.
- `multiTest`: Whether to conduct multiple testing of pairwise edges. Default is `TRUE`.
- `alpha.multi`: The level of false discovery rate in multiple testing, required if `multiTest=TRUE`. Default is 0.10.

## Details

The function `testTwoBMN` implements two-sample testing of binary Markov networks. It calculates the standardized pairwise statistic W_{r,t} for each pair of nodes (r,t), using the data matrices in `dat` together with the corresponding tuning parameters `lambda`. The test statistic for the null Θ_1=Θ_2 is

M_{n,p}=max_{1≤ r < t ≤ p} W_{r,t}^2.

The null Θ_1=Θ_2 is rejected if

M_{n,p}-4\log p+\log(\log p)≥ q_α,

where q_α is the 1-α quantile of the type I extreme value distribution with cumulative distribution function exp{-(8π)^{-1/2}e^{-z/2}}. To recover the differential Markov network, `testTwoBMN` considers the multiple testing problem

H_{0,r,t}: θ_{r,t,1}-θ_{r,t,2}=0, 1≤ r < t ≤ p.

The test statistic for each individual hypothesis H_{0,r,t} is W_{r,t}. The multiple testing procedure in `testTwoBMN` rejects the null H_{0,r,t} if |W_{r,t}|>τ, where the threshold τ is chosen to ensure false discovery rate control at the pre-specified level `alpha.multi`.
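The global rejection rule and a threshold of this type can be sketched in R. This is an illustrative sketch only, not the package's internal code: `evd.quantile`, `global.reject`, and `fdr.threshold` are hypothetical helper names, and the threshold search below assumes the common Liu-type construction (smallest t whose estimated false discovery proportion falls below the target level).

```r
## Illustrative helpers (hypothetical, not exported by TestBMN)

## 1-alpha quantile of the type I extreme value limit, obtained by
## inverting exp{-(8*pi)^(-1/2) * exp(-z/2)} = 1 - alpha
evd.quantile <- function(alpha) -2 * log(-sqrt(8 * pi) * log(1 - alpha))

## Global test: reject Theta_1 = Theta_2 when the centered maximum
## statistic M_{n,p} exceeds the quantile
global.reject <- function(Mnp, p, alpha = 0.05) {
  Mnp - 4 * log(p) + log(log(p)) >= evd.quantile(alpha)
}

## Assumed Liu-type threshold for FDR control: the smallest t in
## [0, 2*sqrt(log p)] with estimated FDP at or below alpha.multi,
## where G(t) = 2*(1 - pnorm(t)) approximates the null tail
fdr.threshold <- function(W, alpha.multi = 0.1) {
  p <- ncol(W)
  w <- abs(W[upper.tri(W)])
  q <- length(w)
  for (t in seq(0, 2 * sqrt(log(p)), length.out = 500)) {
    if (2 * (1 - pnorm(t)) * q / max(sum(w >= t), 1) <= alpha.multi) {
      return(t)
    }
  }
  sqrt(4 * log(p))  # conservative fallback when no t qualifies
}
```

For example, `evd.quantile(0.05)` is approximately 2.72, so at level 0.05 the global null is rejected once M_{n,p} - 4 log p + log(log p) exceeds about 2.72.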

Note the tuning parameters `lambda` should be carefully chosen to ensure that the initial estimate of the partial correlation matrix is reasonably good. In particular, the optimal tuning parameters required in global testing and multiple testing may not be the same. See `global.tune` and `multiple.tune.twosample` for how to choose `lambda`. More details on the testing procedures are available in Cai et al. (2017+).

`testTwoBMN` calls `BMN.logistic` to estimate the initial partial correlation matrices using nodewise l1-regularized logistic regressions before calling `debias`. As a result, `testTwoBMN` returns a warning message if "one multinomial or binomial class has 1 or 0 observations" occurs in any of the p logistic regressions. The user should check the data matrices in `dat` for such degenerate variables before applying `testTwoBMN` again.
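This degenerate-class situation can be screened for in advance. A minimal sketch, assuming each column of the data matrix takes two levels (e.g. 0/1 or -1/1); `check.extreme` is a hypothetical helper, not part of the package:

```r
## Flag variables whose two classes are severely unbalanced, which can
## trigger the "one multinomial or binomial class has 1 or 0 observations"
## warning inside the nodewise logistic regressions
check.extreme <- function(x, eps = 8 / nrow(x)) {
  prop <- apply(x, 2, function(col) {
    tab <- table(col)
    ## a constant column has only one observed class: always flag it
    if (length(tab) < 2) 0 else min(tab) / length(col)
  })
  which(prop < eps)
}
```

Applying `check.extreme` to each element of `dat` and dropping (or re-simulating) the flagged variables avoids the warning described above.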

## Value

- `statistic`: The test statistic.
- `pvalue`: The p-value for testing the null Θ_1=Θ_2.
- `reject`: Whether to reject the null Θ_1=Θ_2.
- `W`: A p x p matrix consisting of the standardized pairwise statistics.
- `thetaXInit`: The initial estimated partial correlation matrix for the first population, from `BMN.logistic`.
- `thetaX`: The de-sparsified partial correlation matrix for the first population.
- `thetaYInit`: The initial estimated partial correlation matrix for the second population, from `BMN.logistic`.
- `thetaY`: The de-sparsified partial correlation matrix for the second population.
- `network`: The estimated differential network, with false discovery rate controlled at the pre-specified level `alpha.multi`.

## Author(s)

Jing Ma

## References

Cai, T. T., Li, H., Ma, J. & Xia, Y. (2017+). A testing framework for detecting differential microbial networks.

## See Also

`testOneBMN`, `global.tune`, `multiple.tune.twosample`, `BMN.logistic`
## Examples

```r
library(glmnet)
library(igraph)   # for sample_pa, as_adjacency_matrix
library(gdata)    # for upperTriangle
library(TestBMN)

set.seed(1)
p = 50            # number of variables
n = 100           # number of observations per replicate
n0 = 1000         # burn-in tolerance
rho_high = 0.5    # signal strength
rho_low = 0.1     # signal strength
ncond = 2         # number of conditions to compare
eps = 8/n         # tolerance for extreme proportioned observations
q = (p*(p - 1))/2

##---(1) Generate the network
g_sf = sample_pa(p, directed=FALSE)
Amat = as.matrix(as_adjacency_matrix(g_sf, type="both"))

##---(2) Generate the Theta
Theta = vector("list", ncond)
weights = matrix(0, p, p)
upperTriangle(weights) <- runif(q, rho_low, rho_high) * (2*rbinom(q, 1, 0.5) - 1)
weights = weights + t(weights)
Theta[[1]] = weights * Amat

##---(3) Generate the difference matrix
Delta = matrix(0, p, p)
upperTriangle(Delta) <- runif(q, 1, 2)*sqrt(log(p)/n) *
  (match(1:q, sample(q, 5), nomatch = 0) > 0)*(2*rbinom(q, 1, 0.5) - 1)
Delta = Delta + t(Delta)
Theta[[2]] = Theta[[1]] + Delta
Theta[[1]] = Theta[[1]] - Delta

##---(4) Simulate data and choose tuning parameters
dat = vector("list", ncond)
lambda = vector("list", ncond)
for (k in 1:ncond){
  dat[[k]] = BMN.samples(Theta[[k]], n, n0, skip=1)
  tmp = sapply(1:p, function(i) as.numeric(table(dat[[k]][,i]))[1]/n)
  ## The original example is truncated here; this completion follows its
  ## intent: resample until no variable has extreme class proportions
  while (min(tmp) < eps || max(tmp) > 1 - eps){
    dat[[k]] = BMN.samples(Theta[[k]], n, n0, skip=1)
    tmp = sapply(1:p, function(i) as.numeric(table(dat[[k]][,i]))[1]/n)
  }
  lambda[[k]] = rep(sqrt(log(p)/n), p)  # a simple default; see global.tune
                                        # and multiple.tune.twosample
}

##---(5) Two-sample testing
test = testTwoBMN(dat, lambda, alpha.global = 0.05)
```