testTwoBMN: Two-sample testing of binary Markov networks


View source: R/testTwoBMN.R

Description

testTwoBMN tests whether two binary Markov networks are the same, i.e. whether the corresponding partial correlation matrices satisfy Θ_1=Θ_2.

Usage

testTwoBMN(dat, lambda, alpha.global = 0.05, multiTest = TRUE, alpha.multi = 0.1)

Arguments

dat

A list of length 2, consisting of the two data matrices generated from two populations. The data matrix from the k-th population should be of dimension n_k x p for k=1,2.

lambda

A list of length 2, consisting of the two 1 x p vectors of tuning parameters to be used in BMN.logistic for estimating the initial partial correlation matrix in each population.

alpha.global

The significance level for global testing. Default is 0.05.

multiTest

Whether to conduct multiple testing of pairwise edges. Default is TRUE.

alpha.multi

The targeted false discovery rate level for multiple testing; required if multiTest=TRUE. Default is 0.10.
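
The two list arguments line up element-wise by population. The sketch below only illustrates the expected input layout; Theta1, Theta2 and the constant tuning value 0.2 are hypothetical placeholders, not recommended choices.

## Hypothetical input layout matching the argument descriptions above.
## Theta1 and Theta2 are placeholder p x p partial correlation matrices.
X <- BMN.samples(Theta1, 100, 1000, skip = 1)   # first population, n1 x p
Y <- BMN.samples(Theta2, 100, 1000, skip = 1)   # second population, n2 x p
dat    <- list(X, Y)
lambda <- list(rep(0.2, ncol(X)), rep(0.2, ncol(Y)))   # one tuning parameter per node
test   <- testTwoBMN(dat, lambda, alpha.global = 0.05, multiTest = TRUE, alpha.multi = 0.1)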

Details

The function testTwoBMN implements two-sample testing of binary Markov networks. It calculates the standardized pairwise statistic W_{r,t} for each pair of nodes (r,t) using the data matrices in dat together with the corresponding tuning parameters lambda. The test statistic for Θ_1=Θ_2 is

M_{n,p}=max_{1≤ r < t ≤ p} W_{r,t}^2.

The null Θ_1=Θ_2 is rejected if

M_{n,p}-4\log p+\log(\log p)≥ q_α,

where q_α is the 1-α quantile of the type I extreme value distribution with cumulative distribution function exp{-(8π)^{-1/2}e^{-z/2}}. To recover the differential Markov network, testTwoBMN considers the multiple testing problem

H_{0,r,t}: θ_{r,t,1}-θ_{r,t,2}=0, 1≤ r < t ≤ p.

The test statistic for each individual hypothesis H_{0,r,t} is W_{r,t}. The multiple testing procedure in testTwoBMN rejects the null H_{0,r,t} if |W_{r,t}|>τ, where the threshold τ is carefully chosen to ensure false discovery rate control at the pre-specified level.
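
For intuition, the global rejection rule can be reproduced directly from the limiting distribution stated above. The snippet below only illustrates that formula and is not the internal implementation of testTwoBMN; M and p are hypothetical placeholder values for the observed maximum statistic M_{n,p} and the number of variables.

## Illustration of the global rejection rule from the stated limiting CDF
p       <- 50    # placeholder: number of variables
M       <- 20    # placeholder: observed maximum statistic M_{n,p}
alpha   <- 0.05
z       <- M - 4 * log(p) + log(log(p))               # recentered statistic
q.alpha <- -2 * log(-sqrt(8 * pi) * log(1 - alpha))   # 1-alpha quantile of the limit
pvalue  <- 1 - exp(-exp(-z / 2) / sqrt(8 * pi))       # limiting p-value
reject  <- (z >= q.alpha)                             # same decision as pvalue <= alpha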

Note that the tuning parameters lambda should be carefully chosen to ensure that the initial estimates of the partial correlation matrices are reasonably good. In particular, the optimal tuning parameters for global testing and for multiple testing may not be the same. See global.tune and multiple.tune.twosample for how to choose lambda. More details on the testing procedures are available in Cai et al. (2017+).

testTwoBMN calls BMN.logistic to estimate the initial partial correlation matrices using nodewise l1-regularized logistic regressions before calling debias. As a result, testTwoBMN returns a warning message if "one multinomial or binomial class has 1 or 0 observations" occurs in any of the p logistic regressions. The user should double check the data matrices in dat to avoid such situations before applying testTwoBMN again.
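
One way to screen for this issue is to check, before calling testTwoBMN, whether any variable in either data matrix has a class with at most one observation. The check below is only a sketch of such a pre-check, not part of the package.

## Flag variables whose binary classes have 0 or 1 observations in either sample
flagBad <- function(X) apply(X, 2, function(x) length(unique(x)) < 2 || min(table(x)) <= 1)
lapply(dat, function(X) which(flagBad(X)))   # columns to inspect or drop before testing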

Value

statistic

The test statistic.

pvalue

The p-value for testing the null Θ_1=Θ_2.

reject

Whether to reject the null Θ_1=Θ_2.

W

A p x p matrix consisting of the standardized pairwise statistics.

thetaXInit

The initial estimated partial correlation matrix based on X using BMN.logistic.

thetaX

The de-sparsified partial correlation matrix for the first population.

thetaYInit

The initial estimated partial correlation matrix based on Y using BMN.logistic.

thetaY

The de-sparsified partial correlation matrix for the second population.

network

The estimated differential network with false discovery rate control at the pre-specified level alpha.multi.
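
As a usage sketch, the documented components can be inspected directly from the returned list; here test is the object returned by the call in the Examples below, and treating network as a p x p adjacency matrix is an assumption.

test$statistic   # global test statistic
test$pvalue      # p-value for the global null Theta_1 = Theta_2
test$reject      # global decision at level alpha.global
test$network     # estimated differential network (returned when multiTest = TRUE)
which(test$network != 0, arr.ind = TRUE)   # assumed matrix form: edges declared differential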

Author(s)

Jing Ma

References

Cai, T. T., Li, H., Ma, J. & Xia, Y. (2017+). A testing framework for detecting differential microbial networks.

See Also

testOneBMN, global.tune, multiple.tune.twosample, BMN.logistic

Examples

library(glmnet)
library(igraph)  # sample_pa(), as_adjacency_matrix()
library(gdata)   # upperTriangle()

set.seed(1)

p = 50    # number of variables
n = 100   # number of observations per replicate
n0 = 1000 # burn-in length
rho_high = 0.5  # signal strength 
rho_low = 0.1   # signal strength 
ncond = 2       # number of conditions to compare
eps = 8/n       # tolerance for nearly constant (extremely unbalanced) variables
q = (p*(p - 1))/2

##---(1) Generate the network  
g_sf = sample_pa(p, directed=FALSE)
Amat = as.matrix(as_adjacency_matrix(g_sf, type="both"))

##---(2) Generate the Theta 
Theta = vector("list", ncond)
weights = matrix(0, p, p)
upperTriangle(weights) <- runif(q, rho_low, rho_high) * (2*rbinom(q, 1, 0.5) - 1)
weights = weights + t(weights)
Theta[[1]] = weights * Amat

##---(3) Generate the difference matrix
Delta = matrix(0, p, p)
upperTriangle(Delta) <- runif(q, 1, 2)*sqrt(log(p)/n) * 
                       (match(1:q, sample(q, 5), nomatch = 0)>0)*(2*rbinom(q, 1, 0.5) - 1)
Delta = Delta + t(Delta)

Theta[[2]] = Theta[[1]] + Delta
Theta[[1]] = Theta[[1]] - Delta

##---(4) Simulate data and choose tuning parameter
dat = vector("list", ncond)
lambda = vector("list", ncond)
for (k in 1:ncond){
  dat[[k]] = BMN.samples(Theta[[k]], n, n0, skip=1)
  # proportion of the first class per variable; resample if any variable is too unbalanced
  tmp = sapply(1:p, function(i) as.numeric(table(dat[[k]][,i]))[1]/n )
  while(min(tmp)<eps || (1-max(tmp))<eps){
    dat[[k]] = BMN.samples(Theta[[k]], n, n0, skip=10)
    tmp = sapply(1:p, function(i) as.numeric(table(dat[[k]][,i]))[1]/n )
  }
  empcov = cov(dat[[k]])
  LambdaMat = outer(seq(1,15), sqrt(0.01 * diag(empcov) * log(p)/n))  # grid of candidate tuning parameters
  tune = global.tune(dat[[k]], LambdaMat)
  lambda[[k]] = tune$lambdaOpt
}

##---(5) Two-sample testing
test = testTwoBMN(dat, lambda)
