testTwoBMN: Two-sample testing of binary Markov networks


View source: R/testTwoBMN.R

Description

testTwoBMN tests whether two binary Markov networks are the same, i.e. whether the corresponding partial correlation matrices satisfy Θ_1=Θ_2.

Usage

testTwoBMN(dat, lambda, alpha.global = 0.05, multiTest = TRUE, alpha.multi = 0.1)

Arguments

dat

A list of length 2, consisting of the two data matrices generated from two populations. The data matrix from the k-th population should be of dimension n_k x p for k=1,2.

lambda

A list of length 2, consisting of the two 1 x p vectors of tuning parameters to be used in BMN.logistic for estimating the initial partial correlation matrix in each population.

alpha.global

The significance level for global testing. Default is 0.05.

multiTest

Whether to conduct multiple testing of pairwise edges. Default is TRUE.

alpha.multi

The targeted false discovery rate level for multiple testing; required if multiTest=TRUE. Default is 0.10.
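
The two list arguments line up element-wise by population. The sketch below only illustrates the expected input layout; Theta1, Theta2 and the constant tuning value 0.2 are hypothetical placeholders, not recommended choices.

## Hypothetical input layout matching the argument descriptions above.
## Theta1 and Theta2 are placeholder p x p partial correlation matrices.
X <- BMN.samples(Theta1, 100, 1000, skip = 1)   # first population, n1 x p
Y <- BMN.samples(Theta2, 100, 1000, skip = 1)   # second population, n2 x p
dat    <- list(X, Y)
lambda <- list(rep(0.2, ncol(X)), rep(0.2, ncol(Y)))   # one tuning parameter per node
test   <- testTwoBMN(dat, lambda, alpha.global = 0.05, multiTest = TRUE, alpha.multi = 0.1)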

Details

The function testTwoBMN implements two-sample testing of binary Markov networks. It calculates the standardized pairwise statistic W_{r,t} for each pair of nodes (r,t) using the data matrices in dat together with the corresponding tuning parameters lambda. The test statistic for Θ_1=Θ_2 is

M_{n,p}=max_{1≤ r < t ≤ p} W_{r,t}^2.

The null Θ_1=Θ_2 is rejected if

M_{n,p}-4\log p+\log(\log p)≥ q_α,

where q_α is the 1-α quantile of the type I extreme value distribution with cumulative distribution function exp{-(8π)^{-1/2}e^{-z/2}}. To recover the differential Markov network, testTwoBMN considers the multiple testing problem

H_{0,r,t}: θ_{r,t,1}-θ_{r,t,2}=0, 1≤ r < t ≤ p.

The test statistic for each individual hypothesis H_{0,r,t} is W_{r,t}. The multiple testing procedure in testTwoBMN rejects the null H_{0,r,t} if |W_{r,t}|>τ, where the threshold τ is carefully chosen to ensure false discovery rate control at the pre-specified level.
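
For intuition, the global rejection rule can be reproduced directly from the limiting distribution stated above. The snippet below only illustrates that formula and is not the internal implementation of testTwoBMN; M and p are hypothetical placeholder values for the observed maximum statistic M_{n,p} and the number of variables.

## Illustration of the global rejection rule from the stated limiting CDF
p       <- 50    # placeholder: number of variables
M       <- 20    # placeholder: observed maximum statistic M_{n,p}
alpha   <- 0.05
z       <- M - 4 * log(p) + log(log(p))               # recentered statistic
q.alpha <- -2 * log(-sqrt(8 * pi) * log(1 - alpha))   # 1-alpha quantile of the limit
pvalue  <- 1 - exp(-exp(-z / 2) / sqrt(8 * pi))       # limiting p-value
reject  <- (z >= q.alpha)                             # same decision as pvalue <= alpha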

Note that the tuning parameters lambda should be carefully chosen to ensure that the initial estimates of the partial correlation matrices are reasonably good. In particular, the optimal tuning parameters for global testing and for multiple testing may not be the same. See global.tune and multiple.tune.twosample for how to choose lambda. More details on the testing procedures are available in Cai et al. (2017+).

testTwoBMN calls BMN.logistic to estimate the initial partial correlation matrices using nodewise l1-regularized logistic regressions before calling debias. As a result, testTwoBMN returns a warning message if "one multinomial or binomial class has 1 or 0 observations" occurs in any of the p logistic regressions. The user should double check the data matrices in dat to avoid such situations before applying testTwoBMN again.
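
One way to screen for this issue is to check, before calling testTwoBMN, whether any variable in either data matrix has a class with at most one observation. The check below is only a sketch of such a pre-check, not part of the package.

## Flag variables whose binary classes have 0 or 1 observations in either sample
flagBad <- function(X) apply(X, 2, function(x) length(unique(x)) < 2 || min(table(x)) <= 1)
lapply(dat, function(X) which(flagBad(X)))   # columns to inspect or drop before testing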

Value

statistic

The test statistic.

pvalue

The p-value for testing the null Θ_1=Θ_2.

reject

Whether to reject the null Θ_1=Θ_2.

W

A p x p matrix consisting of the standardized pairwise statistics.

thetaXInit

The initial estimated partial correlation matrix based on X using BMN.logistic.

thetaX

The de-sparsified partial correlation matrix for the first population.

thetaYInit

The initial estimated partial correlation matrix based on Y using BMN.logistic.

thetaY

The de-sparsified partial correlation matrix for the second population.

network

The estimated differential network with false discovery rate control at the pre-specified level alpha.multi.
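
As a usage sketch, the documented components can be inspected directly from the returned list; here test is the object returned by the call in the Examples below, and treating network as a p x p adjacency matrix is an assumption.

test$statistic   # global test statistic
test$pvalue      # p-value for the global null Theta_1 = Theta_2
test$reject      # global decision at level alpha.global
test$network     # estimated differential network (returned when multiTest = TRUE)
which(test$network != 0, arr.ind = TRUE)   # assumed matrix form: edges declared differential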

Author(s)

Jing Ma

References

Cai, T. T., Li, H., Ma, J. & Xia, Y. (2017+). A testing framework for detecting differential microbial networks.

See Also

testOneBMN, global.tune, multiple.tune.twosample, BMN.logistic

Examples

library(glmnet)
library(igraph)  # sample_pa(), as_adjacency_matrix()
library(gdata)   # upperTriangle()

set.seed(1)

p = 50    # number of variables
n = 100   # number of observations per replicate
n0 = 1000 # burn-in length
rho_high = 0.5  # signal strength 
rho_low = 0.1   # signal strength 
ncond = 2       # number of conditions to compare
eps = 8/n       # tolerance for nearly constant (extremely unbalanced) variables
q = (p*(p - 1))/2

##---(1) Generate the network  
g_sf = sample_pa(p, directed=FALSE)
Amat = as.matrix(as_adjacency_matrix(g_sf, type="both"))

##---(2) Generate the Theta 
Theta = vector("list", ncond)
weights = matrix(0, p, p)
upperTriangle(weights) <- runif(q, rho_low, rho_high) * (2*rbinom(q, 1, 0.5) - 1)
weights = weights + t(weights)
Theta[[1]] = weights * Amat

##---(3) Generate the difference matrix
Delta = matrix(0, p, p)
upperTriangle(Delta) <- runif(q, 1, 2)*sqrt(log(p)/n) * 
                       (match(1:q, sample(q, 5), nomatch = 0)>0)*(2*rbinom(q, 1, 0.5) - 1)
Delta = Delta + t(Delta)

Theta[[2]] = Theta[[1]] + Delta
Theta[[1]] = Theta[[1]] - Delta

##---(4) Simulate data and choose tuning parameter
dat = vector("list", ncond)
lambda = vector("list", ncond)
for (k in 1:ncond){
  dat[[k]] = BMN.samples(Theta[[k]], n, n0, skip=1)
  # proportion of the first class per variable; resample if any variable is too unbalanced
  tmp = sapply(1:p, function(i) as.numeric(table(dat[[k]][,i]))[1]/n )
  while(min(tmp)<eps || (1-max(tmp))<eps){
    dat[[k]] = BMN.samples(Theta[[k]], n, n0, skip=10)
    tmp = sapply(1:p, function(i) as.numeric(table(dat[[k]][,i]))[1]/n )
  }
  empcov = cov(dat[[k]])
  LambdaMat = outer(seq(1,15), sqrt(0.01 * diag(empcov) * log(p)/n))  # grid of candidate tuning parameters
  tune = global.tune(dat[[k]], LambdaMat)
  lambda[[k]] = tune$lambdaOpt
}

##---(5) Two-sample testing
test = testTwoBMN(dat, lambda)
