A function to use structural hamming distance to aggregate DAGs

Description

This function uses a family of gSHD (generalized structural hamming distance) to aggregate an ensemble of DAGs (e.g., learned on bootstrap resamples)

Usage

1
2
score_shd(boot.adj, alpha = 1, threshold=0, max.step = 500, 
blacklist = NULL, whitelist = NULL, print = FALSE)

Arguments

boot.adj

A p by p by B array, where B is the number of DAGs to be aggregated. It records the adjacency matrices. It may be the output of the "score" function.

alpha

a positive scalar: alpha defines which member of the gSHD family should be used to aggregate the DAGs. In general, the larger the alpha, the more aggressive of the aggregation, in that less edges are retained leading to smaller FDR and less power, default = 1

threshold

a scalar: it defines the frequency cut-off value, default =0, meaning not to include any edge with (generalized) bootstrap selection frequency <0.5.

max.step

an integer: the maximum number of search steps, default = 500

blacklist

a p by p 0-1 matrix: if the (i,j)th-entry is "1", then the edge i–>j will be excluded from the DAG during the search, default=NULL

whitelist

a p by p 0-1 matrix: if the (i,j)th-entry is "1", then the edge i–>j will always be included in the DAG during the search, default=NULL

print

logic: whether print the step information, default= FALSE

Details

score_shd aggregates an ensemble of DAGs to get a DAG that minimizes the overall distance to the ensemble. A (generalized) structural hamming distance is used as the distance metric on the space of DAGs of a given set of nodes.

Value

A list with three components

adj.matrix

adjacency matrix of the aggregated DAG

final.step

a number recording how many search steps are conducted before the procedure stops

movement

a matrix recording the selected operation, addition, deletion or reversal of an edge, at each search step

Author(s)

Wang, R and Peng, J.

References

Wang, R. and Peng, J. (2013) Learning directed acyclic graphs via bootstrap aggregating. arXiv:1406.2098

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
# read in data and true network
data(example)
Y.n=example$Y # data matrix 
true.dir=example$true.dir  #adjacency matrix of the data generating DAG
true.ske=example$true.ske  # skeleton graph of the data generating DAG
  

## (i) DAG learning  by hill climbing: no aggregation

# learn DAG using "BIC" score
temp=score(Y=Y.n, n.boot=0, score.type="BIC") 
adj=temp$adj.matrix

# Find the total number of skeleton edges 
# and the number of correct skeleton edges of the estimated DAG     
    tt=adj+t(adj) ##symmetrization
    correct.c=sum((tt>0)&(true.ske>0))/2    
    total.c=sum(tt>0)/2
    total.true=sum(true.ske>0)/2
    c(total.c, correct.c, total.true) ##326, 96, 109
    
## (ii) DAG learning by bagging   
set.seed(1)

# get the ensemble of DAGs from bootstrap resamples
temp.boot=score(Y.n, n.boot=10, score.type="BIC") 
boot.adj=temp.boot$adj.matrix

temp.bag=score_shd(boot.adj, alpha = 1) ##aggregating 
adj.bag=temp.bag$adj.matrix

# Find the total number of skeleton edges and the number of 
# correct skeleton edges of the estimated DAG     
 tt=adj.bag+t(adj.bag) ##symmetrization
  correct.c=sum((tt>0)&(true.ske>0))/2    
  total.c=sum(tt>0)/2
  total.true=sum(true.ske>0)/2
  c(total.c, correct.c, total.true) ##121, 89, 109 
 #Note: much less number of false edges and slightly less number of correct edges 
 #compared with the non-bagging result


## try different aggregation parameter alpha
#(i) alpha=0.5: less aggressive, more edges retained 
temp.bag=score_shd(boot.adj, alpha = 0.5) ##aggregating 
adj.bag05=temp.bag$adj.matrix

 # Find the total number of skeleton edges 
 # and the number of correct skeleton edges of the estimated DAG     
 tt=adj.bag05+t(adj.bag05) ##symmetrization
  correct.c=sum((tt>0)&(true.ske>0))/2    
  total.c=sum(tt>0)/2
  total.true=sum(true.ske>0)/2
  c(total.c, correct.c, total.true) ##132, 91, 109

#(ii) alpha=1.5: more aggressive, less edges retained 
temp.bag=score_shd(boot.adj, alpha = 1.5) ##aggregating 
adj.bag15=temp.bag$adj.matrix

 # Find the total number of skeleton edges 
 # and the number of correct skeleton edges of the estimated DAG     
 tt=adj.bag15+t(adj.bag15) ##symmetrization
  correct.c=sum((tt>0)&(true.ske>0))/2    
  total.c=sum(tt>0)/2
  total.true=sum(true.ske>0)/2
  c(total.c, correct.c, total.true) ##115, 87, 109
  
  #(iii) alpha=2: more aggressive, less edges retained 
  temp.bag=score_shd(boot.adj, alpha = 2) ##aggregating 
  adj.bag2=temp.bag$adj.matrix
  
   # Find the total number of skeleton edges 
   # and the number of correct skeleton edges of the estimated DAG     
   tt=adj.bag2+t(adj.bag2) ##symmetrization
    correct.c=sum((tt>0)&(true.ske>0))/2    
    total.c=sum(tt>0)/2
    total.true=sum(true.ske>0)/2
  c(total.c, correct.c, total.true) ##86, 67, 109