What is the role of entropy in a Bayesian network's conditional probability distributions in network structure inference?

library(bninfo)
library(bnlearn)  # structure learning, CPDAGs, and simulation
library(dplyr)    # data manipulation used throughout
library(magrittr)

To examine this question, I start with the ALARM network. I fit the network structure to the ALARM dataset and simulate data from the fitted model. Applying structure inference to the simulated data ensures we are working with data that is faithful to the ground-truth probability distribution.

data(alarm)
ground_truth <- empty.graph(names(alarm))
modelstring(ground_truth) <- paste(
  "[HIST|LVF][CVP|LVV][PCWP|LVV][HYP][LVV|HYP:LVF]",
  "[LVF][STKV|HYP:LVF][ERLO][HRBP|ERLO:HR][HREK|ERCA:HR][ERCA]",
  "[HRSA|ERCA:HR][ANES][APL][TPR|APL][ECO2|ACO2:VLNG][KINK]",
  "[MINV|INT:VLNG][FIO2][PVS|FIO2:VALV][SAO2|PVS:SHNT][PAP|PMB][PMB]",
  "[SHNT|INT:PMB][INT][PRSS|INT:KINK:VTUB][DISC][MVS][VMCH|MVS]",
  "[VTUB|DISC:VMCH][VLNG|INT:KINK:VTUB][VALV|INT:VLNG][ACO2|VALV]",
  "[CCHL|ACO2:ANES:SAO2:TPR][HR|CCHL][CO|HR:STKV][BP|CO:TPR]", sep = "")
graphviz.plot(ground_truth)
ground_truth_fit <- bn.fit(ground_truth, alarm, method = "bayes")

In the dimensionality vignette I focused on inference of the undirected edges (Markov network). Here I infer the equivalence class.

graphviz.plot(cpdag(ground_truth))

I simulate 500,000 observations; the analysis in the dimensionality vignette showed this to be large enough for this network that adding more data has a negligible impact on inference.

.sim_data <- rbn(ground_truth_fit, 500000) # simulate
inferred <- .sim_data %>%
  fast.iamb %>% # infer a structure
  cpdag(moral = FALSE) %T>% # convert to the equivalence class
  graphviz.plot # plot

Comparing the inferred and true structures:

performance_plot(cpdag(inferred, moral = FALSE), 
                 cpdag(ground_truth, moral = FALSE), plot_truth = TRUE)

Despite the large number of observations, only about half of the edges are recovered.

performance_plot(cpdag(inferred, moral = FALSE), 
                 cpdag(ground_truth, moral = FALSE), 
                 plot_truth = FALSE)

And there are a good number of false positives.

So despite an ample number of observations, the algorithm cannot detect all of the edges. Part of the issue is dimensionality, which is explored in the dimensionality vignette. Here, I explore how uncertainty (entropy) in the network affects inference, with the help of the information-theoretic functions in bninfo.

Using information-theoretic functions in the context of structure inference

bninfo contains several information-theoretic functions that elucidate uncertainty in the network and in the structure inference process. Using these functions, we can ask questions about inference such as:

Is there a relationship between inference of a node's parents and the mutual information between that node and its parents?

The mutual information between a node and its parents is the node's marginal entropy less the entropy of its conditional distribution: MI(X; Pa(X)) = H(X) − H(X | Pa(X)). To calculate this, I first use entropy to compute the marginal entropies empirically from the simulated data.

marginal_entropies <- .sim_data %>%
  lapply(table) %>% # tabulate counts for each variable
  lapply(prop.table) %>% # convert counts to a probability distribution
  sapply(entropy) # compute each variable's marginal entropy
marginal_entropies
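As a sanity check, assuming entropy is the usual plug-in estimator −Σ p log(p), the marginal entropy of a single variable can be recomputed by hand:

p <- prop.table(table(.sim_data$HYP))
-sum(p * log(p)) # should match marginal_entropies["HYP"]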

I then query the Bayesian network to calculate the entropy of each CPD in the network using the function cpd_entropy_query.

cpd_entropies <- sapply(names(ground_truth_fit), cpd_entropy_query, 
                        bn_model = ground_truth_fit)
cpd_entropies
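Marginal and conditional entropies are tied together by the identity MI(X; Pa(X)) = H(X) + H(Pa(X)) − H(X, Pa(X)). As a consistency-check sketch for the node LVV, whose parents in the ground truth are HYP and LVF (again assuming entropy is the plug-in estimator):

pa_config <- interaction(.sim_data$HYP, .sim_data$LVF) # joint parent configuration
h_pa <- entropy(prop.table(table(pa_config)))
h_joint <- entropy(prop.table(table(.sim_data$LVV, pa_config)))
marginal_entropies["LVV"] + h_pa - h_joint
# This should be close to marginal_entropies["LVV"] - cpd_entropies["LVV"].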

Ranking the nodes by the mutual information with their parents:

plot(1:nnodes(ground_truth), 
     sort(marginal_entropies - cpd_entropies),
     xlab = "node rank",
     ylab = "mutual information with parents")

In the following figure, I sort the variables into two groups: variables where all of the parents were detected in the structure inference, and variables where not all of the parents were detected.

library(igraph) # for graph.data.frame, V, and E
performance_plot_list(cpdag(inferred), cpdag(ground_truth))[[1]] %>%
  {cbind(.[[1]], line_type = .[[3]])} %>%
  data.frame %>%
  mutate(detected = ifelse(line_type == "solid", TRUE, FALSE)) %>%
  select(from, to, detected) %>%
  graph.data.frame %>%
  {sapply(V(.), function(v) all(E(.)[to(v)]$detected))} %>%
  {factor(ifelse(., "all detected", "not all detected"))} %>%
  plot(marginal_entropies - cpd_entropies, 
       xlab = "whether or not all parents were detected",
       ylab = "mutual information with parents")

There does not seem to be a meaningful difference in mutual information between the two groups.

A more direct approach is to sort edges into groups by the mutual information between their "from" and "to" nodes. Here I restrict the analysis to the discovery of undirected edges.

get_mi <- function(a, b){
  # MI(A; B) = H(A) - H(A | B)
  a <- as.character(a); b <- as.character(b)
  as.numeric(marginal_entropies[a] - conditional_entropy_query(a, b, ground_truth_fit))
}
performance_arc_list(moral(inferred), moral(ground_truth))[[1]] %>%
  mutate(mi = apply(., 1, function(row) get_mi(row[1], row[2]))) %>%
  {boxplot(mi ~ type, data = ., xlab = "edge detection results", ylab = "edge mutual information")}

Here it is clear that edges with higher mutual information are more likely to be discovered.

Using information-theoretic functions with model averaging

Similar information-theoretic functions are available for model averaging results. In the following, I do model averaging just for the undirected Markov network.

alarm_boot <- random.graph(nodes(ground_truth_fit), num = 30, method = "ordered") %>%
  lapply(function(net){
    .boot_data <- sample_n(.sim_data, size = 50) # subsample the simulated data
    moral(hc(.boot_data, score = "bde", iss = 10, start = net))
    }) %>%
  custom.strength(nodes = nodes(ground_truth)) 
devtools::use_data(alarm_boot, overwrite = TRUE)
data(alarm_boot)

Above, I used conditional_entropy_query to characterize the mutual information between the to and from nodes of an edge in the parameterized ground truth network. The averaging_entropy function quantifies the uncertainty about a given edge from model averaging results. Setting the undirected argument to TRUE returns the entropy calculation for the presence of the edge, not its direction.

undirected_entropy <- averaging_entropy(alarm_boot, undirected = TRUE) 
undirected_entropy %>%
  filter(strength > .01) %>%
  sample_n(10)
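Intuitively, the presence of a given undirected edge across bootstrap networks is a Bernoulli event, so the presence entropy should presumably reduce to the Bernoulli entropy of the edge's strength. A quick sketch of that assumed relationship (bernoulli_entropy is a helper defined here, not a bninfo function):

bernoulli_entropy <- function(p) {
  # Entropy of a Bernoulli(p) variable; degenerate edges carry no entropy.
  ifelse(p <= 0 | p >= 1, 0, -p * log(p) - (1 - p) * log(1 - p))
}
undirected_entropy %>%
  mutate(bernoulli = bernoulli_entropy(strength)) %>%
  head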

In the following, I connect the mutual information between an edge's to and from nodes, calculated above, with the model averaging entropy of that edge.

unnoisy_entropy <- undirected_entropy %>%
  mutate(edge_name = arcs2names(., directed_edges = FALSE))
ground_truth %>%
  moral %>%
  arcs %>%
  reduce_averaging %>%
  data.frame %>%
  mutate(mi = apply(., 1, function(row) get_mi(row[1], row[2]))) %>%
  mutate(edge_name = arcs2names(., directed_edges = FALSE)) %>%
  merge(unnoisy_entropy, by = "edge_name") %>%
  transmute(from = from.x, to = to.x, 
            averaging_entropy = entropy,
            true_edge_mi = mi,
            strength = strength) %>%
  filter(true_edge_mi > 0,
         averaging_entropy > 0) %>% 
  {plot(averaging_entropy ~ true_edge_mi, .)}

The two values seem dependent, in a very polarized fashion. When the true mutual information is high, the model averaging edge entropy is near 0 because the edge appears consistently across bootstrapped networks. For middle to low levels of mutual information, the model averaging edge entropy becomes higher.

Entropy in the Presence of Interventions

A targeted intervention has the effect of fixing a node at a specific outcome. Fixing the node at a specific outcome means there is no uncertainty about which outcome it is in, and thus the entropy of its distribution should be 0.

For example, the entropy of the conditional probability distribution of "CCHL" in the ALARM network is:

cpd_entropy_query("CCHL", ground_truth_fit)

After fixing its value at "NORMAL":

ground_truth_fit %>%
  mutilated(list(CCHL = "NORMAL")) %>%
  {cpd_entropy_query("CCHL", .)} %>%
  round(4)
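The zero follows from how mutilated works: it severs CCHL from its parents and replaces its conditional probability table with a point mass on the fixed level, which we can confirm by printing the mutilated node.

ground_truth_fit %>%
  mutilated(list(CCHL = "NORMAL")) %>%
  {.$CCHL} # the CPT now puts all probability mass on "NORMAL"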

To demonstrate the effects of interventions on model averaging entropy, I pull a simple 3-node path from the network:

triplet <- c("PVS", "SAO2", "CCHL")
triplet_net <- ground_truth %>%
  {bnlearn::subgraph(., nodes = triplet)} %T>%
  graphviz.plot
triplet_fit <- bn.fit(triplet_net, alarm[, triplet], method = "bayes")

Having fit this subnetwork to the data, I apply model averaging: within each iteration I simulate 50,000 fresh observations and learn a structure with a tabu search, which returns a fully directed network. Since the three-node chain contains no v-structures, edge direction cannot be inferred from observational data; the expectation is therefore that a given edge will orient in one direction in half the runs and in the other direction in the rest.
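We can confirm that no orientation is identifiable: the CPDAG of the triplet contains no compelled (directed) arcs.

cpdag(triplet_net) %>%
  directed.arcs # an empty arc set: neither edge direction is identifiable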

starting_nets <- random.graph(nodes(triplet_fit), 
                             method = "ic-dag", num = 500) %>%
  lapply(scramble) 
raw_averaging <- lapply(starting_nets, function(net){
    .sim_data <- rbn(triplet_fit, n = 50000)
    tabu(.sim_data, score = "bde", iss = 1, tabu = 100,
       start = net)
    }) %>%
      custom.strength(nodes = triplet, cpdag = FALSE) %>%
  averaging_entropy 
raw_averaging %>%
  mutate(strength = round(strength, 4),
         direction = round(direction, 4),
         entropy = round(entropy, 4))

That is indeed the case. Next, I simulate a dataset wherein half of the data is observational and in the other half there is an intervention on CCHL that fixes its level at "NORMAL".

.int_data <- rbind(rbn(triplet_fit, n = 250000), 
                   rbn(mutilated(triplet_fit, list(CCHL = "NORMAL")), 
                       n = 250000))
# For the mbde score, exp takes a named list mapping each node to the rows
# in which it was intervened on.
int_list <- c(rep(NA, 250000), rep("CCHL", 250000)) %>%
  {sapply(triplet, function(node) which(. == node))}
int_averaging <- lapply(starting_nets, function(net){
    hc(.int_data, score = "mbde", iss = 1, 
       exp = int_list, start = net)
    }) %>%
      custom.strength(nodes = triplet, cpdag = FALSE) %>%
  averaging_entropy
int_averaging %>%
  mutate(strength = round(strength, 4),
         direction = round(direction, 4),
         entropy = round(entropy, 4))

The intervention resulted in increased certainty about edge direction. That certainty translates into reduced overall entropy across the edges: the intervention reduced the mean model averaging entropy from `r mean(raw_averaging$entropy)` to `r mean(int_averaging$entropy)`.

Next, I repeat this result on simulated data from the MAPK signaling cascade.

I build a model from the Sachs 2005 T-cell signaling dataset. I'll use the causal Bayesian network fitted to that data, rather than the raw data itself, to simulate observations, which controls for any artifacts in the data.

triplet <- c("Raf", "Mek", "Erk")
net <- empty.graph(triplet) %>%
  `arcs<-`(value = matrix(c("Raf", "Mek", "Mek", "Erk"), ncol = 2, byrow = TRUE)) %T>%
  graphviz.plot
# Fit the MAPK triplet to the observational rows of the isachs data (loaded
# below) so the causal network, not the raw data, generates the observations.
triplet_fit <- bn.fit(net, isachs[isachs$INT == 0, triplet], method = "bayes")
.int_data <- rbind(rbn(triplet_fit, n = 2500), 
                   rbn(mutilated(triplet_fit, list(Mek = "1")), 
                       n = 2500))
int_array <- c(rep(NA, 2500), rep("Mek", 2500)) 
starting_nets <- random.graph(nodes = triplet, method = "melancon",
                              num = 500, burn.in = 10^5, every = 100) %>%
  lapply(scramble)
int_averaging <- lapply(starting_nets, function(net){
  boot <- sample(nrow(.int_data), replace = TRUE) # bootstrap row indices
  .boot_data <- .int_data[boot, ]
  int_list <- int_array[boot] %>% # intervention flags follow the resampled rows
    {sapply(triplet, function(node) which(. == node))}
  tabu(.boot_data, score = "mbde", iss = 1, 
       exp = int_list, start = net, tabu = 50)
  }) %>%
  custom.strength(nodes = triplet, cpdag = FALSE) %>%
  averaging_entropy
int_averaging %>%
  mutate(strength = round(strength, 4),
         direction = round(direction, 4),
         entropy = round(entropy, 4))

Analysis of T-cell Intervention Data

isachs <- read.table("/Users/robertness/Downloads/sachs.interventional.txt",
            header = TRUE, colClasses = "factor")
isachs$INT <- isachs$INT %>% as.character %>% as.numeric
node_names <- names(isachs)[1:11]

Entropy on observational data.

starting_nets <- random.graph(nodes = node_names, method = "melancon",
           num = 500, burn.in = 10^5, every = 100) %>%
  lapply(scramble)
.obs_data <- isachs %>%
  filter(INT == 0) %>% # keep only observational rows
  select(-INT)
obs_averaging <- lapply(starting_nets, function(net) {
  boot <- sample(nrow(.obs_data), replace = T)
  .boot_data <- .obs_data[boot, ]
  tabu(.boot_data, score = "bde", iss = 1, 
       start = net, tabu = 50)
  }) %>%
  custom.strength(nodes = node_names, cpdag = TRUE) %>%
  averaging_entropy

obs_averaging %$% entropy %>% mean

Working through the overall workflow on the Sachs data.

library(combinat)
# The five intervention targets, recovered from the trajectory labels below.
int_targets <- c("PKA", "Akt", "PKC", "PIP2", "Mek")
combinations <- NULL
for(i in 1:(length(int_targets) - 1)){
  comb_mat <- combn(int_targets, i) 
  for(j in 1:ncol(comb_mat)){
    combinations <- c(combinations, list(comb_mat[, j]))
  }
}
combinations <- c(combinations, list(int_targets))
names(combinations) <- sapply(combinations, function(item) paste(sort(item), collapse = "-"))
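As a quick check, the loop should enumerate every nonempty subset of the five targets:

length(combinations) # 2^5 - 1 = 31 nonempty target sets
head(names(combinations))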

# get_entropy: a helper (defined elsewhere) that runs model averaging for a
# given set of intervention targets and returns the averaging entropies.
averaging_entropies <- lapply(combinations, get_entropy)
names(averaging_entropies) <- names(combinations)
devtools::use_data(averaging_entropies, overwrite = TRUE)



library(tidyr) # for gather
trajectories <- lapply(permn(int_targets), function(item){
  item %>%
    paste0(collapse = "-") %>%
    reshuffle # returns the entropy trajectory for this target ordering
  }) %>%
  {
    df <- data.frame(do.call("cbind", lapply(., function(item) item$entropy)))
    names(df) <- unlist(lapply(., function(item) item$trajectory))
    cbind(experiment = c(0, 1, 2, 3, 4, 5), df)
  } %>%
  gather(trajectory, entropy, PKA_Akt_PKC_PIP2_Mek:Akt_PKA_PKC_PIP2_Mek)

library(ggplot2)
ggplot(trajectories, aes(factor(experiment), entropy)) + 
  geom_line(data = filter(trajectories, trajectory != "PKA_Akt_PIP2_Mek_PKC"), 
            aes(group = trajectory), alpha = 0.2) +
  geom_line(data = filter(trajectories, trajectory == "PKA_Akt_PIP2_Mek_PKC"), 
            aes(group = trajectory), colour = "red") +
  geom_line(data = filter(trajectories, trajectory == "PKC_Mek_PIP2_Akt_PKA"), 
            aes(group = trajectory), colour = "blue") +
  scale_x_discrete(name = "experiment",
                   labels = c("observational", "1st inhibition", "2nd inhibition",
                              "3rd inhibition", "4th inhibition", "5th inhibition"))

