The DREAM4 Predictive Signaling Network Challenge @prill2011crowdsourcing used a 17-node network containing canonical pathways downstream of four receptors. The network provides a simple test case for inference of causal structure with bulk data (one measurement per sample). We use this network to evaluate the relative advantages of active learning, and the importance of the use of prior information.
library(bninfo)
Visualizing the network:
data(dream) net <- dream$net model <- dream$model graphviz.plot(dream$net, shape = "rectangle")
The network has 4 receptors: two inflammatory (TNFa, IL1a), one insulin (IGF-I), and one growth factor receptor (TGFa). The DREAM exerimental data is comprised of samples with 5 stimulus conditions; no stimulus, stimulus on TNFa, on IL1a, on IGF1, and on TGFa. The dataset also includes 5 inhibition conditions; no inhibition, inhibition on ikk, on mek12, on pi3k, and on p38. There were 25 samples, one for each stimulus and inhibition pair.
and For each of the 4 receptors, there are 5 replicates with a stimulus applied to that receptor. There are an additional 5 replicates with no stimulus applied, providing information on base-line signaling activity.
The dataset has interventions on only 4 proteins; ikk,
node_names <- nodes(net) receptors <- c("igf1", "il1a", "tgfa", "tnfa") candidates <- setdiff(node_names, receptors)
According to Equation 6 in the manuscript, the presence probability an edge is present is slightly more the 0.5;
(uninformative_presence <- 0.5 + 1 / (2 * (nnodes(net) - 1)))
We have a priori causal knowledge in that receptor edges have a near 0 probability of having incoming edges. I create an edgewise-prior that reflects this knowledge.
prior <- lapply(receptors, function(receptor){ cbind(from = nodes(net), to = receptor) }) %>% {do.call("rbind", .)} %>% as.data.frame %>% mutate(presence = uninformative_presence, orientation = 1e-06) sample_n(tbl_df(prior), 10)
Next, we use the original dataset as historic data.
.data <- dream$data dream_ints <- dream$ints tbl_df(.data)
sim_sample <- function(fitted_net, stim_target, int_target = "none"){ if(int_target == "none"){ sim_net <- fitted_net } else { # Simulate intervention by mutilating the network ev_list <- structure(list("0"), names = int_target) sim_net <- mutilated(fitted_net, ev_list) } # Simulate stimulus using particle filter receptors <- c("igf1", "il1a", "tgfa", "tnfa") settings <- rep("0", 4) settings[which(receptors == stim_target)] <- "1" ev_list <- structure(as.list(settings), names = receptors) sim <- cpdist(sim_net, names(sim_net), evidence = ev_list, method = "lw") out <- sample_n(sim, 1) rownames(out) <- NULL out } # Simulate an inhibition experiment with different stimuli. sim_one_experiment <- function(fitted_net, int_target = "none"){ lapply(c("none", "igf1", "il1a", "tgfa", "tnfa"), sim_sample, fitted_net = fitted_net, int_target = int_target) %>% {do.call("rbind", .)} %>% as.data.frame %>% mutate(target = ifelse(is.null(int_target), "none", int_target)) } sim_batch <- function(fitted_net, int_targets){ sim <- lapply(int_targets, sim_one_experiment, fitted_net = fitted_net) %>% {do.call("rbind", .)} %>% data.frame attr(sim, "target") <- sim$target select(sim, -target) }
First, we demonstrate the naive case of intervention selection. Here, we use simulated historic data, and an uninformative edgewise-prior. The only causal information that can be inferred in this case is uninformative v-structures.
We randomly select nodes to target for intervention. I simulate 3300 observations for the historic data. The constraints I make on the network search I keep consistent across all subsequent analyses.
We start by generating some DAGs from historic data and a uninformative prior. If I don't pass a prior argument to the learning algorithm, it will automatically construct an uninformative prior.
R <- 200 algo_args <- list(tabu = 50, score = "mbde", exp = dream_ints, prior = "cs", iss = 1, maxp = 4) start_net_args <- list(nodes = names(.data), method = "ic-dag", every = 100, max.in.degree = 3, max.out.degree = 3) cl <- makeCluster(4) dags_uninformative <- boot_dags(.data, cluster = cl, R = R, m = nrow(.data), algorithm = "tabu", algorithm.args = algo_args, random.graph.args = start_net_args)
save(dags_uninformative, file = "vignettes/objects/dream/dags_uninformative.Rdata")
load(dags_uninformative, file = "objects/dream/dags_uninformative.Rdata")
In the following, I repeat the "passive learning" algorithm several times and look at the average of the results. The following passive learning algorithm incorporates the stopping criteria.
cl <- makeCluster(4) pl_results_list <- NULL for(i in 1:100){ print(i) pl_results <- passive_learning(dags_uninformative, candidates = candidates, cluster = cl, debug = TRUE) pl_results_list <- c(pl_results_list, list(pl_results)) } # Make array of results for each item pl_results_list <- lapply(pl_results_list, function(item){ item$p0 <- c(item$p0, rep(1, length(candidates) - length(item$p0))) item$info_gains <- c(item$info_gains, rep(0, length(candidates) - length(item$info_gains))) item })
save(pl_results_list, file = "vignettes/objects/dream/random_IG_P0.Rdata")
load(file = "objects/dream/random_IG_P0.Rdata")
mean_info_gains <- lapply(pl_results_list, function(item) item$info_gains) %>% {do.call("rbind", .)} %>% colMeans %>% round(3) mean_info_gains mean_p0 <- lapply(pl_results_list, function(item) item$p0) %>% {do.call("rbind", .)} %>% colMeans %>% round(3) mean_p0
Next, I generate expected L1 and TPR performance statistics in the case of naive "passive" selection. We replicate the same combination of receptor stimulus as the original experiment
cl <- makeCluster(16) l1_random_vals <- list() # L1 error tpr_random_vals <- list() # True positive rates thresh <- uninformative_presence # Edge detection threshold for calculating tpr # Arguments for learning when no intervention is applied start_net_args <- list(nodes = names(.data), method = "ic-dag", every = 100, max.in.degree = 3, max.out.degree = 3) for(j in 1:100){ # Get a random ordering of interventions target_ordering <- c("none", sample(candidates)) print(target_ordering) # Create arrays to hold stats after each experiment l1_random <- rep(NA, length(target_ordering)) tpr_random <- rep(NA, length(target_ordering)) for(i in 1:length(target_ordering)){ # Simulate an experiment and combine it with historic data .sim <- sim_batch(model, target_ordering[1:i]) int_array <- attr(.sim, "target") ints <- lapply(names(.sim), function(item) which(item == int_array)) %>% set_names(names(.sim)) # Simulate learning of the DAG from the experiment. algo_args <- list(tabu = 50, score = "mbde", exp = ints, prior = "cs", maxp = 4, iss = 1) strength <- boot_dags(.sim, cluster = cl, R = 200, algorithm = "tabu", algorithm.args = algo_args, random.graph.args = start_net_args) %>% custom.strength(names(.sim), cpdag = FALSE) # Collect the stats tpr_random[i] <- get_performance(strength, fit2net(net), consider_direction = TRUE) %>% {nrow(filter(., label == 1, strength >= thresh, direction > .5)) / nrow(filter(., label == 1))} l1_random[i] <- l1_error(strength, net) } message("L1: ", paste0(l1_random, collapse=", ")) message("TPR: ", paste0(round(tpr_random, 2), collapse=", ")) # Add to list tpr_random_vals[[j]] <- tpr_random l1_random_vals[[j]] <- l1_random }
save(tpr_random_vals, file="vignettes/objects/dream/random_tpr.Rdata") save(l1_random_vals, file="vignettes/objects/dream/random_l1.Rdata")
load(file="objects/dream/random_tpr.Rdata") load(file="objects/dream/random_l1.Rdata")
(random_tpr <- do.call("rbind", tpr_random_vals) %>% colMeans %>% round(3)) (random_L1 <- do.call("rbind", l1_random_vals) %>% colMeans %>% round(3))
cl <- makeCluster(4) al_results_uninformative <- active_learning(dags_uninformative, candidates, cluster = cl, debug = TRUE)
save(al_results_uninformative, file = "vignettes/objects/dream/al_results_uninformative.Rdata")
load(file = "objects/dream/al_results_uninformative.Rdata")
start_net_args <- list(nodes = names(.data), method = "ic-dag", every = 100, max.in.degree = 3, max.out.degree = 3) cl <- makeCluster(4) target_ordering <- c("none", al_results$selected) l1_set_uninformative <- list() # L1 Error rates tpr_set_uninformative <- list() # True positive rates thresh <- uninformative_presence # Edge detection threshold for calculating tpr for(j in 1:100){ print(j) l1 <- rep(NA, length(target_ordering)) tpr <- rep(NA, length(target_ordering)) for(i in 1:length(target_ordering)){ # Simulate an experiment and combine it with historic data .sim <- sim_batch(model, target_ordering[1:i]) int_array <- attr(.sim, "target") ints <- lapply(names(.sim), function(item) which(item == int_array)) %>% set_names(names(.sim)) # Simulate learning of the DAG from the experiment. algo_args <- list(tabu = 50, score = "mbde", exp = ints, prior = "cs", maxp = 4, iss = 1) strength <- boot_dags(.sim, cluster = cl, R = 200, algorithm = "tabu", algorithm.args = algo_args, random.graph.args = start_net_args) %>% custom.strength(names(.sim), cpdag = FALSE) # Collect the stats tpr[i] <- get_performance(strength, fit2net(net), consider_direction = TRUE) %>% {nrow(filter(., label == 1, strength >= thresh, direction > .5)) / nrow(filter(., label == 1))} l1[i] <- l1_error(strength, net) } message("L1: ", paste0(l1, collapse=", ")) message("TPR: ", paste0(round(tpr, 2), collapse=", ")) l1_set_uninformative[[j]] <- l1 tpr_set_uninformative[[j]] <- tpr }
save(tpr_set_uninformative, file = "vignettes/objects/dream/tpr_uninformative.Rdata") save(l1_set_uninformative, file = "vignettes/objects/dream/L1_uninformative.Rdata")
load(file = "vignettes/objects/dream/tpr_uninformative.Rdata") load(file = "vignettes/objects/dream/L1_uninformative.Rdata")
colMeans(do.call("rbind", tpr_set_uninformative)) %>% round(3) colMeans(do.call("rbind", l1_set_uninformative)) %>% round(3)
I convert presence and orientation probabilities in the edge-wise prior, into a single edge probability. This is the format the learning algorithm accepts.
beta <- prior %>% mutate(prob = presence * orientation) %>% select(-presence, -orientation)
I substitute a black list for the original prior in generating the DAGs, this is equivalent to using the prior and computationally more efficient.
receptor_bl <- lapply(receptors, function(receptor){ cbind(from = nodes(net), to = receptor) }) %>% {do.call("rbind", .)}
algo_args <- list(tabu = 50, score = "mbde", exp = dream_ints, prior = "cs", maxp = 4, iss = 1, blacklist = receptor_bl) start_net_args <- list(nodes = names(.data), method = "ic-dag", every = 100, max.in.degree = 3, max.out.degree = 3) cl <- makeCluster(4) dags_informative <- boot_dags(.data, cluster = cl, R = 200, m = nrow(.data), algorithm = "tabu", algorithm.args = algo_args, random.graph.args = start_net_args)
save(dags_informative, "vignettes/objects/dreams/dags_informative.Rdata")
load("objects/dreams/dags_informative.Rdata")
cl <- makeCluster(4) al_results_informative <- active_learning(dags_informative, candidates, beta = beta, cluster = cl, debug = TRUE)
save(al_results_informative, file = "vignettes/objects/dream/al_results_informative.Rdata")
load(file = "objects/dream/al_results_informative.Rdata")
start_net_args <- list(nodes = names(.data), method = "ic-dag", every = 100, max.in.degree = 3, max.out.degree = 3) cl <- makeCluster(4) target_ordering <- c("none", al_results_informative$selected) l1_set_informative <- list() # L1 Error rates tpr_set_informative <- list() # True positive rates thresh <- uninformative_presence # Edge detection threshold for calculating tpr for(j in 1:100){ print(j) l1 <- rep(NA, length(target_ordering)) tpr <- rep(NA, length(target_ordering)) for(i in 1:length(target_ordering)){ # Simulate an experiment and combine it with historic data .sim <- sim_batch(model, target_ordering[1:i]) int_array <- attr(.sim, "target") ints <- lapply(names(.sim), function(item) which(item == int_array)) %>% set_names(names(.sim)) # Simulate learning of the DAG from the experiment. algo_args <- list(tabu = 50, score = "mbde", exp = ints, prior = "cs", maxp = 4, iss = 1, blacklist = receptor_bl) strength <- boot_dags(.sim, cluster = cl, R = 200, algorithm = "tabu", algorithm.args = algo_args, random.graph.args = start_net_args) %>% custom.strength(names(.sim), cpdag = FALSE) # Collect the stats tpr[i] <- get_performance(strength, fit2net(net), consider_direction = TRUE) %>% {nrow(filter(., label == 1, strength >= thresh, direction > .5)) / nrow(filter(., label == 1))} l1[i] <- l1_error(strength, net) } message("L1: ", paste0(l1, collapse=", ")) message("TPR: ", paste0(round(tpr, 2), collapse=", ")) tpr_set_informative[[j]] <- tpr l1_set_informative[[j]] <- l1 }
save(tpr_set_informative, file = "vignettes/objects/dream/selected_tpr_informative.Rdata") save(l1_set_informative, file = "vignettes/objects/dream/selected_L1_informative.Rdata")
load(file = "objects/dream/selected_tpr_informative.Rdata") load(file = "objects/dream/selected_L1_informative.Rdata")
(tpr_informative <- colMeans(do.call("rbind", tpr_set_informative))) (l1_informative <- colMeans(do.call("rbind", l1_set_informative)))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.