knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
# load Bayesian network packages
library(SGS)
# library(reshape2)
Bayesian networks are a fundamental tool in machine learning and statistics and have recently gained popularity as a model for causality. Here, we discuss efficient methods for inference in Bayesian networks and how they can be applied to missing data problems.
Inference describes the calculation of the marginal probability $P(v'|e)$ of a subset of variables $v' \subseteq V$ conditioned on a (typically observed) subset of variables $e$. As an example, we might be interested in the probability that it rained given that the grass is wet, where $v':=rain$ and $e:=wet\ grass$. A corresponding directed acyclic graph (DAG) might look as follows:
# create random BN and label variables
set.seed(6)
myBayesNet <- randomBN(3)
myBayesNet@variables <- c("rain", "sprinkler", "wet grass")
plot_bn(myBayesNet)
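Written out, the inference query defined above follows from the definition of conditional probability, with every variable that is neither queried nor observed summed out. The symbol $u$ for those remaining variables and $\mathrm{pa}(V_i)$ for the parents of node $V_i$ are notation introduced here for illustration:

$$P(v' \mid e) = \frac{P(v', e)}{P(e)} = \frac{\sum_{u} P(v', u, e)}{\sum_{v', u} P(v', u, e)}, \qquad u = V \setminus (v' \cup e), \qquad P(V) = \prod_{i} P\big(V_i \mid \mathrm{pa}(V_i)\big).$$

The last identity is the factorization of the joint distribution along the DAG, which is what makes the sums tractable for small networks.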
What is the probability that it rained?
# define observed variables and calculate the probability
myObserved <- list(observed.vars = c("rain"), observed.vals = c(2))
exactInference(myBayesNet, myObserved)
By marginal probability we mean the joint probability of a subset of variables, e.g. rain and wet grass, with all remaining variables summed out. What is the probability of having rain and wet grass at the same time?
# define observed variables and calculate the marginal probability
myObserved <- list(observed.vars = c("rain", "wet grass"), observed.vals = c(2, 2))
exactInference(myBayesNet, myObserved)
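To make the summation behind such a query explicit, here is a minimal base R sketch that enumerates all states of a three-node network by brute force. It does not use the SGS package. The structure (rain and sprinkler both pointing to wet grass) and all probability values are assumptions chosen for illustration and are not the ones drawn by `randomBN(3)`.

```r
# Hypothetical conditional probability tables (made-up numbers)
p_rain      <- c(no = 0.8, yes = 0.2)
p_sprinkler <- c(off = 0.6, on = 0.4)
# P(wet grass = yes | rain, sprinkler); rows = rain, columns = sprinkler
p_wet_yes <- matrix(c(0.05, 0.90,
                      0.80, 0.99),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(rain = c("no", "yes"),
                                    sprinkler = c("off", "on")))

# Enumerate every joint state of the three variables
states <- expand.grid(rain = c("no", "yes"),
                      sprinkler = c("off", "on"),
                      wet = c("no", "yes"),
                      stringsAsFactors = FALSE)

# Joint probability via the DAG factorization:
# P(rain) * P(sprinkler) * P(wet | rain, sprinkler)
p_wet_given <- p_wet_yes[cbind(states$rain, states$sprinkler)]
states$p <- unname(p_rain[states$rain] *
                   p_sprinkler[states$sprinkler] *
                   ifelse(states$wet == "yes", p_wet_given, 1 - p_wet_given))

# Marginal P(rain = yes, wet = yes): sum out the sprinkler
p_rain_and_wet <- sum(states$p[states$rain == "yes" & states$wet == "yes"])

# Conditional P(rain = yes | wet = yes): divide by the evidence P(wet = yes)
p_wet <- sum(states$p[states$wet == "yes"])
p_rain_given_wet <- p_rain_and_wet / p_wet
```

Brute-force enumeration like this touches every joint state, so its cost doubles with each additional binary variable. That is why it only serves as a reference for very small networks.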
Instead of simulating a Bayesian network, we can also learn it from data. In the following example, we learn a Bayesian network from the well-known "Asia dataset" and subsequently perform inference on it.
# let's learn the Bayesian network from the "Asia dataset"
asia_bn <- learn_bn(Asia)
plot_bn(asia_bn)
# now we can do the inference on the learned Bayesian network
myObserved <- list(observed.vars = c("X", "D"), observed.vals = c(1, 1))
exactInference(asia_bn, myObserved)
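As a rough illustration of what learning from data involves, the base R sketch below estimates a single conditional probability table by counting. It covers only parameter estimation for one edge whose direction is assumed to be known, not structure learning, and it makes no claim about how `learn_bn` works internally. The variable names `smoke` and `bronc` and the simulated data are hypothetical.

```r
set.seed(42)
n <- 1000

# Hypothetical toy data: binary parent "smoke" and binary child "bronc"
smoke <- rbinom(n, 1, 0.5)
bronc <- rbinom(n, 1, ifelse(smoke == 1, 0.6, 0.3))

# Count co-occurrences, then normalize each row (one row per parent state)
counts <- table(smoke = smoke, bronc = bronc)
cpt <- prop.table(counts, margin = 1)   # estimate of P(bronc | smoke)
cpt
```

Each row of `cpt` is the maximum likelihood estimate of the child's distribution for one parent state, so every row sums to one.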
Next, let's consider a higher-dimensional problem. Assume we have measured 100 genes and would like to know the probability that 4 particular genes are in given states. We will use both exact and approximate inference to get the marginal probability distribution. For approximate inference, we use the efficient SGS algorithm (default) and the well-known loopy belief propagation algorithm as a reference.
# create random BN with 100 variables
set.seed(1)
myBayesNet <- randomBN(100)
plot_bn(myBayesNet)
# what's the probability that the four selected variables take the given states?
# define observed variables and calculate marginal probability
myObserved <- list(observed.vars = c(49, 40, 44, 47), observed.vals = c(2, 1, 2, 1))
exactInference(myBayesNet, myObserved)
approxInference(myBayesNet, myObserved)
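To give a feel for what approximate inference means, the base R sketch below estimates a marginal probability by plain forward sampling and compares it with the exact value. This is a deliberately simpler technique than either the SGS algorithm or loopy belief propagation and is not how `approxInference` is implemented. The three-node chain `x1 -> x2 -> x3` and its probabilities are assumptions made for illustration.

```r
set.seed(1)
n_samples <- 1e5

# Hypothetical chain x1 -> x2 -> x3 with binary states 0/1
p_x1          <- 0.3                     # P(x1 = 1)
p_x2_given_x1 <- c("0" = 0.2, "1" = 0.7) # P(x2 = 1 | x1)
p_x3_given_x2 <- c("0" = 0.1, "1" = 0.6) # P(x3 = 1 | x2)

# Forward sampling: draw each node after its parent, following the DAG order
x1 <- rbinom(n_samples, 1, p_x1)
x2 <- rbinom(n_samples, 1, p_x2_given_x1[as.character(x1)])
x3 <- rbinom(n_samples, 1, p_x3_given_x2[as.character(x2)])

# Monte Carlo estimate of the marginal P(x1 = 1, x3 = 1)
estimate <- mean(x1 == 1 & x3 == 1)

# Exact value for comparison: sum out x2
exact <- p_x1 * (p_x2_given_x1[["1"]] * p_x3_given_x2[["1"]] +
                 (1 - p_x2_given_x1[["1"]]) * p_x3_given_x2[["0"]])

c(estimate = estimate, exact = exact)
```

The cost of sampling grows with the number of samples rather than with the number of joint states, which is the general reason approximate methods remain usable when a network has many variables.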