library(devtools)
devtools::load_all()

Overview: Conditional Probability Queries

Here is an example of a simple joint distribution on two binary variables, and some queries for different evidence.

P_1 <- normalize_factor(create_factor(c(1,1,2,2), list(c(1,2), c(2,2))))  # joint on two binary variables
normalize_factor(factor_reduction(P_1, 2, 1))  # P(X_1 | X_2 = 1)
normalize_factor(factor_reduction(P_1, 2, 2))  # P(X_1 | X_2 = 2)
normalize_factor(factor_reduction(P_1, 1, 1))  # P(X_2 | X_1 = 1)
normalize_factor(factor_reduction(P_1, 1, 2))  # P(X_2 | X_1 = 2)

NP-hard problems:

In many cases one can efficiently get results.

Sum-Product for $P(Y \mid E = e)$: sum over the product of factors to get a marginal distribution. To incorporate the evidence, first reduce the factors. We can renormalize at the end, so there is no need to incorporate the partition function at the start.
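A minimal sketch of the whole pipeline (the helper name sum_product_query and the c(var, val) encoding of evidence are my own; the factor functions are the ones used throughout this document):

# Sum-product query: reduce by evidence, multiply, sum out, renormalize.
# factors: list of factors; evidence: list of c(var, val) pairs;
# hide: the variables to sum out (everything but the query variables).
sum_product_query <- function(factors, evidence, hide) {
  for (ev in evidence)
    factors <- lapply(factors, function(f) factor_reduction(f, ev[1], ev[2]))
  d_prod <- Reduce(factor_product, factors)  # product of the reduced factors
  normalize_factor(Reduce(factor_marginaliztion, hide, d_prod))  # sum out and renormalize
}
# e.g. for the student network defined below:
# sum_product_query(ds, list(c(3, 1)), c(1, 4, 5))  # P(G | I = 1)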


We'll show a couple of examples using the student network, with all variables binary: D for the difficulty of the course (1 = easy, 2 = hard), G for the grade in the course (1 = low, 2 = high), I for the student's intelligence (1 = low, 2 = high), L for the quality of the letter (1 = not good, 2 = good), and S for the SAT score (1 = low, 2 = high).

library(igraph)
G_student <- graph_from_literal(D -+ G, I -+ G, I -+ S, G -+ L)

layout <- matrix(c(0,1, 1,0, 2,1, 3,0, 1,-1), ncol = 2, byrow = TRUE)
plot(G_student, layout = layout)

A 50% chance the course is hard:

d_D <- create_factor(c(0.5,0.5), list(c(1,2)))
factor2df(d_D, names(V(G_student)))

A 90% chance the student is intelligent:

d_I <- create_factor(c(0.1, 0.9), list(c(3,2)))
factor2df(d_I, names(V(G_student)))
d_G <- create_factor(c(0.5, 0.5, 0.8, 0.2, 0.2, 0.8, 0.4, 0.6),
                     list(c(2,2), c(1,2), c(3,2)))
factor2df(d_G, names(V(G_student)))

To see these are not unreasonable, notice that if we incorporate the evidence that the student is not smart we get

factor2df(normalize_factor(factor_reduction(d_G, 3, 1), 1), names(V(G_student)))

Compare this with the following, which conditions on the student being smart:

factor2df(normalize_factor(factor_reduction(d_G, 3, 2), 1), names(V(G_student)))

And here is the grade distribution for a not-smart student in a difficult course (80% probability of a low grade):

factor2df(factor_reduction(factor_reduction(d_G, 3, 1), 1, 2), names(V(G_student)))
d_S <- create_factor(c(0.8, 0.2, 0.1, 0.9), list(c(4,2), c(3,2)))
factor2df(d_S, names(V(G_student)))

For the letter you can see that a low grade leads to an 80% probability of a bad letter, and a high grade leads to a 90% probability of a good letter.

d_L <- create_factor(c(0.8, 0.2, 0.1, 0.9), list(c(5,2), c(2,2)))
factor2df(d_L, names(V(G_student)))

With this we have distributions that look reasonable for all nodes in the student graph:

ds <- list(d_D, d_I, d_G, d_S, d_L)

Now we can compute some queries with the sum-product method.

First with no evidence, what is the distribution on grades?

ds_red <- ds    # no reductions needed since no evidence
d_prod <- Reduce(factor_product, ds_red)
factor2df(d_prod, names(V(G_student)))    ## full intermediate factor
a <- Reduce(factor_marginaliztion, c(1,3,4,5), d_prod)
factor2df(a, names(V(G_student)))

67% probability on a good grade. What if the student is not so smart?

ds_red <- lapply(ds, function(fact) factor_reduction(fact, 3, 1))  # evidence: I = 1 (not smart)
d_prod <- Reduce(factor_product, ds_red)
a <- normalize_factor(Reduce(factor_marginaliztion, c(1,4,5), d_prod))
factor2df(a, names(V(G_student)))

65% probability on a bad grade.

What if the student is not so smart, but the course is easy?

ds_red <- lapply(ds, function(fact) factor_reduction(factor_reduction(fact, 3, 1), 1, 1))  # evidence: I = 1 (not smart), D = 1 (easy)
d_prod <- Reduce(factor_product, ds_red)
a <- normalize_factor(Reduce(factor_marginaliztion, c(1,4,5), d_prod))
factor2df(a, names(V(G_student)))

50% probability on a good grade, a 15 percentage point rise compared with an arbitrary course. If the course is hard, however:

ds_red <- lapply(ds, function(fact) factor_reduction(factor_reduction(fact, 3, 1), 1, 2))  # evidence: I = 1 (not smart), D = 2 (hard)
d_prod <- Reduce(factor_product, ds_red)
a <- normalize_factor(Reduce(factor_marginaliztion, c(1,4,5), d_prod))
factor2df(a, names(V(G_student)))

80% probability on a bad grade.


Three ways to compute these queries:

Push summations into the factor product (variable elimination; a worked example follows below)

Message passing over a graph

Random sampling of instantiations
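As an illustration of the first approach (my own example, using the student-network factors defined above), consider computing $P(L)$: instead of summing the product $\phi_D \phi_I \phi_G \phi_S \phi_L$ over all $2^4$ joint assignments of $D, I, G, S$, push each summation in as far as it will go:

$$P(L) = \sum_G \phi_L(L, G) \sum_D \phi_D(D) \sum_I \phi_I(I)\,\phi_G(G, D, I) \sum_S \phi_S(S, I)$$

Each intermediate sum now involves only small factors; this is exactly what variable elimination (below) automates.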

Overview: MAP Inference

Maximum a Posteriori

The MAP assignment is not, in general, equal to the assignment obtained by maximizing each marginal separately.
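A small example (my own; I assume create_factor lists values with the first variable varying fastest, as the earlier snippets suggest):

# joint on binary X (var 1) and Y (var 2):
# P(1,1) = 0.4, P(2,1) = 0.3, P(1,2) = 0, P(2,2) = 0.3
P_xy <- create_factor(c(0.4, 0.3, 0, 0.3), list(c(1,2), c(2,2)))
factor2df(factor_marginaliztion(P_xy, 2), c("X", "Y"))  # X marginal: (0.4, 0.6), argmax X = 2
factor2df(factor_marginaliztion(P_xy, 1), c("X", "Y"))  # Y marginal: (0.7, 0.3), argmax Y = 1
# maximizing the marginals separately gives (X = 2, Y = 1) with probability 0.3,
# but the MAP assignment is (X = 1, Y = 1) with probability 0.4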

NP-hard problems:

In many cases one can efficiently get results.

Max-Product for $\mathrm{MAP}(Y \mid E = e)$: the denominator that normalizes w.r.t. the evidence is constant, so it can be left out; just work with the reduced factors. The partition function can again be left out as well.
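In symbols (writing $\phi_i[E = e]$ for a factor reduced by the evidence; both $P(e)$ and the partition function $Z$ are constant in $y$):

$$\arg\max_y P(y \mid e) = \arg\max_y \frac{P(y, e)}{P(e)} = \arg\max_y \frac{1}{Z\,P(e)} \prod_i \phi_i[E = e](y) = \arg\max_y \prod_i \phi_i[E = e](y)$$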

Variable Elimination Algorithm

Apply the evidence, then push the sums in as far as they will go, then perform the sums. This can be done in an unnormalized context, normalizing at the end.

library(igraph)
library(pryr)

# the standard student network
G_student <- graph_from_literal(D -+ G, I -+ G, I -+ S, G -+ L)

layout <- matrix(c(0,1, 1,0, 2,1, 3,0, 1,-1), ncol = 2, byrow = TRUE)
plot(G_student, layout = layout)
G_student

Eliminate-Var Z: multiply all factors whose scope contains $Z$ into a single factor $\psi$, then sum $Z$ out of $\psi$ to obtain $\tau$, and replace the factors used by $\tau$; see the sketch below.
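A sketch in the style of the earlier snippets; factor_vars, which should return the variable ids in a factor's scope, is a hypothetical helper that does not appear elsewhere in this document:

eliminate_var <- function(factors, z) {
  # factor_vars: hypothetical accessor for a factor's scope
  uses_z <- vapply(factors, function(f) z %in% factor_vars(f), logical(1))
  psi <- Reduce(factor_product, factors[uses_z])  # product of all factors mentioning z
  tau <- factor_marginaliztion(psi, z)            # sum z out
  c(factors[!uses_z], list(tau))                  # replace the used factors by tau
}
# running this for every non-query variable and multiplying the remaining
# factors reproduces the sum-product queries from the previous section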

Variable Elimination Alg: reduce all factors by the evidence, run Eliminate-Var for each variable not in the query, then multiply the remaining factors and renormalize.

Complexity of Variable Elimination Algorithm

In computing $\psi_k(X_k) := \prod_{i=1}^{m_k} \phi_i$, each row of $\psi_k$ is the product of one value from each of the $m_k$ factors, so we get a cost of $(m_k - 1) \cdot |\mathrm{Val}(X_k)|$ multiplications.

In computing $\tau_k(X_k \setminus \{Z\}) := \sum_Z \psi_k(X_k)$, every number in $\psi_k$ is added into one of the values of $\tau_k$, so we get a cost of $|\mathrm{Val}(X_k)|$ additions.

Write $N_k = |\mathrm{Val}(X_k)|$.

The total number of factors used in the procedure is $\leq m + n$, where $m$ is the number of factors in the initial network (each is used in at most one elimination step, after which one new factor is introduced) and $n$ is the number of variables (an upper bound on the number of elimination steps).
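Putting these together (a standard bound, not spelled out in the original notes): writing $N = \max_k N_k$ and $m_k$ for the number of factors multiplied in step $k$, so that $\sum_k m_k \leq m + n$,

$$\#\text{multiplications} \le \sum_k (m_k - 1) N_k \le (m + n) \cdot N, \qquad \#\text{additions} \le \sum_k N_k \le n \cdot N.$$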

In a Bayesian network we have one factor per variable, but in a Markov network we can start with many more factors.

But the size of a factor is exponential in its scope: $N_k = O(d^{r_k})$ where $d = \max_i |\mathrm{Val}(X_i)|$ and $r_k = |X_k|$. This is an opportunity for exponential blowup; for example, with binary variables ($d = 2$) a factor over 30 variables already has $2^{30} \approx 10^9$ entries. The size of the blowup depends heavily on the elimination ordering.

Graph-Based Perspective

Moralize a graph from a Bayesian network: make the edges undirected, and connect the parents in v-structures (this gives the induced Markov network for the given set of factors).
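For the student network this is a one-liner (my own snippet, reusing G_student and layout from the chunk above): G has parents D and I, the one v-structure, so moralizing adds the edge between D and I.

H_student <- add_edges(as.undirected(G_student), c("D", "I"))  # moralized student network
plot(H_student, layout = layout)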

As we eliminate variables, we remove nodes and edges. Sometimes we need to add a fill edge, because a factor is introduced between nodes that were not connected before: all of the neighbors of a variable that is eliminated are directly connected afterwards.

Induced graph $I_{\Phi, \alpha}$ over factors $\Phi$ and ordering $\alpha$: the undirected graph with an edge between $X_i$ and $X_j$ whenever they appear together in some factor generated by running VE with ordering $\alpha$.

Thm: The scope of every factor produced during VE is a clique in the induced graph.

clique: maximal fully connected subgraph.

Thm: Every (maximal) clique in the induced graph is a factor produced during VE.

Proof: Take a clique and look at the first variable from it to be eliminated. After its elimination no new neighbors are added to it. At the time it was eliminated it had all clique members as neighbors, which means it had factors involving all its neighbors, i.e. it participated in factors with all these other nodes. So when we multiply these factors together, we get a factor whose scope contains all of them. QED

The width of an induced graph is the number of nodes in the largest clique in the graph minus 1.

Minimal induced width of a graph $K$ is $\min_\alpha \mathrm{width}(I_{K, \alpha})$.

This provides a lower bound on the best performance of VE for a model factorizing over $K$.

Finding Elimination Orderings

Thm: For a graph $H$, determining whether there exists an elimination ordering for $H$ with induced width $\leq k$ is NP-complete.

Note: even given the optimal ordering, inference may still be exponential.

Greedy search using a heuristic cost function (at each point, eliminate the node with the smallest cost); a sketch of one such heuristic follows below.
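Here is a sketch (my own code on top of igraph, not part of the package) of one standard cost function, min-fill: at each step eliminate the node whose elimination would add the fewest fill edges.

count_fill <- function(g, v) {
  # number of fill edges that eliminating v would add
  nbs <- as.integer(neighbors(g, v))
  if (length(nbs) < 2) return(0)
  sum(!apply(combn(nbs, 2), 2, function(p) are_adjacent(g, p[1], p[2])))
}

min_fill_ordering <- function(g) {
  ordering <- character(0)
  while (vcount(g) > 0) {
    fills <- sapply(seq_len(vcount(g)), function(v) count_fill(g, v))
    v <- which.min(fills)
    nbs <- as.integer(neighbors(g, v))
    if (length(nbs) >= 2) {
      # connect all neighbors of v: these are the fill edges
      pairs <- combn(nbs, 2)
      missing <- !apply(pairs, 2, function(p) are_adjacent(g, p[1], p[2]))
      if (any(missing)) g <- add_edges(g, as.vector(pairs[, missing, drop = FALSE]))
    }
    ordering <- c(ordering, V(g)$name[v])
    g <- delete_vertices(g, v)   # eliminate v
  }
  ordering
}

min_fill_ordering(H_student)   # an ordering for the moralized student network from above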

Thm: the induced graph is triangulated.

Proof: Towards a contradiction, suppose the induced graph contains a loop of length greater than 3 with no chord. Look at the first variable in the loop to be eliminated. When it is eliminated, its two neighbors in the loop are still present, so a fill edge is introduced between them; this edge is in the induced graph and is a chord of the loop, a contradiction. QED

Can find elimination ordering by finding a low-width triangulation of the original graph.


