library(devtools)
devtools::load_all()

Message Passing; Belief Propagation Algorithm

From a graphical model, create a cluster graph.

Initialize the clusters with the factors.

Start with uninformative (all-ones) messages.

Def: a cluster graph is an undirected graph whose nodes are clusters $C_i$ (sets of variables) and whose edges carry sepsets $S_{i,j} \subseteq C_i \cap C_j$.

Given a set of factors $\Phi$, assign each $\phi_k \in \Phi$ to a cluster $C_{\alpha(k)}$ s.t. $\mathrm{Scope}(\phi_k) \subseteq C_{\alpha(k)}$.

Def: the initial potential of cluster $C_i$ is

$$\psi_i(C_i) = \prod_{k: \alpha(k) = i} \phi_k$$

Def: the messages are of the form $$\delta_{i \rightarrow j}(S_{i,j}) = \sum_{C_i \setminus S_{i,j}} \psi_i \prod_{k \in \mathcal{N}_i \setminus \{j\}} \delta_{k \rightarrow i}$$

The belief propagation algorithm now repeats: pick an edge $(i,j)$ and pass the message $\delta_{i \rightarrow j}$ along it.

Finally compute

$$\beta_i(C_i) = \psi_i \prod_{k \in \mathcal{N}_i} \delta_{k \rightarrow i}$$
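As a concrete sketch of these updates (all names below are made up for illustration, not part of this package), represent discrete factors as data frames with one column per binary variable and a value column `p`:

```r
# Hypothetical helpers: product and marginalization of discrete factors.
factor_product <- function(f, g) {
  common <- intersect(setdiff(names(f), "p"), setdiff(names(g), "p"))
  h <- merge(f, g, by = common, suffixes = c(".f", ".g"))
  h$p <- h$p.f * h$p.g
  h[, setdiff(names(h), c("p.f", "p.g")), drop = FALSE]
}
factor_marginalize <- function(f, keep) {
  aggregate(p ~ ., data = f[, c(keep, "p"), drop = FALSE], FUN = sum)
}

# Two clusters C1 = {A,B}, C2 = {B,C} with sepset {B}; potentials made up.
vals <- function(vars) do.call(expand.grid, setNames(rep(list(0:1), length(vars)), vars))
psi1 <- cbind(vals(c("A", "B")), p = c(10, 1, 1, 10))
psi2 <- cbind(vals(c("B", "C")), p = c(5, 1, 1, 5))

# With uninformative initial messages, delta_{1->2} is psi1 summed onto B.
delta_1to2 <- factor_marginalize(psi1, "B")
beta2 <- factor_product(psi2, delta_1to2)   # belief beta_2(C_2)
```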

Cluster Graph Properties

Family preservation: for every factor $\phi$ there is a cluster $C_i$ with $\mathrm{Scope}(\phi) \subseteq C_i$.

Running intersection property: for each pair of clusters $C_i, C_j$ and $X \in C_i \cap C_j$ there is a unique path between $C_i$ and $C_j$ for which all clusters and sepsets contain $X$.

Note: if $X$ and $Y$ are strongly correlated, we can still have configurations that behave almost like such loops (information about $X$ cycles back via $Y$). This is one of the problems with belief propagation.

RIP is equivalent to the statement that, for any $X$, the clusters and sepsets containing $X$ form a tree.
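A sketch of this characterization as an executable check (the representation is made up: clusters and sepsets as lists of character vectors, edges as a two-column matrix of cluster indices, sepsets parallel to the edge rows):

```r
# Hypothetical check: RIP holds iff, for every variable x, the clusters and
# sepsets containing x induce a connected subgraph with |edges| = |nodes| - 1.
rip_holds <- function(clusters, edges, sepsets) {
  all(vapply(unique(unlist(clusters)), function(x) {
    nodes <- which(vapply(clusters, function(cl) x %in% cl, logical(1)))
    keep <- vapply(sepsets, function(s) x %in% s, logical(1))
    sub <- edges[keep, , drop = FALSE]
    reach <- nodes[1]                    # flood fill to test connectivity
    repeat {
      nxt <- c(sub[sub[, 1] %in% reach, 2], sub[sub[, 2] %in% reach, 1])
      if (all(nxt %in% reach)) break
      reach <- union(reach, nxt)
    }
    setequal(reach, nodes) && sum(keep) == length(nodes) - 1
  }, logical(1)))
}
```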

TODO: implement message passing; create example where RIP failing causes a problem.

Bethe Cluster Graph

Often used, but in some sense degenerate: one large cluster per factor plus one singleton cluster per variable, with an edge from each factor cluster to the singleton clusters of the variables in its scope.

Always bipartite. Since every sepset is a single variable, no information is passed about correlations between variables.
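A minimal construction sketch (function name and representation are made up): given only the factor scopes, build the Bethe cluster graph.

```r
# Hypothetical construction: factor clusters first, then one singleton
# cluster per variable; every sepset is a single variable.
bethe_cluster_graph <- function(scopes) {
  vars <- unique(unlist(scopes))
  clusters <- c(scopes, as.list(vars))
  edges <- do.call(rbind, lapply(seq_along(scopes), function(i) {
    data.frame(from   = i,
               to     = length(scopes) + match(scopes[[i]], vars),
               sepset = scopes[[i]])
  }))
  list(clusters = clusters, edges = edges)
}

# Factors over {A,B}, {B,C}, {A,C}: a loop in the original graph.
bethe_cluster_graph(list(c("A", "B"), c("B", "C"), c("A", "C")))
```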

Properties of belief propagation

A cluster graph is calibrated iff every pair of adjacent clusters $C_i, C_j$ agree on the belief about their sepset $S_{i,j}$:

$$\sum_{C_i \setminus S_{i,j}} \beta_i(C_i) = \sum_{C_j \setminus S_{i,j}} \beta_j(C_j)$$
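Continuing the two-cluster sketch from above (hypothetical helpers), calibration can be checked numerically:

```r
# Compute the message and belief in the other direction, then compare the
# two sepset marginals over B.
delta_2to1 <- factor_marginalize(psi2, "B")
beta1 <- factor_product(psi1, delta_2to1)         # belief beta_1(C_1)
m <- merge(factor_marginalize(beta1, "B"),
           factor_marginalize(beta2, "B"), by = "B")
all(abs(m$p.x - m$p.y) < 1e-8)                    # TRUE: calibrated
```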

Convergence of the algorithm (each new message equals the previous message) implies calibration. Define the sepset beliefs $$\mu_{i,j} = \delta_{i \rightarrow j}\, \delta_{j \rightarrow i} = \sum_{C_j \setminus S_{i,j}} \beta_j.$$

Reparametrization: by running the message passing algorithm, no information about the original distribution has been lost:

$$ \frac{\prod \beta_i}{\prod \mu_{i,j}} = \tilde{P}$$

Remember $\tilde{P}$ is the original unnormalized measure (the product of the factors); this follows from the fact that every message appears exactly once in a $\beta_i$ and once in a $\mu_{i,j}$, and every factor appears exactly once in a $\beta_i$.
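A numeric check of this identity on the same sketch objects as above:

```r
# mu_{1,2} is the sepset belief; dividing it out of beta1 * beta2 recovers
# the original product psi1 * psi2.
mu12 <- factor_product(delta_1to2, delta_2to1)
lhs <- factor_product(beta1, beta2)
lhs$p <- lhs$p / mu12$p[match(lhs$B, mu12$B)]
rhs <- factor_product(psi1, psi2)
isTRUE(all.equal(lhs$p[order(lhs$A, lhs$B, lhs$C)],
                 rhs$p[order(rhs$A, rhs$B, rhs$C)]))   # TRUE
```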

Clique Tree Algorithm and Correctness

A clique tree is a special case of a cluster graph with better performance guarantees for message passing: faster convergence, and convergence to the correct answer.

Message passing in a tree gives correct beliefs; very close to variable elimination.

Def: a Clique Tree is an undirected tree such that

family preservation and the running intersection property hold. The running intersection property can be rewritten for trees as: for each pair of clusters $C_i$ and $C_j$ and each $X \in C_i \cap C_j$, all clusters and sepsets on the unique path between $C_i$ and $C_j$ contain $X$.

RIP (+ family preservation) => Clique tree correctness.

Clique Tree Algorithm: Computation

Messages from leaf nodes have converged as soon as they are computed. Every other node waits until it has received messages from all neighbors but one, then passes its message on to that remaining neighbor.

For chains this is called the forward-backward algorithm.
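A sketch of computing this schedule (hypothetical helper; the tree is given as an edge list): on a chain it reproduces exactly the forward-backward order.

```r
# Hypothetical helper: the two-pass message schedule on a clique tree given
# as a two-column edge matrix over nodes 1..n_nodes. Each output row is one
# directed message i -> j; the downward pass is the upward pass reversed.
two_pass_schedule <- function(n_nodes, edges, root = 1) {
  parent <- rep(NA_integer_, n_nodes)
  ord <- integer(0); frontier <- root; visited <- root
  while (length(frontier) > 0) {                     # BFS from the root
    v <- frontier[1]; frontier <- frontier[-1]
    ord <- c(ord, v)
    nb <- setdiff(c(edges[edges[, 1] == v, 2], edges[edges[, 2] == v, 1]), visited)
    parent[nb] <- v; visited <- c(visited, nb); frontier <- c(frontier, nb)
  }
  up <- cbind(from = rev(ord[-1]), to = parent[rev(ord[-1])])  # leaves first
  rbind(up, cbind(from = up[, 2], to = up[, 1])[rev(seq_len(nrow(up))), , drop = FALSE])
}

# Chain 1 - 2 - 3 - 4 rooted at 4: gives 1->2, 2->3, 3->4, 4->3, 3->2, 2->1.
two_pass_schedule(4, cbind(1:3, 2:4), root = 4)
```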

Answering queries:

We can compute all the marginals at only twice the cost of a single run of variable elimination; by storing the messages we can reuse many of them in later incremental queries.

Clique Tree and Independence

Clique trees appear very efficient, but we know inference is hard, so we need to understand where the difficulty lies.

For an edge $(i,j) \in T$ let $W_{<(i,j)}$ be all variables that appear only on the $C_i$ side of $T$. Variables on both sides are in the sepset $S_{i,j}$.

Thm: $T$ satisfies RIP iff for every edge $(i,j) \in T$ we have $$P_\Phi \models (W_{<(i,j)} \perp W_{<(j,i)} \mid S_{i,j}).$$

Proof sketch of one direction: assume the independence does not hold. Then there is a path in the induced Markov network from one side to the other that avoids $S_{i,j}$, and hence some edge crossing from one side to the other. That edge comes from a factor with a variable from the left side and one from the right side, and no cluster can contain both. This contradicts family preservation.

Each sepset needs to separate the graph into two conditionally independent parts.

In a complete bipartite graph, a sepset must contain a whole side.

Rectangular grid graph: we need sepsets of the size of a side. This is where we pay; clique trees have large sepsets.

Clique Trees and VE

How to construct a clique tree: from a run of VE, create a clique tree.

VE: each elimination step multiplies a set of factors into $\psi_i$ and sums out a variable, producing an intermediate factor $\tau_i$.

Clique tree view: each elimination step becomes a cluster with scope $\mathrm{Scope}(\psi_i)$; the message out of that cluster is $\tau_i$.

Graph: connect cluster $i$ to cluster $j$ when $\tau_i$ is used in computing $\psi_j$.

There is no value in having neighboring clusters that are subsets of each other; we can postprocess to remove these.

Produces a tree: every intermediate factor is used only once.

Family preserving: each of the original factors is used somewhere in the VE process.

Thm: if $T$ is a tree of clusters produced by VE, then $T$ obeys RIP (tree is directed by construction).

A variable $X$ is eliminated in exactly one step; look at that step. No later step involves $X$. Take any earlier cluster that contains $X$: then $X$ is in the sepset leaving that cluster, and hence also in the cluster the message is sent to. By induction $X$ is carried along the path to the cluster where it is eliminated.

Note: we can simulate the running of VE without actually doing the factor products and marginalizations. The cost of variable elimination is about the same as passing messages in one direction of the tree. By storing messages we can answer later queries efficiently (the initial run is twice as expensive, but subsequent inference is cheap).
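A sketch of such a simulation (hypothetical code): run VE on factor scopes only, recording each step's cluster.

```r
# Hypothetical sketch: simulate VE on scopes, recording the cluster
# (scope of psi) at each elimination step; no numeric work is done.
ve_clusters <- function(scopes, elim_order) {
  clusters <- list()
  for (x in elim_order) {
    involved <- vapply(scopes, function(s) x %in% s, logical(1))
    cluster <- unique(unlist(scopes[involved]))   # Scope(psi) at this step
    clusters <- c(clusters, list(cluster))
    scopes <- c(scopes[!involved], list(setdiff(cluster, x)))  # add tau
  }
  clusters
}

# Chain A - B - C with factors on {A,B} and {B,C}: the last cluster {C} is a
# subset of its neighbor, the kind of node postprocessing would remove.
ve_clusters(list(c("A", "B"), c("B", "C")), c("A", "B", "C"))
```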

BP in practice

Misconception example: tight loops, strong potentials, conflicting directions. In this situation both convergence and accuracy are poor.

What not to do: synchronous BP, where all messages are updated in parallel; in the Ising grid example it behaves badly.

Asynchronous BP (we still need to choose an order); messages are updated one at a time.

Asynchronous order 2 in the Ising grid example; much better.

Tree reparametrization (TRP): pick a tree and pass messages to calibrate it, then pick a different tree and calibrate that. Ensure the trees cover all edges. Performance improves with larger trees (e.g. pick spanning trees).

Residual belief propagation (RBP): pass messages between the two clusters whose beliefs over their sepset disagree the most; use a priority queue of edges.

Smoothing (Damping) Messages: replace $\delta_{i \rightarrow j} = \sum_{C_i \setminus S_{i,j}} \psi_i \prod_{k \neq j} \delta_{k \rightarrow i}$ by $\delta_{i \rightarrow j} = \lambda \left( \sum_{C_i \setminus S_{i,j}} \psi_i \prod_{k \neq j} \delta_{k \rightarrow i} \right) + (1 - \lambda)\, \delta^\text{old}_{i \rightarrow j}$.
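As a sketch in the data-frame factor representation used earlier (hypothetical helper):

```r
# Hypothetical damped update: mix the freshly computed message with the
# previous one; lambda = 1 recovers the undamped BP update.
damp <- function(delta_new, delta_old, lambda = 0.5) {
  by <- setdiff(names(delta_new), "p")
  m <- merge(delta_new, delta_old, by = by)
  m$p <- lambda * m$p.x + (1 - lambda) * m$p.y
  m[, c(by, "p"), drop = FALSE]
}
```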

Loopy BP and Message Decoding

$U_1, \ldots, U_k$ bits to send over noisy channel. First encode them into $X_1, \ldots, X_n$ which can be passed over the noisy channel such that after decoding to $V_1, \ldots, V_k$ we get the original bits with good probability.

Rate: $k/n$

Bit error rate: $\frac{1}{k} \sum_i P(V_i \neq U_i)$

Types of channel: binary symmetric channel, binary erasure channel, Gaussian channel (adds analog noise). These different types of channel have very different capacities.
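For instance, a quick simulation of the binary symmetric channel (illustrative only):

```r
# Empirical bit error rate of uncoded transmission through a binary
# symmetric channel with flip probability 0.1.
set.seed(1)
u <- sample(0:1, 1e4, replace = TRUE)            # bits to send
v <- ifelse(runif(length(u)) < 0.1, 1L - u, u)   # each bit flips w.p. 0.1
mean(v != u)                                     # approximately 0.1
```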

Shannon's Theorem: reliable communication is possible at rates below the channel capacity, and impossible above it.

Turbo codes: two encoders and two decoders, where the decoders iteratively communicate their computed probabilities to each other. This iterative decoding was later recognized as loopy belief propagation on an appropriate graphical model.


