library(devtools)
devtools::load_all()

Message Passing; Belief Propagation Algorithm

From a graphical model, create a cluster graph.

Initialize the clusters with the factors.

Start with uninformative (all-ones) messages.

Def: a cluster graph is an undirected graph whose nodes are clusters $C_i$ (sets of variables) and whose edges carry sepsets $S_{i,j} \subseteq C_i \cap C_j$.

Given a set of factors $\Phi$, assign each $\phi_k \in \Phi$ to a cluster $C_{\alpha(k)}$ s.t. $\mathrm{Scope}(\phi_k) \subseteq C_{\alpha(k)}$.

Def: the initial potential of cluster $C_i$ is

$$\psi_i(C_i) = \prod_{k: \alpha(k) = i} \phi_k$$

Def: the messages are of the form $$\delta_{i \rightarrow j}(S_{i,j}) = \sum_{C_i \setminus S_{i,j}} \psi_i \prod_{k \in \mathcal{N}_i \setminus \{j\}} \delta_{k \rightarrow i}$$

The belief propagation algorithm now repeats: pick an edge $(i,j)$ and pass the message $\delta_{i \rightarrow j}$ along it.

Finally compute

$$\beta_i(C_i) = \psi_i \prod_{k \in \mathcal{N}_i} \delta_{k \rightarrow i}$$
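As a concrete sketch of these updates (all names below are made up for illustration, not part of this package), represent discrete factors as data frames with one column per binary variable and a value column `p`:

```r
# Hypothetical helpers: product and marginalization of discrete factors.
factor_product <- function(f, g) {
  common <- intersect(setdiff(names(f), "p"), setdiff(names(g), "p"))
  h <- merge(f, g, by = common, suffixes = c(".f", ".g"))
  h$p <- h$p.f * h$p.g
  h[, setdiff(names(h), c("p.f", "p.g")), drop = FALSE]
}
factor_marginalize <- function(f, keep) {
  aggregate(p ~ ., data = f[, c(keep, "p"), drop = FALSE], FUN = sum)
}

# Two clusters C1 = {A,B}, C2 = {B,C} with sepset {B}; potentials made up.
vals <- function(vars) do.call(expand.grid, setNames(rep(list(0:1), length(vars)), vars))
psi1 <- cbind(vals(c("A", "B")), p = c(10, 1, 1, 10))
psi2 <- cbind(vals(c("B", "C")), p = c(5, 1, 1, 5))

# With uninformative initial messages, delta_{1->2} is psi1 summed onto B.
delta_1to2 <- factor_marginalize(psi1, "B")
beta2 <- factor_product(psi2, delta_1to2)   # belief beta_2(C_2)
```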

Cluster Graph Properties

Family preservation: for every factor $\phi$ there is a cluster $C_i$ with $\mathrm{Scope}(\phi) \subseteq C_i$.

Running intersection property: for each pair of clusters $C_i, C_j$ and $X \in C_i \cap C_j$ there is a unique path between $C_i$ and $C_j$ for which all clusters and sepsets contain $X$.

Note: if $X$ and $Y$ are strongly correlated, we can still have configurations that behave almost like such loops (information about $X$ cycles back via $Y$). This is one of the problems with belief propagation.

RIP is equivalent to the statement that, for any $X$, the clusters and sepsets containing $X$ form a tree.
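A sketch of this characterization as an executable check (the representation is made up: clusters and sepsets as lists of character vectors, edges as a two-column matrix of cluster indices, sepsets parallel to the edge rows):

```r
# Hypothetical check: RIP holds iff, for every variable x, the clusters and
# sepsets containing x induce a connected subgraph with |edges| = |nodes| - 1.
rip_holds <- function(clusters, edges, sepsets) {
  all(vapply(unique(unlist(clusters)), function(x) {
    nodes <- which(vapply(clusters, function(cl) x %in% cl, logical(1)))
    keep <- vapply(sepsets, function(s) x %in% s, logical(1))
    sub <- edges[keep, , drop = FALSE]
    reach <- nodes[1]                    # flood fill to test connectivity
    repeat {
      nxt <- c(sub[sub[, 1] %in% reach, 2], sub[sub[, 2] %in% reach, 1])
      if (all(nxt %in% reach)) break
      reach <- union(reach, nxt)
    }
    setequal(reach, nodes) && sum(keep) == length(nodes) - 1
  }, logical(1)))
}
```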

TODO: implement message passing; create example where RIP failing causes a problem.

Bethe Cluster Graph

Often used, but in some sense degenerate: one large cluster per factor plus one singleton cluster per variable, with an edge from each factor cluster to the singleton clusters of the variables in its scope.

Always bipartite. Since every sepset is a single variable, no information is passed about correlations between variables.
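A minimal construction sketch (function name and representation are made up): given only the factor scopes, build the Bethe cluster graph.

```r
# Hypothetical construction: factor clusters first, then one singleton
# cluster per variable; every sepset is a single variable.
bethe_cluster_graph <- function(scopes) {
  vars <- unique(unlist(scopes))
  clusters <- c(scopes, as.list(vars))
  edges <- do.call(rbind, lapply(seq_along(scopes), function(i) {
    data.frame(from   = i,
               to     = length(scopes) + match(scopes[[i]], vars),
               sepset = scopes[[i]])
  }))
  list(clusters = clusters, edges = edges)
}

# Factors over {A,B}, {B,C}, {A,C}: a loop in the original graph.
bethe_cluster_graph(list(c("A", "B"), c("B", "C"), c("A", "C")))
```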

Properties of belief propagation

A cluster graph is calibrated iff every pair of adjacent clusters $C_i, C_j$ agree on the belief about their sepset $S_{i,j}$:

$$\sum_{C_i \setminus S_{i,j}} \beta_i(C_i) = \sum_{C_j \setminus S_{i,j}} \beta_j(C_j)$$
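Continuing the two-cluster sketch from above (hypothetical helpers), calibration can be checked numerically:

```r
# Compute the message and belief in the other direction, then compare the
# two sepset marginals over B.
delta_2to1 <- factor_marginalize(psi2, "B")
beta1 <- factor_product(psi1, delta_2to1)         # belief beta_1(C_1)
m <- merge(factor_marginalize(beta1, "B"),
           factor_marginalize(beta2, "B"), by = "B")
all(abs(m$p.x - m$p.y) < 1e-8)                    # TRUE: calibrated
```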

Convergence of the algorithm (each new message equals the previous message) implies calibration. Define the sepset beliefs $$\mu_{i,j} = \delta_{i \rightarrow j}\, \delta_{j \rightarrow i} = \sum_{C_j \setminus S_{i,j}} \beta_j.$$

Reparametrization: by running the message passing algorithm, no information about the original distribution has been lost:

$$ \frac{\prod \beta_i}{\prod \mu_{i,j}} = \tilde{P}$$

Remember $\tilde{P}$ is the original unnormalized measure (the product of the factors); this follows from the fact that every message appears exactly once in a $\beta_i$ and once in a $\mu_{i,j}$, and every factor appears exactly once in a $\beta_i$.
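A numeric check of this identity on the same sketch objects as above:

```r
# mu_{1,2} is the sepset belief; dividing it out of beta1 * beta2 recovers
# the original product psi1 * psi2.
mu12 <- factor_product(delta_1to2, delta_2to1)
lhs <- factor_product(beta1, beta2)
lhs$p <- lhs$p / mu12$p[match(lhs$B, mu12$B)]
rhs <- factor_product(psi1, psi2)
isTRUE(all.equal(lhs$p[order(lhs$A, lhs$B, lhs$C)],
                 rhs$p[order(rhs$A, rhs$B, rhs$C)]))   # TRUE
```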

Clique Tree Algorithm and Correctness

A clique tree is a special case of a cluster graph with better performance guarantees for message passing: faster convergence, and convergence to the correct answer.

Message passing in a tree gives correct beliefs; very close to variable elimination.

Def: a Clique Tree is an undirected tree such that

family preservation and the running intersection property hold. The running intersection property can be rewritten for trees as: for each pair of clusters $C_i$ and $C_j$ and each $X \in C_i \cap C_j$, all clusters and sepsets on the unique path between $C_i$ and $C_j$ contain $X$.

RIP (+ family preservation) => Clique tree correctness.

Clique Tree Algorithm: Computation

Messages from leaf nodes have converged as soon as they are computed. Every other node waits until it has received messages from all neighbors but one, then passes its message on to that remaining neighbor.

For chains this is called the forward-backward algorithm.
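A sketch of computing this schedule (hypothetical helper; the tree is given as an edge list): on a chain it reproduces exactly the forward-backward order.

```r
# Hypothetical helper: the two-pass message schedule on a clique tree given
# as a two-column edge matrix over nodes 1..n_nodes. Each output row is one
# directed message i -> j; the downward pass is the upward pass reversed.
two_pass_schedule <- function(n_nodes, edges, root = 1) {
  parent <- rep(NA_integer_, n_nodes)
  ord <- integer(0); frontier <- root; visited <- root
  while (length(frontier) > 0) {                     # BFS from the root
    v <- frontier[1]; frontier <- frontier[-1]
    ord <- c(ord, v)
    nb <- setdiff(c(edges[edges[, 1] == v, 2], edges[edges[, 2] == v, 1]), visited)
    parent[nb] <- v; visited <- c(visited, nb); frontier <- c(frontier, nb)
  }
  up <- cbind(from = rev(ord[-1]), to = parent[rev(ord[-1])])  # leaves first
  rbind(up, cbind(from = up[, 2], to = up[, 1])[rev(seq_len(nrow(up))), , drop = FALSE])
}

# Chain 1 - 2 - 3 - 4 rooted at 4: gives 1->2, 2->3, 3->4, 4->3, 3->2, 2->1.
two_pass_schedule(4, cbind(1:3, 2:4), root = 4)
```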

Answering queries:

We can compute all the marginals at only twice the cost of a single run of variable elimination; by storing the messages we can reuse many of them in later incremental queries.

Clique Tree and Independence

Clique trees appear very efficient, but we know inference is hard, so we need to understand where the difficulty lies.

For an edge $(i,j) \in T$ let $W_{<(i,j)}$ be all variables that appear only on the $C_i$ side of $T$. Variables on both sides are in the sepset $S_{i,j}$.

Thm: $T$ satisfies RIP iff for every edge $(i,j) \in T$ we have $$P_\Phi \models (W_{<(i,j)} \perp W_{<(j,i)} \mid S_{i,j}).$$

Proof sketch of one direction: assume the independence does not hold. Then there is a path in the induced Markov network from one side to the other that avoids $S_{i,j}$, and hence some edge crossing from one side to the other. That edge comes from a factor with a variable from the left side and one from the right side, and no cluster can contain both. This contradicts family preservation.

Each sepset needs to separate the graph into two conditionally independent parts.

In a complete bipartite graph, a sepset must contain a whole side.

Rectangular grid graph: we need sepsets of the size of a side. This is where we pay; clique trees have large sepsets.

Clique Trees and VE

How to construct a clique tree: from a run of VE, create a clique tree.

VE: each elimination step multiplies a set of factors into $\psi_i$ and sums out a variable, producing an intermediate factor $\tau_i$.

Clique tree view: each elimination step becomes a cluster with scope $\mathrm{Scope}(\psi_i)$; the message out of that cluster is $\tau_i$.

Graph: connect cluster $i$ to cluster $j$ when $\tau_i$ is used in computing $\psi_j$.

There is no value in having neighboring clusters that are subsets of each other; we can postprocess to remove these.

Produces a tree: every intermediate factor is used only once.

Family preserving: each of the original factors is used somewhere in the VE process.

Thm: if $T$ is a tree of clusters produced by VE, then $T$ obeys RIP (tree is directed by construction).

A variable $X$ is eliminated in exactly one step; look at that step. No later step involves $X$. Take any earlier cluster that contains $X$: then $X$ is in the sepset leaving that cluster, and hence also in the cluster the message is sent to. By induction $X$ is carried along the path to the cluster where it is eliminated.

Note: we can simulate the running of VE without actually doing the factor products and marginalizations. The cost of variable elimination is about the same as passing messages in one direction of the tree. By storing messages we can answer later queries efficiently (the initial run is twice as expensive, but subsequent inference is cheap).
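A sketch of such a simulation (hypothetical code): run VE on factor scopes only, recording each step's cluster.

```r
# Hypothetical sketch: simulate VE on scopes, recording the cluster
# (scope of psi) at each elimination step; no numeric work is done.
ve_clusters <- function(scopes, elim_order) {
  clusters <- list()
  for (x in elim_order) {
    involved <- vapply(scopes, function(s) x %in% s, logical(1))
    cluster <- unique(unlist(scopes[involved]))   # Scope(psi) at this step
    clusters <- c(clusters, list(cluster))
    scopes <- c(scopes[!involved], list(setdiff(cluster, x)))  # add tau
  }
  clusters
}

# Chain A - B - C with factors on {A,B} and {B,C}: the last cluster {C} is a
# subset of its neighbor, the kind of node postprocessing would remove.
ve_clusters(list(c("A", "B"), c("B", "C")), c("A", "B", "C"))
```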

BP in practice

Misconception example: tight loops, strong potentials, conflicting directions. In this situation both convergence and accuracy are poor.

What not to do: synchronous BP, where all messages are updated in parallel; in the Ising grid example it behaves badly.

Asynchronous BP (we still need to choose an order); messages are updated one at a time.

Asynchronous order 2 in the Ising grid example; much better.

Tree reparametrization (TRP): pick a tree and pass messages to calibrate it, then pick a different tree and calibrate that. Ensure the trees cover all edges. Performance improves with larger trees (e.g. pick spanning trees).

Residual belief propagation (RBP): pass messages between the two clusters whose beliefs over their sepset disagree the most; use a priority queue of edges.

Smoothing (Damping) Messages: replace $\delta_{i \rightarrow j} = \sum_{C_i \setminus S_{i,j}} \psi_i \prod_{k \neq j} \delta_{k \rightarrow i}$ by $\delta_{i \rightarrow j} = \lambda \left( \sum_{C_i \setminus S_{i,j}} \psi_i \prod_{k \neq j} \delta_{k \rightarrow i} \right) + (1 - \lambda)\, \delta^\text{old}_{i \rightarrow j}$.
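As a sketch in the data-frame factor representation used earlier (hypothetical helper):

```r
# Hypothetical damped update: mix the freshly computed message with the
# previous one; lambda = 1 recovers the undamped BP update.
damp <- function(delta_new, delta_old, lambda = 0.5) {
  by <- setdiff(names(delta_new), "p")
  m <- merge(delta_new, delta_old, by = by)
  m$p <- lambda * m$p.x + (1 - lambda) * m$p.y
  m[, c(by, "p"), drop = FALSE]
}
```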

Loopy BP and Message Decoding

$U_1, \ldots, U_k$ bits to send over noisy channel. First encode them into $X_1, \ldots, X_n$ which can be passed over the noisy channel such that after decoding to $V_1, \ldots, V_k$ we get the original bits with good probability.

Rate: $k/n$

Bit error rate: $\frac{1}{k} \sum_i P(V_i \neq U_i)$

Types of channel: binary symmetric channel, binary erasure channel, Gaussian channel (adds analog noise). These different types of channel have very different capacities.
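For instance, a quick simulation of the binary symmetric channel (illustrative only):

```r
# Empirical bit error rate of uncoded transmission through a binary
# symmetric channel with flip probability 0.1.
set.seed(1)
u <- sample(0:1, 1e4, replace = TRUE)            # bits to send
v <- ifelse(runif(length(u)) < 0.1, 1L - u, u)   # each bit flips w.p. 0.1
mean(v != u)                                     # approximately 0.1
```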

Shannon's Theorem: reliable communication is possible at rates below the channel capacity, and impossible above it.

Turbo codes: two encoders and two decoders, where the decoders iteratively communicate their computed probabilities to each other. This iterative decoding was later recognized as loopy belief propagation on an appropriate graphical model.


