knitr::opts_chunk$set(echo = TRUE, cache=TRUE)

In this tutorial, we discuss our estimator of a bernoulli distribution per edge for a given graph, and the strategies to identify an incoherent subgraph from the data.

Framework

Setting

Statistical Goal

Identify the sufficient parameters to characterize the distribution of connected and disconnected edges.

Model

Assume that the edge weights can be characterized by a bernoulli RV; that is:

\begin{align} \mathbb{A}{uv} \sim Bern(p{uv}) \end{align}

where $p_{uv}$ is the probability of edge $e_{uv}$ being connected.

Then our likelihood function is simply:

\begin{align} L_{\mathbb{A}}(A_i; \theta) &= \prod_{(u, v) \in E_i} Bern(w_i(e_{uv}); p_{uv}) \ &= \prod_{(u, v) \in E_i} p_{uv}^{w_i(e_{uv})}(1 - p_{uv})^{1 - w_i(e_{uv})} \end{align}

Estimators

Bernoulli Parameters

Using MLE, it is easy to see that:

\begin{align} \hat{p}{uv} = \frac{1}{n} \sum{i=1}^n w_i(e_{uv}) \end{align}

where $w_i(e_{uv}) \in {0, 1}$ is the binary edge weight of edge $e_{uv}$.

Note that if $w_i(e_{uv}) = 0 \;\forall i$, then $p_{uv} = 0$, which is undesirable since we only have a finite sample (and successive samples where $w_i(e_{uv})) \neq 0$ would lead to poor model performance), and vice versa for $p_{uv} = 1$ when $w_i(e_{uv}) = 0 \;\forall i$. Then consider the smoothed estimator:

\begin{align} p_{uv} = \begin{cases} n_n & max_{i}(w_i(e_{uv})) = 0 \ 1-n_n & max_{i}(w_i(e_{uv})) = 1 \ p_{uv} & else \end{cases} \end{align}



neurodata/subgraphing documentation built on May 21, 2019, 8:10 a.m.