knitr::opts_chunk$set(echo = TRUE, cache=TRUE)

In this tutorial, we discuss our estimator of a beta distribution per edge for a given graph.

Framework

Setting

Statistical Goal

Identify the parameters sufficient to characterize the distribution of each edge's weights.

Model

Assume that the edge weights can be characterized by a beta RV; that is:

\begin{align} \mathbb{A}_{uv} \sim Beta(\alpha_{uv}, \beta_{uv}) \end{align}

where $\alpha_{uv}, \beta_{uv}$ are the parameters to the Beta distribution for edge $e_{uv}$.

Then our likelihood function is simply:

\begin{align} L_{\mathbb{A}}(A_i; \theta) &= \prod_{(u, v) \in E_i} Beta(w_i(e_{uv}); \alpha_{uv}, \beta_{uv}) \\ &= \prod_{(u, v) \in E_i} \frac{w_i(e_{uv})^{\alpha_{uv} - 1}(1-w_i(e_{uv}))^{\beta_{uv} - 1}}{B(\alpha_{uv}, \beta_{uv})} \end{align}
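
To make this concrete, below is a minimal R sketch (illustrative only; the name beta_graph_loglik is hypothetical and not part of the subgraphing package) that evaluates the corresponding log-likelihood for one observed graph, assuming A, alpha, and beta are all [n x m] matrices.

# Illustrative sketch: log-likelihood of one observed weighted graph A under
# edge-wise Beta parameters. A, alpha, and beta are assumed to be [n x m] matrices.
beta_graph_loglik <- function(A, alpha, beta) {
  sum(dbeta(A, shape1 = alpha, shape2 = beta, log = TRUE))  # sum of per-edge log Beta densities
}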

Estimators

Beta Parameters

Using the method of moments, it is easy to see that:

\begin{align} \hat{\mu}_{uv} &= \mathbb{E}_{\mathbb{A}_{uv}}[A_{uv}] = \frac{\hat{\alpha}_{uv}}{\hat{\alpha}_{uv} + \hat{\beta}_{uv}} \\ \hat{\sigma}_{uv}^2 &= \mathbb{E}_{\mathbb{A}_{uv}}[(A_{uv} - \mu_{uv})^2] = \frac{\hat{\alpha}_{uv}\hat{\beta}_{uv}}{(\hat{\alpha}_{uv} + \hat{\beta}_{uv})^2(\hat{\alpha}_{uv} + \hat{\beta}_{uv} + 1)} \end{align}

where $\hat{\mu}_{uv}$ is the sample mean and $\hat{\sigma}_{uv}^2$ is the sample variance of our weights $\{w_i(e_{uv})\}_{i=1}^n$. Solving this system of equations for $\alpha_{uv}$ and $\beta_{uv}$, we find that:

\begin{align} \hat{\alpha}_{uv} &= \left(\frac{1 - \hat{\mu}_{uv}}{\hat{\sigma}_{uv}^2} - \frac{1}{\hat{\mu}_{uv}}\right)\hat{\mu}_{uv}^2 \\ \hat{\beta}_{uv} &= \hat{\alpha}_{uv} \left(\frac{1}{\hat{\mu}_{uv}} - 1\right) \end{align}
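
As a quick sanity check of these formulas, the following short R sketch (the name mom_beta is illustrative, not the package API) applies them to a simulated weight vector; the recovered parameters should be close to the true ones.

# Illustrative method of moments inversion for a single edge's weights.
mom_beta <- function(w) {
  mu <- mean(w)     # sample mean
  sig2 <- var(w)    # sample variance
  alpha_hat <- ((1 - mu)/sig2 - 1/mu)*mu^2
  beta_hat <- alpha_hat*(1/mu - 1)
  c(alpha = alpha_hat, beta = beta_hat)
}
set.seed(1)
mom_beta(rbeta(5000, 2, 5))  # should be close to the true (alpha, beta) = (2, 5)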

Pseudo Code

Below is bare-bones pseudocode for each method; optional parameters are omitted.

# A function to estimate a beta distribution for each edge in a graph.
# Inputs
#   sample: a [n x m x s] element array, where we have s observations of nxm graphs.
# Outputs
#   alpha: a [n x m] matrix denoting the alpha parameter per edge.
#   beta: a [n x m] matrix denoting the beta parameter per edge.
sg.beta.graph_estimator(sample):
  for u in 1:n:
    for v in 1:m:
      alpha[u, v], beta[u, v] = sg.beta.estimator(sample[u, v, :])  # estimate the parameters for edge (u, v) from all s samples
  return alpha, beta

# A function to estimate a beta distribution for a single edge's sample.
# Inputs
#   sample: a length-[s] vector of weights between 0 and 1.
# Outputs
#   alpha: the method of moments estimate of the alpha parameter given the sample.
#   beta: the method of moments estimate of the beta parameter given the sample.
sg.beta.estimator(sample):
  mu = mean(sample)    # sample mean
  sig2 = var(sample)   # sample variance
  alpha = ((1 - mu)/sig2 - 1/mu)*mu^2
  beta = alpha*(1/mu - 1)
  return alpha, beta

# A function to generate random beta-distributed samples of a graph.
# Inputs
#   alpha: a [n x m] matrix of the alpha parameters per edge.
#   beta: a [n x m] matrix of the beta parameters per edge.
#   p: the number of graphs to sample.
# Outputs
#   samp: a [n x m x p] array of p samples from the [n x m] graph RV.
sg.beta.sample_graph(alpha, beta, p):
  for i in 1:n:
    for j in 1:m:
      samp[i, j, :] = rbeta(p, alpha[i, j], beta[i, j])  # p draws from Beta(alpha[i, j], beta[i, j])
  return samp
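
Assuming the package functions follow the pseudocode above (and return a list with $alpha and $beta, as used in the simulations below), a typical workflow simply chains sampling and estimation:

require(subgraphing)
alpha_true <- matrix(2, nrow = 3, ncol = 2)  # example true alpha for a 3 x 2 graph
beta_true <- matrix(5, nrow = 3, ncol = 2)   # example true beta
samp <- sg.beta.sample_graph(alpha_true, beta_true, 100)  # [3 x 2 x 100] array of sampled graphs
est <- sg.beta.graph_estimator(samp)                      # list with $alpha and $beta estimates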

Simulations

Evaluation

Mean Squared Error

When assessing the difference between two functions evaluated at $n$ points, we will frequently use the mean squared error (MSE):

\begin{align} MSE(\hat{y}, y) = \frac{1}{n}\sum_{i=1}^n (\hat{y}_i - y_i)^2 \end{align}

where $\hat{y} \in \mathbb{R}^n$ is an $n$-element estimate of $y \in \mathbb{R}^n$.

Frobenius Norm of Difference

When evaluating the similarity of two matrices, we will use the Frobenius norm of the difference:

\begin{align} ||X - Y||_F = \sqrt{\sum_{i=1}^n \sum_{j=1}^m |x_{ij} - y_{ij}|^2} = \sqrt{tr\left((X - Y)^T(X - Y)\right)} \end{align}
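
Base R's norm(·, 'F'), which we use in the simulations below, computes exactly this quantity; the following sketch simply checks it against the entry-wise definition.

X <- matrix(rnorm(6), nrow = 3)
Y <- matrix(rnorm(6), nrow = 3)
norm(X - Y, 'F')       # Frobenius norm via base R
sqrt(sum((X - Y)^2))   # entry-wise definition; the two agree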

Sampled Edge-weights Converge to True Distribution

First, we will generate some simulation data using our sg.beta.sample_graph function and check whether the data is properly distributed. We will use this function repeatedly in this and later notebooks, so it is critical that it works properly.

Visualization

We begin with a qualitative evaluation of the empirical distribution of each of our 6 edges with respect to the true distribution, using a small number of sampled graphs (100) and a large number of sampled graphs (5000). We expect that with a larger number of sampled graphs, we should get qualitatively better estimated distributions.

require(subgraphing)
require(ggplot2)
require(reshape2)
library(MASS)
library(scales)
require(Rmisc)

lseq <- function(from, to, n) {
  return(round(exp(seq(log(from), log(to), length.out = n))))
}
mse <- function(x, y) {
  return(mean((x - y)^2))
}

xdim <- 3
ydim <- 2
alpha <- array(runif(xdim*ydim, min=.5, max=5), dim=c(xdim, ydim))  # true alpha
beta <- array(runif(xdim*ydim, min=.5, max=5), dim=c(xdim, ydim))  # true beta

n_vec <- c(100, 5000)
n_spaces <- length(n_vec)
n_bin <- 15
edge_err <- array(NaN, dim=c(xdim, ydim, n_spaces))  # the error per edge per iteration
hist_breaks <- seq(0, 1, length.out=n_bin)  # beta distr btwn 0 and 1

for (k in 1:length(n_vec)) {
  edge_distr_plots <- list()
  n <- n_vec[k]
  samp <- sg.beta.sample_graph(alpha, beta, n)
  counter <- 1
  for (i in 1:xdim) {
    for (j in 1:ydim) {
      hsamp <- hist(samp[i, j,], breaks=hist_breaks, plot=FALSE)
      true <- dbeta(hsamp$mids, alpha[i, j], beta[i, j])
      edge_err[i, j, k] <- mse(hsamp$density, true)
      plot_data <- data.frame(x=hsamp$mids,
                              class=c(rep("yhat", length(hsamp$density)),
                                      rep("y", length(true))),
                              y=c(hsamp$density, true))
      edge_distr_plots[[counter]] <- ggplot(plot_data, aes(x=x, y=y, group=class, color=class)) +
        geom_line() +
        xlab('X') +
        ylab('Density') +
        ggtitle(sprintf('Edge (%d, %d), MSE=%.3f', i, j, edge_err[i, j, k])) +
        theme(text=element_text(size=10))
      counter <- counter + 1
    }
  }
  print(sprintf('Edge-wise distribution plots for %d samples', n))
  multiplot(plotlist=edge_distr_plots, layout=matrix(1:length(edge_distr_plots), nrow=xdim, byrow=TRUE))
}

Estimation of Parameters is Consistent

Here, we verify that our estimator converges for arbitrary random alpha, beta as the number of samples increases, given that our sampling code is correct (see the previous section and the test code). We use the method of moments estimators for alpha and beta; while these might not converge to the true parameters as rapidly as the maximum likelihood estimators, they are still consistent.

Visualization

We begin by plotting the estimated alpha, beta and comparing them to the true alpha, beta for 50 and 1000 observed samples, respectively. Because the method of moments produces consistent estimators, our estimated parameters should be closer to the true values with more samples.

jet.colors <- colorRampPalette(c("#00007F", "blue", "#007FFF", "cyan", "#7FFF7F",
                                 "yellow", "#FF7F00", "red", "#7F0000"))
plot_param <- function(mtx, title='Parameter', lims=c(.5, 5)) {
  meltobj <- melt(mtx)
  colnames(meltobj) <- c('X', 'Y', 'value')
  plt <- ggplot(meltobj, aes(x=X, y=Y, fill=value)) +
    geom_tile() +
    scale_fill_gradientn(colors = jet.colors(7), limits=lims, oob=squish) +
    xlab('X dimension') +
    ylab('Y dimension') +
    ggtitle(title)
  return(plt)
}

alpha <- array(runif(xdim*ydim, min=.5, max=5), dim=c(xdim, ydim))  # true alpha
beta <- array(runif(xdim*ydim, min=.5, max=5), dim=c(xdim, ydim))  # true beta
n_vec <- c(50, 1000)
n_spaces <- length(n_vec)
n_bin <- 15
edge_err <- array(NaN, dim=c(xdim, ydim, n_spaces))  # the error per edge per iteration
err <- array(NaN, dim=c(n_spaces))  # the average error per iteration
hist_breaks <- seq(0, 1, length.out=n_bin)  # beta distr btwn 0 and 1

for (k in 1:length(n_vec)) {
  edge_distr_plots <- list()
  n <- n_vec[k]
  samp <- sg.beta.sample_graph(alpha, beta, n)
  counter <- 1
  est <- sg.beta.graph_estimator(samp)
  params <- list()
  params[[1]] <- plot_param(est$alpha, title=paste('Estimated Alpha, n=', n, sep=""))
  params[[2]] <- plot_param(est$beta, title=paste('Estimated Beta, n=', n, sep=""))
  multiplot(plotlist=params, layout=array(1:length(params), dim=c(1,2)))
}
params <- list()
params[[1]] <- plot_param(alpha, title='True Alpha')
params[[2]] <- plot_param(beta, title='True Beta')
multiplot(plotlist=params, layout=array(1:length(params), dim=c(1,2)))

As we can see, after 1000 samples, our estimate is much closer to the true alpha, beta than at only 50 samples, as we would intuitively expect.

Simulation

For this simulation, we proceed as follows. By convergence, if we sample a large number of values, the sample density should approximate the true density better and better, and this simulation checks that this property is satisfied. By consistency, we expect that as n increases we obtain closer and closer approximations of alpha, beta, and as such the difference (the Frobenius norm of the difference) between the estimated and true alpha, beta should decline.

n_vec <- lseq(50, 5000, 20)
n_spaces <- length(n_vec)
edge_err <- array(NaN, dim=c(xdim, ydim, n_spaces))  # the error per edge per iteration
err <- array(NaN, dim=c(n_spaces))  # the average error per iteration
diff <- list(alpha=array(NaN, dim=c(n_spaces)), beta=array(NaN, dim=c(n_spaces)))

for (k in 1:length(n_vec)) {
  n <- n_vec[k]
  samp <- sg.beta.sample_graph(alpha, beta, n)
  for (i in 1:xdim) {
    for (j in 1:ydim) {
      hsamp <- hist(samp[i, j,], breaks=hist_breaks, plot=FALSE)
      edge_err[i, j, k] <- mse(hsamp$density, dbeta(hsamp$mids, alpha[i, j], beta[i, j]))
    }
  }
  est <- sg.beta.graph_estimator(samp)
  diff$alpha[k] <- norm(est$alpha - alpha, 'F')
  diff$beta[k] <- norm(est$beta - beta, 'F')
  err[k] <- mean(edge_err[,,k])
}

base_breaks <- function(n = 10){
    function(x) {
        axisTicks(log10(range(x, na.rm = TRUE)), log = TRUE, n = n)
    }
}

er_results <- data.frame(n=n_vec, error=err)
edge_err_plot <- ggplot(er_results, aes(x=n, y=error)) +
  geom_line(size=3) +
  scale_x_log10(labels=trans_format("log10", math_format(10^.x)), breaks=trans_breaks("log10", function(x) 10^x, n=4)) +
  xlab('Number of Samples') +
  ylab('Average edge-wise MSE') +
  ggtitle('Analyzing Convergence of the Distribution of Edge-wise Samples to the Density Function') +
  theme(text=element_text(size=12)) +
  annotation_logticks(base=10)

param_results <- data.frame(n=n_vec, alpha=diff$alpha, beta=diff$beta)
param_results <- as.data.frame.array(melt(param_results, id="n", variable.name="parameter"))
param_err_plot <- ggplot(param_results, aes(x=n, y=value, group=parameter, color=parameter)) +
  geom_line(size=3) +
  scale_x_log10(labels=trans_format("log10", math_format(10^.x)), breaks=trans_breaks("log10", function(x) 10^x, n=4)) +
  xlab('Number of Samples') +
  ylab('Norm of Difference') +
  ggtitle('Analyzing Consistency of Parameter Estimates to the Truth') +
  theme(text=element_text(size=12)) +
  annotation_logticks(base = 10)

multiplot(edge_err_plot, param_err_plot, cols = 1)

As we can see above, our average MSE drops off rapidly as the number of samples increases, which suggests that the sampled edge weights converge to the true distributions. Moreover, looking at the bottom plot, we can see that as we increase the number of samples, our estimated parameters clearly converge to the true parameters. This indicates that our estimators are consistent.


