library(lemur.pack)
library(extraDistr)
library(knitr)
library(kableExtra)
library(proftools)
library(dplyr)
library(magrittr)

Theory

Recall from the discrete introduction:

For a population $\mathbf{Y}$, sample $\mathbf{X}$, and parameter set $\theta$, we can use Bayes' theorem to get the distribution for $\mathbf{Y}$, conditioned on our observed sample, $\mathbf{x}$.

\begin{align} f(\theta, \mathbf{Y} = \mathbf{y}| \mathbf{X} = \mathbf{x}) &= \frac{ f(\mathbf{X} = \mathbf{x} | \theta, \mathbf{Y} = \mathbf{y}) f(\theta, \mathbf{Y} = \mathbf{y}) }{ f(\mathbf{X} = \mathbf{x}) } \ &\propto f(\mathbf{X} = \mathbf{x} | \theta) f(\mathbf{Y} = \mathbf{y} | \theta) \pi(\theta) \end{align}

What needs to change for continuous data? Of course, the distribution $f(\mathbf{X} = \mathbf{x} | \theta)$ now needs to be a continuous distribution. Also, $f(\mathbf{Y} = \mathbf{y} | \theta)$ will change. Actually, it is not immediately clear what this term should be; we will assume it is what it appears it should be, the product of the height of the $Normal$ density at the population values, or summed in log space. In practice, that term ends up being reasonably close to constant, so it could probably be ignored without affecting the resulting populations terribly much.

Normal

Let us first explore using a $Normal$ likelihood; we will set a joint prior on $(\mu, \sigma)$ to be the product of $Normal(\mu|m, s^2)$ and $Inv\text{-}Gamma(\sigma|a, b)$. For this document, we will use $m = 0,\ s = 10000,\ a = .0001,\ b = .0001$

Also, since we are working in a continuous space now, we need to change how we propose. We will now jitter around $y_k$ using a normal distribution with a constant step size. This will ensure symmetry, so we can drop the proposal distribution ratio term from our accept or reject step.

```{.r .fold-show} popsim_normal <- function(obs, N, samples = 1000, step_size = 1, m = 0, s = 10000, a = .0001, b = .0001) { logcorrection <- function(pop, mu, sigma) { sum(dnorm(pop, mean = mu, sd = sigma, log = TRUE)) }

logprior <- function(mu, sigma, m, s, a, b) { dnorm(mu, mean = m, sd = s, log = TRUE) + dinvgamma(sigma, alpha = a, beta = b, log = TRUE) }

logpost <- function(obs, pop, m, s, a, b) { mu <- mean(pop) sigma <- sd(pop)

sum(dnorm(obs, mean = mu, sd = sigma, log = TRUE)) + logprior(mu, sigma, m, s, a, b) + logcorrection(pop, mu, sigma)

}

n <- length(obs)

out <- matrix(NA, nrow = samples + 1, ncol = N) out[1, ] <- c(obs, rnorm(N - n, mean = mean(obs), sd = mad(obs)))

current_lp <- logpost(obs, out[1, ], m, s, a, b)

for(i in 2:(samples + 1)) { current <- out[i - 1, ] for(k in (n + 1):N) { proposal <- current proposal[k] <- rnorm(1, mean = proposal[k], sd = step_size) proposal_lp <- logpost(obs, proposal, m, s, a, b)

  u <- runif(1)
  if(log(u) <= proposal_lp - current_lp) {
    current <- proposal
    current_lp <- proposal_lp
  }
}
out[i, ] <- current

} return(out[-1, ]) }

# Setup

Lets make a sample; suppose we observe these values.


```{.r .fold-show}
obs <- c(100.4, 87.2, 109.6, 90.1, 98.4, 103.2, 112.1, 104.2, 99.6, 96.4)

Now, we will synthesize multiple populations of size $N = 100,\ N = 1000$ and plot the density of $\mu$ over these populations. While doing so, we can profile the function.

popsim_normal_prof_100 <- profileExpr(popsim_normal_test_100 <- popsim_normal(obs, N = 100, samples = comp_iters))
popsim_normal_prof_1000 <- profileExpr(popsim_normal_test_1000 <- popsim_normal(obs, N = 1000, samples = comp_iters))

Density of Mu (Mean of Synthetic Populations), with N = 100 (left) and N = 1000 (right)

Now, we can look at the profile.

Profile for Normal, N = 100 (left) vs. N = 1000 (right)
Total % Total Time Self % Self Time
popsim_normal 93.01 65.18 17.81 12.48
logpost 75.20 52.70 17.24 12.08
sd 29.74 20.84 9.90 6.94
var 19.83 13.90 15.58 10.92
logcorrection 12.61 8.84 12.61 8.84
logprior 11.59 8.12 3.37 2.36
Total % Total Time Self % Self Time
popsim_normal 95.43 1431.14 8.79 131.82
logpost 86.64 1299.32 9.11 136.58
logcorrection 47.97 719.44 47.97 719.44
sd 19.55 293.20 8.75 131.16
var 10.80 162.04 8.46 126.88
logprior 6.82 102.24 2.08 31.20


ctgrubb/lemur.pack documentation built on May 7, 2023, 4:13 a.m.