knitr::opts_chunk$set(collapse = TRUE, comment = "")
knitr::opts_chunk$set(fig.width = 12, fig.height = 8)
So you have this model you (or your employers) really like and that has been extensively tested and validated. A new batch of data arrives for you to analyse and you wonder how best to use the historical data ($D_0$) you have access to. A little bit of searching on the internet suggests you might use a power prior, which consists of raising the likelihood $L(D_0 \mid \theta)$ to a scalar $a_0$, usually taken to be in $[0, 1]$: $$ p(\theta \mid D_0) \propto L(D_0 \mid \theta)^{a_0} \pi(\theta), $$ where $\pi(\theta)$ is called the initial prior for the parameter $\theta$.
So you code that up, only to realise you have no idea what $a_0$ should be. What to do now? A bit more searching returns this helpful review, Section 2.1 of which points you towards a normalised power prior: $$ \tilde{p}(\theta, a_0 \mid D_0) = \frac{L(D_0 \mid \theta)^{a_0} \pi(\theta)}{\int_{\boldsymbol{\Theta}} L(D_0 \mid t)^{a_0} \pi(t)\,dt} \pi_A(a_0),$$ which hinges on the quantity $$ c(a_0) := \int_{\boldsymbol{\Theta}} L(D_0 \mid t)^{a_0} \pi(t)\,dt.$$ This normalised density can now be used to compute a fancy joint posterior in the face of the new data: $$ \tilde{p}(\theta, a_0 \mid D_0, D) \propto L(D \mid \theta) L(D_0 \mid \theta)^{a_0} \pi(\theta) \frac{\pi_A(a_0)}{c(a_0)}.$$
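To see why $c(a_0)$ cannot simply be dropped, integrate $\theta$ out of the joint posterior above: $$ \tilde{p}(a_0 \mid D_0, D) \propto \frac{\pi_A(a_0)}{c(a_0)} \int_{\boldsymbol{\Theta}} L(D \mid \theta) L(D_0 \mid \theta)^{a_0} \pi(\theta)\, d\theta. $$ Since $c(a_0)$ varies with $a_0$, omitting it changes the marginal posterior of $a_0$; this is precisely the issue raised by Neuenschwander et al. (2009).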
In this vignette we will use the npowerPrioR package, which implements the routines described in Carvalho & Ibrahim (2021), to reproduce the Bernoulli example (Scenario 1) of Neuenschwander et al. (2009).
The historical data consist of $N_0$ Bernoulli trials $x_{0i} \in \{0, 1\}$. Suppose there were $y_0 = \sum_{i=1}^{N_0} x_{0i}$ successes. The model is $$ \begin{align} \theta &\sim \operatorname{Beta}(c, d), \\ x_{0i} \mid \theta &\sim \operatorname{Bernoulli}(\theta). \end{align} $$ This leads to a Beta distribution for $\theta$ given $a_0$, $$ p(\theta \mid N_0, y_0, a_0) \propto \theta^{a_0 y_0 + c - 1} (1-\theta)^{a_0 (N_0 - y_0) + d - 1}, $$ and hence (Neuenschwander et al., 2009): $$ c(a_0) = \frac{\mathcal{B}(a_0 y_0 + c, a_0 (N_0 - y_0) + d)}{\mathcal{B}(c, d)}, $$ where $\mathcal{B}(w, z) = \frac{\Gamma(w)\Gamma(z)}{\Gamma(w + z)}$.
Here we will compare the approximate normalised power prior to its exact counterpart, since in this simple example we know $c(a_0)$ exactly (though for the estimation step we'll pretend we don't).
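Because we do know the truth, it is handy to code up the closed-form expression above as a yardstick for later checks. The helper name `exact_lc_a0` is our own, not part of npowerPrioR:

# Exact log(c(a_0)) for the Beta-Bernoulli model, from the formula above
exact_lc_a0 <- function(a_0, y_0, N_0, cc, dd) {
  lbeta(a_0 * y_0 + cc, a_0 * (N_0 - y_0) + dd) - lbeta(cc, dd)
}
exact_lc_a0(0.5, y_0 = 20, N_0 = 100, cc = 1, dd = 1)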
Now let's load up the package:
library(npowerPrioR)
library(rstan) # we call stan_model() and sampling() directly below
Now let's have a look at what a program implementing the vanilla power prior looks like:
vanilla.stan <- system.file("stan", "simple_Bernoulli_prior.stan", package="npowerPrioR")
writeLines(readLines(vanilla.stan))
So, as you can see, this is just a regular Stan model, except that the log-likelihood has been multiplied by $a_0$. There are a few important things to notice here:
1. The program must store the log-likelihood (`logL`) and its square (`logL_sq`) in the `transformed parameters` block. This is important for the routines in npowerPrioR;
2. All of the component densities need to be written out in full, i.e. we always write `target += normal_lpdf(x | 0, 1);` instead of `x ~ normal(0, 1);`, because Stan's `~` sampling statements drop constant terms of the log-density, and those constants matter here (see the small illustration below).
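One way to see why the constants matter: a multiplicative constant $k$ in $L(D_0 \mid \theta)$ becomes $k^{a_0}$ after tempering, so it contributes $a_0 \log k$ to $\log(c(a_0))$ rather than dropping out. A tiny numerical illustration of our own (using a binomial coefficient as the constant; not package code):

# A constant k in the likelihood contributes a0 * log(k) to log(c(a0)):
# e.g. for binomial data with N0 = 100, y0 = 20, k = choose(100, 20)
a0s <- c(0.1, 0.5, 1)
a0s * lchoose(100, 20) # varies with a0 instead of cancelling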
Now we can finally run our routine to estimate $c(a_0)$ using the bisection-type algorithm described in Carvalho & Ibrahim (2021).
N_0 <- 100 # historical sample size
y_0 <- 20  # historical number of successes
N <- 100   # current sample size
y <- 20    # current number of successes
cc <- 1    # theta ~ Beta(cc, dd) initial prior
dd <- 1
nu <- 1    # a_0 ~ Beta(eta, nu)
eta <- 1
prior <- suppressMessages(stan_model(vanilla.stan))
bb.data <- list(
  N0 = N_0,
  y0 = y_0,
  c = cc,
  d = dd,
  a_0 = NA # placeholder; set at each grid point by the grid-building routine
)
#####################
rstan_options(auto_write = TRUE)
options(mc.cores = 4)
epsilon <- 0.05
J <- 20
maxA <- 1
adaptive.ca0.estimates <- build_grid(compiled.model.prior = prior,
                                     eps = epsilon, M = maxA, J = J,
                                     v1 = 10, v2 = 10,
                                     stan.list = bb.data, pars = "theta")
warnings()
Let's take a look at the product of our labour:
head(adaptive.ca0.estimates$result)
Nice! So we have measured $\log(c(a_0))$ at a few points. Now we'll fit a generalised additive model (GAM) to emulate $\log(c(a_0))$ at any point we want:
fit.gam <- mgcv::gam(lc_a0 ~ s(a0, k = J + 1), data = adaptive.ca0.estimates$result)
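Since in this toy example we happen to know the truth, we can sanity-check the emulator against the `exact_lc_a0` helper defined earlier (in a real application this check is, of course, unavailable):

# Maximum absolute error of the GAM emulator over a fine grid
check_a0s <- seq(0.01, 1, length.out = 100)
gam_preds <- predict(fit.gam, newdata = data.frame(a0 = check_a0s))
max(abs(gam_preds - exact_lc_a0(check_a0s, y_0 = y_0, N_0 = N_0, cc = cc, dd = dd)))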
Now we'll produce predictions from the GAM to create a fine dictionary of pairs $(a_0, \log(c(a_0)))$. We'll pick a grid size of $K = 20{,}000$.
K <- 2e4
bb.data.forposterior <- list(
  N0 = N_0,
  y0 = y_0,
  c = cc,
  d = dd,
  nu = nu,
  eta = eta,
  N = N,
  y = y,
  K = K
)
pred_a0s <- seq(0, max(adaptive.ca0.estimates$result$a0), length.out = K)
a0_grid <- data.frame(a0 = pred_a0s,
                      lc_pred = predict(fit.gam, newdata = data.frame(a0 = pred_a0s)))
bb.data.forposterior$pred_grid_x <- a0_grid$a0
bb.data.forposterior$pred_grid_y <- a0_grid$lc_pred
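Before plugging the dictionary into Stan, we can also eyeball the emulator against the grid estimates and, since we know it here, the exact curve (base graphics; the column names `a0` and `lc_a0` are the ones used in the GAM call above):

plot(adaptive.ca0.estimates$result$a0, adaptive.ca0.estimates$result$lc_a0,
     xlab = expression(a[0]), ylab = expression(log(c(a[0]))), pch = 16)
lines(a0_grid$a0, a0_grid$lc_pred, col = "blue")
curve(exact_lc_a0(x, y_0 = y_0, N_0 = N_0, cc = cc, dd = dd),
      add = TRUE, lty = 2, col = "red")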
This dictionary can now be plugged into a modified Stan program. Let us investigate its structure:
approximate.stan <- system.file("stan", "simple_Bernoulli_posterior_normalised_approximate.stan", package="npowerPrioR")
writeLines(readLines(approximate.stan))
As you can see, this modified program has a few key components:
1. The `functions` block now includes functions that approximately compute $\log(c(a_0))$ from a dictionary (see the R sketch after this list for the idea). Note that these functions are model-agnostic: you can copy-paste them into a program for your own model and they should work.
2. The `model` block now has a line with `-approximate_ca0` in it. Take note of the minus sign: the normalised prior divides by $c(a_0)$, so its logarithm enters the log-target with a negative sign.
3. The line dubbed "Likelihood" in the program implements the likelihood of the current data, $L(D \mid \theta)$.
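We won't reproduce the Stan functions here, but conceptually the lookup amounts to interpolating $\log(c(a_0))$ over the dictionary. A rough R analogue of the idea (our own illustration, not the package's exact interpolation scheme):

# Illustrative only: look up log(c(a0)) by interpolating over the dictionary
lookup_lc_a0 <- function(a0, grid_x, grid_y) {
  approx(x = grid_x, y = grid_y, xout = a0)$y
}
lookup_lc_a0(0.37, bb.data.forposterior$pred_grid_x, bb.data.forposterior$pred_grid_y)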
Now, let's compile
approx.normalised.model <- suppressMessages(stan_model(approximate.stan))
and run our model
approx.norm.posterior.bern <- sampling(approx.normalised.model, data = bb.data.forposterior, refresh = 500, iter = 4000)
We will compare these results across the different models shortly.
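In the meantime, we can take a quick look at the fit with rstan's standard printing (`theta` and `a_0` are the model's parameters):

print(approx.norm.posterior.bern, pars = c("theta", "a_0"))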
To finish this analysis off, we will run both the unnormalised version of the power prior (which is mathematically wrong! See Neuenschwander et al., 2009) and the exactly normalised model, because in this situation we're blessed enough to know the correct $c(a_0)$ in closed form.
unnormalised.stan <- system.file("stan", "simple_Bernoulli_posterior_unnormalised.stan", package="npowerPrioR")
unnorm.bern <- suppressMessages(stan_model(unnormalised.stan))
unnorm.posterior.bern <- sampling(unnorm.bern, data = bb.data.forposterior, refresh = 500, iter = 4000)
And now, the exactly normalised posterior:
exactly.normalised.stan <- system.file("stan", "simple_Bernoulli_posterior_normalised.stan", package="npowerPrioR")
exactly.normalised.model <- suppressMessages(stan_model(exactly.normalised.stan))
norm.posterior.bern <- sampling(exactly.normalised.model, data = bb.data.forposterior, refresh = 500, iter = 4000)
Good. Now let's compare all of the estimates we have computed. First, a little bit of prep:
# Eq. 8 in Neuenschwander et al. (2009)
posterior_a0_Bernoulli <- function(a_0, y0, n0, y, n, cc, dd, eta, nu, log = FALSE) {
  term1 <- lgamma(a_0 * n0 + cc + dd) + lgamma(a_0 * y0 + y + cc) +
    lgamma(a_0 * (n0 - y0) + (n - y) + dd)
  term2 <- lgamma(a_0 * y0 + cc) + lgamma(a_0 * (n0 - y0) + dd) +
    lgamma(a_0 * n0 + n + cc + dd)
  term3 <- dbeta(a_0, shape1 = eta, shape2 = nu, log = TRUE)
  ans <- term1 - term2 + term3
  if (!log) ans <- exp(ans)
  return(ans)
}
post_a0 <- function(x) {
  posterior_a0_Bernoulli(a_0 = x, y0 = y_0, n0 = N_0, y = y, n = N,
                         cc = cc, dd = dd, eta = eta, nu = nu)
}
post_a0 <- Vectorize(post_a0)
Kp <- integrate(post_a0, 0, 1)$value
norm_post_a0 <- function(x) post_a0(x) / Kp
norm_post_a0 <- Vectorize(norm_post_a0)
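A quick check that the exact density is indeed properly normalised:

integrate(norm_post_a0, 0, 1)$value # should be (numerically) 1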
Let's combine the posterior estimates for $a_0$ under the various models:
a0.unnorm <- extract(unnorm.posterior.bern, 'a_0')$a_0
a0.approx <- extract(approx.norm.posterior.bern, 'a_0')$a_0
a0.norm <- extract(norm.posterior.bern, 'a_0')$a_0
a0.dt <- data.frame(
  a0 = c(a0.unnorm, a0.norm, a0.approx),
  normalisation = c(
    rep("none", length(a0.unnorm)),
    rep("exact", length(a0.norm)),
    rep("approximate", length(a0.approx))
  )
)
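A quick numerical comparison of the posterior mean of $a_0$ under each treatment:

aggregate(a0 ~ normalisation, data = a0.dt, FUN = mean)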
Now, a nice plot:
library(ggplot2)
a0_dist <- ggplot(a0.dt, aes(x = a0, fill = normalisation, colour = normalisation)) +
  geom_density() +
  # dashed black line: the Beta(eta, nu) prior on a_0
  stat_function(fun = function(x) dbeta(x, eta, nu), geom = "line",
                colour = "black", linetype = "longdash") +
  # solid black line: the exact posterior of a_0 (Eq. 8)
  stat_function(fun = norm_post_a0, geom = "line", colour = "black",
                linetype = "solid") +
  facet_grid(normalisation ~ ., scales = "free") +
  scale_y_continuous("Density", expand = c(0, 0)) +
  scale_x_continuous(expression(a[0]), expand = c(0, 0)) +
  theme_bw(base_size = 20) +
  theme(legend.position = "bottom",
        legend.justification = "centre",
        legend.title = element_blank(),
        strip.background = element_blank(),
        strip.text.y = element_blank(),
        legend.margin = margin(0, 0, 0, 0),
        legend.box.margin = margin(0, 0, 0, 0))
a0_dist
Now let's look at the estimates of the success probability, $\theta$.
unnorm.theta.dt <- data.frame(theta = extract(unnorm.posterior.bern, 'theta')$theta)
unnorm.theta.dt$normalisation <- "none"
approx.theta.dt <- data.frame(theta = extract(approx.norm.posterior.bern, 'theta')$theta)
approx.theta.dt$normalisation <- "approximate"
norm.theta.dt <- data.frame(theta = extract(norm.posterior.bern, 'theta')$theta)
norm.theta.dt$normalisation <- "exact"
par.posteriors <- rbind(unnorm.theta.dt, approx.theta.dt, norm.theta.dt)
# Reference curve: the conditional posterior of theta at a fixed a_0 = 0.05,
# which is Beta(a_star, b_star)
a0_star <- 0.05
a_star <- a0_star * bb.data.forposterior$y0 + bb.data.forposterior$c +
  bb.data.forposterior$y
b_star <- a0_star * (bb.data.forposterior$N0 - bb.data.forposterior$y0) +
  bb.data.forposterior$d + (bb.data.forposterior$N - bb.data.forposterior$y)
theta_posterior <- ggplot(data = par.posteriors,
                          aes(x = theta, colour = normalisation, fill = normalisation)) +
  geom_density(alpha = .4) +
  stat_function(fun = function(x) dbeta(x, a_star, b_star), geom = "line",
                colour = "black", linetype = "solid") +
  scale_x_continuous(expression(theta), expand = c(0, 0)) +
  scale_y_continuous("Density", expand = c(0, 0)) +
  theme_bw(base_size = 20) +
  theme(legend.position = "bottom",
        legend.justification = "centre",
        legend.title = element_blank(),
        strip.background = element_blank(),
        strip.text.y = element_blank(),
        legend.margin = margin(0, 0, 0, 0),
        legend.box.margin = margin(0, 0, 0, 0))
theta_posterior
In this vignette we have shown how to write a Stan program for a fixed-$a_0$ power prior analysis for use within npowerPrioR, and how to modify that program to include an approximate emulation function for the log-normalising constant, $\log(c(a_0))$.
We have also compared the approximately normalised power prior to the exactly normalised distribution in a situation where we know the true answer, and found that it gave very good results, both in terms of the marginal posterior for $a_0$ and the marginal posterior of the parameter of interest, $\theta$.