knitr::opts_chunk$set( fig.path = paste('figure/', knitr::current_input(), '-figs/', sep=''), fig.width = 7, fig.height = 5 )
library(doit) library(tidyverse) library(mvtnorm) library(viridis) library(lhs)
suppressPackageStartupMessages(library(doit)) suppressPackageStartupMessages(library(tidyverse)) suppressPackageStartupMessages(library(mvtnorm)) suppressPackageStartupMessages(library(viridis)) suppressPackageStartupMessages(library(lhs))
We use the DoIt method to approximate the banana-shaped density used in Joseph (2012):
f = function(theta) { theta = as.matrix(theta) theta = matrix(theta, ncol=2) x = cbind(theta[ , 1, drop=FALSE], theta[ , 2, drop=FALSE] + 0.03 * theta[ , 1, drop=FALSE] ^ 2 - 3) dmvnorm(x, c(0,0), diag(c(100, 1))) } df_true = crossing(theta1 = seq(-30, 30, .5), theta2 = seq(-12,7,.5)) %>% mutate(f = f(cbind(theta1,theta2))) head(df_true)
ggplot(df_true) + geom_raster(aes(x=theta1, y=theta2, fill=f)) + geom_contour(aes(x=theta1, y=theta2, z=f), col='black', lty=2) + scale_fill_viridis()
We evaluate the function at 100 parameter values on a maximin latin hypercube
design. As the approximation region we choose the interval [-27.5, 27.5]
in
the theta1
direction, and [-10,5]
in the theta2
direction.
lim1 = data_frame(min=-27.5, max=27.5, range=max-min) lim2 = data_frame(min=-10, max=5, range=max-min) set.seed(123) design = maximinLHS(n=100, k=2) %>% as_data_frame %>% setNames(c('theta1', 'theta2')) %>% mutate(theta1 = theta1 * lim1$range + lim1$min, theta2 = theta2 * lim2$range + lim2$min, f = f(cbind(theta1, theta2))) head(design)
ggplot(design) + geom_point(aes(x=theta1, y=theta2, colour=f), cex=3) + scale_colour_gradient2()
The DoIt method works by approximating the target density by a weighted sum of
Normal distributions. The optimal variance parameter w
of the Normal
distributions is estimated by the function doit_estimate_w()
:
w = doit_estimate_w(design) print(w)
The function doit_fit()
estimates the DoIt parameters and saves them in an
object of class doit
:
doit = doit_fit(design, w=w) str(doit)
We specify the parameter values at which we want to approximate the target
density, and use doit_approx()
to calculate the approximation:
theta_eval = df_true %>% select(theta1, theta2) df_approx = doit_approx(doit, theta_eval)
plot_df = bind_rows(df_approx %>% rename(f=dens_approx) %>% mutate(type='approx'), df_true %>% mutate(type='true')) ggplot(plot_df, aes(x=theta1, y=theta2)) + geom_raster(aes(fill=f)) + facet_wrap(~type) + geom_point(data=design, pch=1, colour='white') + scale_colour_viridis() + scale_fill_viridis()
The function doit_approx()
also returns the estimation error variance:
ggplot(df_approx, aes(x=theta1, y=theta2)) + geom_raster(aes(fill=variance)) + geom_point(data=design, pch=1, colour='white') + scale_colour_viridis() + scale_fill_viridis()
To add further points to the existing design, we use the function
doit_propose_new()
which chooses a new design point where the current
estimation variance is particularly high:
theta_n = doit_propose_new(doit) print(theta_n)
ggplot(df_approx, aes(x=theta1, y=theta2)) + geom_raster(aes(fill=variance)) + scale_fill_viridis() + geom_point(data=design, pch=1, col='white') + geom_point(data=theta_n, pch=16, cex=3, col='red')
We could use doit_fit()
to estimate the DoIt parameters again on the original
design plus the added point. For more efficiency, we can use the function
doit_update()
. Note that the kernel width w
is not updated. In the
following we add 75 design points, re-optimising the kernel width w
every
10th iteration:
n_new = 75 for (jj in 1:n_new) { theta_n = doit_propose_new(doit) design_add = theta_n %>% mutate(f = f(theta_n)) design = bind_rows(design, design_add) if (jj %% 10 != 0) { doit = doit_update(doit, design_add) } else { w = doit_estimate_w(design, w_0=doit$w) doit = doit_fit(design, w=w) } }
Here we plot the new approximation and highlight the sequentially added new design points:
theta_eval = df_true %>% select(theta1, theta2) df_approx = doit_approx(doit, theta_eval) ggplot(mapping=aes(x=theta1, y=theta2)) + geom_raster(data=df_approx, aes(fill=dens_approx)) + geom_point(data=design %>% mutate(design=rep(c('latin_hypercube','sequential'), c(n()-n_new, n_new))), mapping=aes(pch=design, colour=design), cex=2) + scale_fill_viridis()
The function doit_marginals()
returns a data frame containing approximations
of all marginal distributions:
df_marg = doit_marginals(doit, theta_eval) ggplot(df_marg, aes(x=theta, y=dens_approx)) + geom_point() + geom_line() + facet_wrap(~par, scales='free')
To calculate only the approximate marginal of, say, theta1
, use
doit_marginal()
. If the argument theta_eval
is unspecified, the design
values are used:
df_marg_theta1 = doit_marginal(doit, 'theta1') head(df_marg_theta1)
We use the functions doit_expectation()
and doit_variance()
to approximate
the expectation and covariance matrix of the target density:
doit_expectation(doit)
doit_variance(doit)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.