knitr::opts_chunk$set(
  fig.path = paste('figure/', knitr::current_input(), '-figs/', sep=''),
  fig.width = 7,
  fig.height = 5
)
library(doit)
library(tidyverse)
library(mvtnorm)
library(viridis)
library(lhs)
suppressPackageStartupMessages(library(doit))
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(mvtnorm))
suppressPackageStartupMessages(library(viridis))
suppressPackageStartupMessages(library(lhs))

The target function

We use the DoIt method to approximate the banana-shaped density used in Joseph (2012):

f = function(theta) {
  theta = as.matrix(theta)
  theta = matrix(theta, ncol=2) 
  x = cbind(theta[ , 1, drop=FALSE], 
            theta[ , 2, drop=FALSE] + 0.03 * theta[ , 1, drop=FALSE] ^ 2 - 3)
  dmvnorm(x, c(0,0), diag(c(100, 1)))
}

df_true = crossing(theta1 = seq(-30, 30, .5), theta2 = seq(-12,7,.5)) %>%
  mutate(f = f(cbind(theta1,theta2)))

head(df_true)
ggplot(df_true) + 
  geom_raster(aes(x=theta1, y=theta2, fill=f)) + 
  geom_contour(aes(x=theta1, y=theta2, z=f), col='black', lty=2) + 
  scale_fill_viridis()

Latin hypercube design

We evaluate the function at 100 parameter values on a maximin latin hypercube design. As the approximation region we choose the interval [-27.5, 27.5] in the theta1 direction, and [-10,5] in the theta2 direction.

lim1 = data_frame(min=-27.5, max=27.5, range=max-min)
lim2 = data_frame(min=-10,   max=5,    range=max-min)
set.seed(123)
design = maximinLHS(n=100, k=2) %>% 
         as_data_frame %>% 
         setNames(c('theta1', 'theta2')) %>%
         mutate(theta1 = theta1 * lim1$range + lim1$min, 
                theta2 = theta2 * lim2$range + lim2$min,
                f      = f(cbind(theta1, theta2)))
head(design)
ggplot(design) + 
  geom_point(aes(x=theta1, y=theta2, colour=f), cex=3) + 
  scale_colour_gradient2()

Estimate the optimal kernel width

The DoIt method works by approximating the target density by a weighted sum of Normal distributions. The optimal variance parameter w of the Normal distributions is estimated by the function doit_estimate_w():

w = doit_estimate_w(design)
print(w)

Evaluate the DoIt approximation

The function doit_fit() estimates the DoIt parameters and saves them in an object of class doit:

doit = doit_fit(design, w=w)
str(doit)

We specify the parameter values at which we want to approximate the target density, and use doit_approx() to calculate the approximation:

theta_eval = df_true %>% select(theta1, theta2)
df_approx = doit_approx(doit, theta_eval)
plot_df = bind_rows(df_approx %>% rename(f=dens_approx) %>% 
                                  mutate(type='approx'),
                    df_true %>% mutate(type='true'))
ggplot(plot_df, aes(x=theta1, y=theta2)) + 
  geom_raster(aes(fill=f)) + 
  facet_wrap(~type) +
  geom_point(data=design, pch=1, colour='white') + 
  scale_colour_viridis() + scale_fill_viridis()

The function doit_approx() also returns the estimation error variance:

ggplot(df_approx, aes(x=theta1, y=theta2)) + 
  geom_raster(aes(fill=variance)) + 
  geom_point(data=design, pch=1, colour='white') + 
  scale_colour_viridis() + scale_fill_viridis()

Add more design points

To add further points to the existing design, we use the function doit_propose_new() which chooses a new design point where the current estimation variance is particularly high:

theta_n = doit_propose_new(doit)
print(theta_n)
ggplot(df_approx, aes(x=theta1, y=theta2)) + 
  geom_raster(aes(fill=variance)) + 
  scale_fill_viridis() +
  geom_point(data=design, pch=1, col='white') +
  geom_point(data=theta_n, pch=16, cex=3, col='red')

We could use doit_fit() to estimate the DoIt parameters again on the original design plus the added point. For more efficiency, we can use the function doit_update(). Note that the kernel width w is not updated. In the following we add 75 design points, re-optimising the kernel width w every 10th iteration:

n_new = 75
for (jj in 1:n_new) {
  theta_n    = doit_propose_new(doit)
  design_add = theta_n %>% mutate(f = f(theta_n))
  design = bind_rows(design, design_add)
  if (jj %% 10 != 0) {
    doit = doit_update(doit, design_add)
  } else { 
    w = doit_estimate_w(design, w_0=doit$w)
    doit = doit_fit(design, w=w)
  }
}

Here we plot the new approximation and highlight the sequentially added new design points:

theta_eval = df_true %>% select(theta1, theta2)
df_approx = doit_approx(doit, theta_eval)
ggplot(mapping=aes(x=theta1, y=theta2)) +
  geom_raster(data=df_approx, aes(fill=dens_approx)) + 
  geom_point(data=design %>% 
                  mutate(design=rep(c('latin_hypercube','sequential'), c(n()-n_new, n_new))),
             mapping=aes(pch=design, colour=design), cex=2) + 
  scale_fill_viridis()

Marginal densities

The function doit_marginals() returns a data frame containing approximations of all marginal distributions:

df_marg = doit_marginals(doit, theta_eval)
ggplot(df_marg, aes(x=theta, y=dens_approx)) + 
  geom_point() + geom_line() + 
  facet_wrap(~par, scales='free')

To calculate only the approximate marginal of, say, theta1, use doit_marginal(). If the argument theta_eval is unspecified, the design values are used:

df_marg_theta1 = doit_marginal(doit, 'theta1')
head(df_marg_theta1)

Expectations and variances

We use the functions doit_expectation() and doit_variance() to approximate the expectation and covariance matrix of the target density:

doit_expectation(doit)
doit_variance(doit)


sieste/doit documentation built on May 9, 2019, 4:10 p.m.