fit_three_layer: MCMC sampling for three layer deep GP

View source: R/fit.R

fit_three_layerR Documentation

MCMC sampling for three layer deep GP

Description

Conducts MCMC sampling of hyperparameters, hidden layer z, and hidden layer w for a three layer deep GP. Separate length scale parameters theta_z, theta_w, and theta_y govern the correlation strength of the inner layer, middle layer, and outer layer respectively. Nugget parameter g governs noise on the outer layer. In Matern covariance, v governs smoothness.

Usage

fit_three_layer(
  x,
  y,
  D = ifelse(is.matrix(x), ncol(x), 1),
  nmcmc = 10000,
  verb = TRUE,
  w_0 = NULL,
  z_0 = NULL,
  g_0 = 0.01,
  theta_y_0 = 0.1,
  theta_w_0 = 0.1,
  theta_z_0 = 0.1,
  true_g = NULL,
  settings = NULL,
  cov = c("matern", "exp2"),
  v = 2.5,
  vecchia = FALSE,
  m = min(25, length(y) - 1)
)

Arguments

x

vector or matrix of input locations

y

vector of response values

D

integer designating dimension of hidden layers, defaults to dimension of x

nmcmc

number of MCMC iterations

verb

logical indicating whether to print iteration progress

w_0

initial value for hidden layer w (must be matrix of dimension nrow(x) by D or dimension nrow(x) - 1 by D). Defaults to the identity mapping.

z_0

initial value for hidden layer z (must be matrix of dimension nrow(x) by D or dimension nrow(x) - 1 by D). Defaults to the identity mapping.

g_0

initial value for g

theta_y_0

initial value for theta_y (length scale of outer layer)

theta_w_0

initial value for theta_w (length scale of middle layer), may be single value or vector of length D

theta_z_0

initial value for theta_z (length scale of inner layer), may be single value or vector of length D

true_g

if true nugget is known it may be specified here (set to a small value to make fit deterministic). Note - values that are too small may cause numerical issues in matrix inversions.

settings

hyperparameters for proposals and priors (see details)

cov

covariance kernel, either Matern or squared exponential ("exp2")

v

Matern smoothness parameter (only used if cov = "matern")

vecchia

logical indicating whether to use Vecchia approximation

m

size of Vecchia conditioning sets (only used if vecchia = TRUE)

Details

Maps inputs x through hidden layer z then hidden layer w to outputs y. Conducts sampling of the hidden layers using Elliptical Slice sampling. Utilizes Metropolis Hastings sampling of the length scale and nugget parameters with proposals and priors controlled by settings. When true_g is set to a specific value, the nugget is not estimated. When vecchia = TRUE, all calculations leverage the Vecchia approximation with specified conditioning set size m. Vecchia approximation is only implemented for cov = "matern".

Proposals for g, theta_y, theta_w, and theta_z follow a uniform sliding window scheme, e.g.

g_star <- runif(1, l * g_t / u, u * g_t / l),

with defaults l = 1 and u = 2 provided in settings. To adjust these, set settings = list(l = new_l, u = new_u). Priors on g, theta_y, theta_w, and theta_z follow Gamma distributions with shape parameters (alpha) and rate parameters (beta) controlled within the settings list object. Defaults are

  • settings$alpha$g <- 1.5

  • settings$beta$g <- 3.9

  • settings$alpha$theta_z <- 1.5

  • settings$beta$theta_z <- 3.9 / 4

  • settings$alpha$theta_w <- 1.5

  • settings$beta$theta_w <- 3.9 / 12

  • settings$alpha$theta_y <- 1.5

  • settings$beta$theta_y <- 3.9 / 6

These priors are designed for x scaled to [0, 1] and y scaled to have mean 0 and variance 1. These may be adjusted using the settings input.

When w_0 = NULL and/or z_0 = NULL, the hidden layers are initialized at x (i.e. the identity mapping). The default prior mean of the hidden layer is zero, but may be adjusted to x using settings = list(w_prior_mean = x, z_prior_mean = x). If w_0 and/or z_0 is of dimension nrow(x) - 1 by D, the final row is predicted using kriging. This is helpful in sequential design when adding a new input location and starting the MCMC at the place where the previous MCMC left off.

The output object of class dgp3 or dgp3vec is designed for use with continue, trim, and predict.

Value

a list of the S3 class dgp3 or dgp3vec with elements:

  • x: copy of input matrix

  • y: copy of response vector

  • nmcmc: number of MCMC iterations

  • settings: copy of proposal/prior settings

  • v: copy of Matern smoothness parameter (v = 999 indicates cov = "exp2")

  • g: vector of MCMC samples for g

  • theta_y: vector of MCMC samples for theta_y (length scale of outer layer)

  • theta_w: matrix of MCMC samples for theta_w (length scale of middle layer)

  • theta_z: matrix of MCMC samples for theta_z (length scale of inner layer)

  • tau2: vector of MLE estimates for tau2 (scale parameter of outer layer)

  • w: list of MCMC samples for middle hidden layer w

  • z: list of MCMC samples for inner hidden layer z

  • time: computation time in seconds

References

Sauer, A, RB Gramacy, and D Higdon. 2020. "Active Learning for Deep Gaussian Process Surrogates." Technometrics, to appear; arXiv:2012.08015.

Sauer, A, A Cooper, and RB Gramacy. 2022. "Vecchia-approximated Deep Gaussian Processes for Computer Experiments." pre-print on arXiv:2204.02904

Murray, I, RP Adams, and D MacKay. 2010. "Elliptical slice sampling." Journal of Machine Learning Research 9, 541-548.

Examples

# Examples of real-world implementations are available at: 
# https://bitbucket.org/gramacylab/deepgp-ex/

# G function (https://www.sfu.ca/~ssurjano/gfunc.html)
f <- function(xx, a = (c(1:length(xx)) - 1) / 2) { 
    new1 <- abs(4 * xx - 2) + a
    new2 <- 1 + a
    prod <- prod(new1 / new2)
    return((prod - 1) / 0.86)
}

# Training data
d <- 2
n <- 30
x <- matrix(runif(n * d), ncol = d)
y <- apply(x, 1, f)

# Testing data
n_test <- 100
xx <- matrix(runif(n_test * d), ncol = d)
yy <- apply(xx, 1, f)

i <- akima::interp(xx[, 1], xx[, 2], yy)
image(i, col = heat.colors(128))
contour(i, add = TRUE)
points(x)

# Example 1: full model (nugget estimated)
fit <- fit_three_layer(x, y, nmcmc = 2000)
plot(fit)
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)

# Example 2: Vecchia approximated model (nugget fixed)
# (Vecchia approximation is faster for larger data sizes)
fit <- fit_three_layer(x, y, nmcmc = 2000, vecchia = TRUE, 
                       m = 10, true_g = 1e-6)
plot(fit) 
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)



deepgp documentation built on April 8, 2022, 5:08 p.m.