fit_three_layer | R Documentation |
Conducts MCMC sampling of hyperparameters, hidden layer
z
, and hidden layer w
for a three layer deep GP.
Separate length scale parameters theta_z
,
theta_w
, and theta_y
govern the correlation
strength of the inner layer, middle layer, and outer layer respectively.
Nugget parameter g
governs noise on the outer layer. In Matern
covariance, v
governs smoothness.
fit_three_layer( x, y, D = ifelse(is.matrix(x), ncol(x), 1), nmcmc = 10000, verb = TRUE, w_0 = NULL, z_0 = NULL, g_0 = 0.01, theta_y_0 = 0.1, theta_w_0 = 0.1, theta_z_0 = 0.1, true_g = NULL, settings = NULL, cov = c("matern", "exp2"), v = 2.5, vecchia = FALSE, m = min(25, length(y) - 1) )
x |
vector or matrix of input locations |
y |
vector of response values |
D |
integer designating dimension of hidden layers, defaults to
dimension of |
nmcmc |
number of MCMC iterations |
verb |
logical indicating whether to print iteration progress |
w_0 |
initial value for hidden layer |
z_0 |
initial value for hidden layer |
g_0 |
initial value for |
theta_y_0 |
initial value for |
theta_w_0 |
initial value for |
theta_z_0 |
initial value for |
true_g |
if true nugget is known it may be specified here (set to a small value to make fit deterministic). Note - values that are too small may cause numerical issues in matrix inversions. |
settings |
hyperparameters for proposals and priors (see details) |
cov |
covariance kernel, either Matern or squared exponential
( |
v |
Matern smoothness parameter (only used if |
vecchia |
logical indicating whether to use Vecchia approximation |
m |
size of Vecchia conditioning sets (only used if
|
Maps inputs x
through hidden layer z
then hidden
layer w
to outputs y
. Conducts sampling of the hidden
layers using Elliptical Slice sampling. Utilizes Metropolis Hastings
sampling of the length scale and nugget parameters with proposals and
priors controlled by settings
. When true_g
is set to a
specific value, the nugget is not estimated. When
vecchia = TRUE
, all calculations leverage the Vecchia
approximation with specified conditioning set size m
. Vecchia
approximation is only implemented for cov = "matern"
.
Proposals for g
,
theta_y
, theta_w
, and theta_z
follow a uniform
sliding window scheme, e.g.
g_star <- runif(1, l * g_t / u, u * g_t / l)
,
with defaults l = 1
and u = 2
provided in settings
.
To adjust these, set settings = list(l = new_l, u = new_u)
.
Priors on g
, theta_y
, theta_w
, and theta_z
follow Gamma distributions with shape parameters (alpha
) and rate
parameters (beta
) controlled within the settings
list
object. Defaults are
settings$alpha$g <- 1.5
settings$beta$g <- 3.9
settings$alpha$theta_z <- 1.5
settings$beta$theta_z <- 3.9 / 4
settings$alpha$theta_w <- 1.5
settings$beta$theta_w <- 3.9 / 12
settings$alpha$theta_y <- 1.5
settings$beta$theta_y <- 3.9 / 6
These priors are designed for x
scaled to [0, 1] and y
scaled to have mean 0 and variance 1. These may be adjusted using the
settings
input.
When w_0 = NULL
and/or z_0 = NULL
, the hidden layers are
initialized at x
(i.e. the identity mapping). The default prior
mean of the hidden layer is zero, but may be adjusted to x
using
settings = list(w_prior_mean = x, z_prior_mean = x)
.
If w_0
and/or z_0
is of dimension nrow(x) - 1
by
D
, the final row is predicted using kriging. This is helpful in
sequential design when adding a new input location and starting the MCMC
at the place where the previous MCMC left off.
The output object of class dgp3
or dgp3vec
is designed for
use with continue
, trim
, and predict
.
a list of the S3 class dgp3
or dgp3vec
with elements:
x
: copy of input matrix
y
: copy of response vector
nmcmc
: number of MCMC iterations
settings
: copy of proposal/prior settings
v
: copy of Matern smoothness parameter (v = 999
indicates cov = "exp2"
)
g
: vector of MCMC samples for g
theta_y
: vector of MCMC samples for theta_y
(length
scale of outer layer)
theta_w
: matrix of MCMC samples for theta_w
(length
scale of middle layer)
theta_z
: matrix of MCMC samples for theta_z
(length
scale of inner layer)
tau2
: vector of MLE estimates for tau2
(scale
parameter of outer layer)
w
: list of MCMC samples for middle hidden layer w
z
: list of MCMC samples for inner hidden layer z
time
: computation time in seconds
Sauer, A, RB Gramacy, and D Higdon. 2020. "Active Learning for Deep Gaussian
Process Surrogates." Technometrics, to appear; arXiv:2012.08015.
Sauer, A, A Cooper, and RB Gramacy. 2022. "Vecchia-approximated Deep Gaussian
Processes for Computer Experiments." pre-print on arXiv:2204.02904
Murray, I, RP Adams, and D MacKay. 2010. "Elliptical slice sampling."
Journal of Machine Learning Research 9, 541-548.
# Examples of real-world implementations are available at: # https://bitbucket.org/gramacylab/deepgp-ex/ # G function (https://www.sfu.ca/~ssurjano/gfunc.html) f <- function(xx, a = (c(1:length(xx)) - 1) / 2) { new1 <- abs(4 * xx - 2) + a new2 <- 1 + a prod <- prod(new1 / new2) return((prod - 1) / 0.86) } # Training data d <- 2 n <- 30 x <- matrix(runif(n * d), ncol = d) y <- apply(x, 1, f) # Testing data n_test <- 100 xx <- matrix(runif(n_test * d), ncol = d) yy <- apply(xx, 1, f) i <- akima::interp(xx[, 1], xx[, 2], yy) image(i, col = heat.colors(128)) contour(i, add = TRUE) points(x) # Example 1: full model (nugget estimated) fit <- fit_three_layer(x, y, nmcmc = 2000) plot(fit) fit <- trim(fit, 1000, 2) fit <- predict(fit, xx, cores = 1) plot(fit) # Example 2: Vecchia approximated model (nugget fixed) # (Vecchia approximation is faster for larger data sizes) fit <- fit_three_layer(x, y, nmcmc = 2000, vecchia = TRUE, m = 10, true_g = 1e-6) plot(fit) fit <- trim(fit, 1000, 2) fit <- predict(fit, xx, cores = 1) plot(fit)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.