fit_two_layer {deepgp}

Description

Conducts MCMC sampling of hyperparameters and hidden layer w for a two layer deep GP. Separate length scale parameters theta_w and theta_y govern the correlation strength of the hidden layer and outer layer respectively. Nugget parameter g governs noise on the outer layer. In the Matern covariance, v governs smoothness.
Usage

    fit_two_layer(
      x,
      y,
      D = ifelse(is.matrix(x), ncol(x), 1),
      nmcmc = 10000,
      verb = TRUE,
      w_0 = NULL,
      g_0 = 0.01,
      theta_y_0 = 0.1,
      theta_w_0 = 0.1,
      true_g = NULL,
      settings = NULL,
      cov = c("matern", "exp2"),
      v = 2.5,
      vecchia = FALSE,
      m = min(25, length(y) - 1)
    )
Arguments

x: vector or matrix of input locations

y: vector of response values

D: integer designating dimension of hidden layer, defaults to dimension of x

nmcmc: number of MCMC iterations

verb: logical indicating whether to print iteration progress

w_0: initial value for hidden layer w

g_0: initial value for g

theta_y_0: initial value for theta_y

theta_w_0: initial value for theta_w

true_g: if the true nugget is known it may be specified here (set to a small value to make the fit deterministic). Note: values that are too small may cause numerical issues in matrix inversions.

settings: hyperparameters for proposals and priors (see Details)

cov: covariance kernel, either Matern ("matern") or squared exponential ("exp2")

v: Matern smoothness parameter (only used if cov = "matern")

vecchia: logical indicating whether to use the Vecchia approximation

m: size of Vecchia conditioning sets (only used if vecchia = TRUE)
Details

Maps inputs x through hidden layer w to outputs y. Conducts sampling of the hidden layer using Elliptical Slice sampling. Utilizes Metropolis-Hastings sampling of the length scale and nugget parameters with proposals and priors controlled by settings. When true_g is set to a specific value, the nugget is not estimated. When vecchia = TRUE, all calculations leverage the Vecchia approximation with specified conditioning set size m. The Vecchia approximation is only implemented for cov = "matern".
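For a deterministic response surface (e.g. a noise-free computer simulation), the nugget may be fixed at a small jitter value rather than sampled; a minimal sketch, with 1e-4 as an illustrative choice:

    # fix the nugget so it is not estimated (fit becomes deterministic)
    fit <- fit_two_layer(x, y, nmcmc = 1000, true_g = 1e-4)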
Proposals for g, theta_y, and theta_w follow a uniform sliding window scheme, e.g.

    g_star <- runif(1, l * g_t / u, u * g_t / l)

with defaults l = 1 and u = 2 provided in settings. To adjust these, set settings = list(l = new_l, u = new_u).
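For example, to narrow the sliding window for all three parameters (the value 1.5 here is illustrative):

    # narrower proposal window: g_star drawn from runif(1, g_t / 1.5, 1.5 * g_t)
    fit <- fit_two_layer(x, y, nmcmc = 1000, settings = list(l = 1, u = 1.5))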
Priors on g, theta_y, and theta_w follow Gamma distributions with shape parameters (alpha) and rate parameters (beta) controlled within the settings list object. Defaults are

    settings$alpha$g <- 1.5
    settings$beta$g <- 3.9
    settings$alpha$theta_w <- 1.5
    settings$beta$theta_w <- 3.9 / 4
    settings$alpha$theta_y <- 1.5
    settings$beta$theta_y <- 3.9 / 6

These priors are designed for x scaled to [0, 1] and y scaled to have mean 0 and variance 1. These may be adjusted using the settings input.
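A minimal pre-processing sketch matching the scaling these defaults assume (here x is taken to be a matrix with non-constant columns):

    # scale each input column to [0, 1] and standardize the response
    x <- apply(x, 2, function(col) (col - min(col)) / (max(col) - min(col)))
    y <- (y - mean(y)) / sd(y)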
When w_0 = NULL, the hidden layer is initialized at x (i.e. the identity mapping). The default prior mean of the hidden layer is zero, but may be adjusted to x using settings = list(w_prior_mean = x). If w_0 is of dimension nrow(x) - 1 by D, the final row is predicted using kriging. This is helpful in sequential design when adding a new input location and starting the MCMC where the previous MCMC left off; a brief sketch follows.
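A hedged sketch of this warm start, assuming w (see Value below) is a list with one hidden-layer matrix per iteration, and that x_new / y_new are a hypothetical new input location and response:

    # warm start MCMC after augmenting the design with one new location
    w_last <- fit$w[[length(fit$w)]]   # final hidden layer sample, nrow(x) x D
    x_aug <- rbind(x, x_new)           # w_last now has nrow(x_aug) - 1 rows,
    y_aug <- c(y, y_new)               # so its final row is kriged
    fit2 <- fit_two_layer(x_aug, y_aug, nmcmc = 1000, w_0 = w_last)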
The output object of class dgp2 or dgp2vec is designed for use with continue, trim, and predict.
Value

a list of the S3 class dgp2 or dgp2vec with elements:

x: copy of input matrix

y: copy of response vector

nmcmc: number of MCMC iterations

settings: copy of proposal/prior settings

v: copy of Matern smoothness parameter (v = 999 indicates cov = "exp2")

g: vector of MCMC samples for g

theta_y: vector of MCMC samples for theta_y (length scale of outer layer)

theta_w: matrix of MCMC samples for theta_w (length scale of inner layer)

tau2: vector of MLE estimates for tau2 (scale parameter of outer layer)

w: list of MCMC samples for hidden layer w

time: computation time in seconds
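Elements may be inspected directly; for example, a trace plot of the nugget samples and the posterior mean of the outer length scale:

    plot(fit$g, type = "l", ylab = "g")   # MCMC trace of the nugget
    mean(fit$theta_y)                     # posterior mean outer length scale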
References

Sauer, A., R.B. Gramacy, and D. Higdon. 2020. "Active Learning for Deep Gaussian Process Surrogates." Technometrics, to appear; arXiv:2012.08015.

Sauer, A., A. Cooper, and R.B. Gramacy. 2022. "Vecchia-approximated Deep Gaussian Processes for Computer Experiments." Preprint, arXiv:2204.02904.

Murray, I., R.P. Adams, and D. MacKay. 2010. "Elliptical Slice Sampling." Journal of Machine Learning Research 9, 541-548.
Examples

    # Examples of real-world implementations are available at:
    # https://bitbucket.org/gramacylab/deepgp-ex/

    # G function (https://www.sfu.ca/~ssurjano/gfunc.html)
    f <- function(xx, a = (c(1:length(xx)) - 1) / 2) {
      new1 <- abs(4 * xx - 2) + a
      new2 <- 1 + a
      prod <- prod(new1 / new2)
      return((prod - 1) / 0.86)
    }

    # Training data
    d <- 1
    n <- 20
    x <- matrix(runif(n * d), ncol = d)
    y <- apply(x, 1, f)

    # Testing data
    n_test <- 100
    xx <- matrix(runif(n_test * d), ncol = d)
    yy <- apply(xx, 1, f)

    plot(xx[order(xx)], yy[order(xx)], type = "l")
    points(x, y, col = 2)

    # Example 1: full model (nugget estimated, using continue)
    fit <- fit_two_layer(x, y, nmcmc = 1000)
    plot(fit)
    fit <- continue(fit, 1000)
    plot(fit)
    fit <- trim(fit, 1000, 2)
    fit <- predict(fit, xx, cores = 1)
    plot(fit, hidden = TRUE)

    # Example 2: Vecchia approximated model
    # (Vecchia approximation is faster for larger data sizes)
    fit <- fit_two_layer(x, y, nmcmc = 2000, vecchia = TRUE, m = 10)
    plot(fit)
    fit <- trim(fit, 1000, 2)
    fit <- predict(fit, xx, cores = 1)
    plot(fit, hidden = TRUE)

    # Example 3: Vecchia approximated model (re-approximated after burn-in)
    fit <- fit_two_layer(x, y, nmcmc = 1000, vecchia = TRUE, m = 10)
    fit <- continue(fit, 1000, re_approx = TRUE)
    plot(fit)
    fit <- trim(fit, 1000, 2)
    fit <- predict(fit, xx, cores = 1)
    plot(fit, hidden = TRUE)