fit_two_layer | R Documentation |
Conducts MCMC sampling of hyperparameters and hidden layer
w
for a two layer deep GP. Separate length scale
parameters theta_w
and theta_y
govern the correlation
strength of the hidden layer and outer layer respectively. Nugget
parameter g
governs noise on the outer layer. In Matern
covariance, v
governs smoothness.
fit_two_layer(
x,
y,
nmcmc = 10000,
D = ifelse(is.matrix(x), ncol(x), 1),
monowarp = FALSE,
pmx = FALSE,
verb = TRUE,
w_0 = NULL,
g_0 = 0.001,
theta_y_0 = 0.1,
theta_w_0 = 0.1,
true_g = NULL,
settings = NULL,
cov = c("matern", "exp2"),
v = 2.5,
vecchia = FALSE,
m = min(25, length(y) - 1),
ordering = NULL
)
x |
vector or matrix of input locations |
y |
vector of response values |
nmcmc |
number of MCMC iterations |
D |
integer designating dimension of hidden layer, defaults to
dimension of |
monowarp |
indicates whether warpings should be forced to be
monotonic. Input may be a matrix of
grid points (or a vector which will be applied to every dimension)
for interpolation of the cumulative sum, an integer
specifying the number of grid points to use over the range [0, 1],
or simply the boolean |
pmx |
"prior mean x", logical indicating whether |
verb |
logical indicating whether to print iteration progress |
w_0 |
initial value for hidden layer |
g_0 |
initial value for |
theta_y_0 |
initial value for |
theta_w_0 |
initial value for |
true_g |
if true nugget is known it may be specified here (set to a small value to make fit deterministic). Note - values that are too small may cause numerical issues in matrix inversions. |
settings |
hyperparameters for proposals and priors (see details) |
cov |
covariance kernel, either Matern or squared exponential
( |
v |
Matern smoothness parameter (only used if |
vecchia |
logical indicating whether to use Vecchia approximation |
m |
size of Vecchia conditioning sets (only used if
|
ordering |
optional ordering for Vecchia approximation, must correspond
to rows of |
Maps inputs x
through hidden layer w
to outputs
y
. Conducts sampling of the hidden layer using Elliptical
Slice sampling. Utilizes Metropolis Hastings sampling of the length
scale and nugget parameters with proposals and priors controlled by
settings
. When true_g
is set to a specific value, the
nugget is not estimated. When vecchia = TRUE
, all calculations
leverage the Vecchia approximation with specified conditioning set size
m
. Vecchia approximation is only implemented for
cov = "matern"
.
When monowarp = TRUE
, each input dimension is warped separately and
monotonically. This requires D = ncol(x)
. Monotonic warpings trigger
separable lengthscales on the outer layer (theta_y
). As a default, monotonic
warpings use the reference grid: seq(0, 1, length = 50)
. The grid size
may be controlled by passing a numeric integer to monowarp
(i.e., monowarp = 100
uses the grid seq(0, 1, length = 100)
).
Alternatively, any user-specified grid may be passed as the argument to
monowarp
.
When pmx = TRUE
, the prior on the latent layer is set at x
(rather than the default of zero). This requires D = ncol(x)
. If
pmx
is set to a numeric value, then that value is used as the scale
parameter on the latent layer. Specifying a small value here encourages
an identity mapping.
NOTE on OpenMP: The Vecchia implementation relies on OpenMP parallelization for efficient computation. This function will produce a warning message if the package was installed without OpenMP (this is the default for CRAN packages installed on Apple machines). To set up OpenMP parallelization, download the package source code and install using the gcc/g++ compiler.
Proposals for g
, theta_y
, and
theta_w
follow a uniform sliding window scheme, e.g.
g_star <- runif(1, l * g_t / u, u * g_t / l)
,
with defaults l = 1
and u = 2
provided in settings
.
To adjust these, set settings = list(l = new_l, u = new_u)
.
Priors on g
, theta_y
, and theta_w
follow Gamma
distributions with shape parameters (alpha
) and rate parameters
(beta
) controlled within the settings
list object.
Defaults have been updated with package version 1.1.3. Default priors differ
for noisy/deterministic settings and depend on whether monowarp = TRUE
.
All default values are visible in the internal
deepgp:::check_settings
function.
These priors are designed for x
scaled to
[0, 1] and y
scaled to have mean 0 and variance 1. These may be
adjusted using the settings
input.
When w_0 = NULL
, the hidden layer is initialized at x
(i.e. the identity mapping). If w_0
is of dimension
nrow(x) - 1
by D
, the final row is predicted using kriging.
This is helpful in sequential design when adding a new input location
and starting the MCMC at the place where the previous MCMC left off.
The output object of class dgp2
or dgp2vec
is designed for
use with continue
, trim
, and predict
.
a list of the S3 class dgp2
or dgp2vec
with elements:
x
: copy of input matrix
y
: copy of response vector
nmcmc
: number of MCMC iterations
settings
: copy of proposal/prior settings
v
: copy of Matern smoothness parameter (v = 999
indicates cov = "exp2"
)
g
: vector of MCMC samples for g
theta_y
: vector of MCMC samples for theta_y
(length
scale of outer layer)
theta_w
: matrix of MCMC samples for theta_w
(length
scale of inner layer)
tau2
: vector of MLE estimates for tau2
(scale
parameter of outer layer)
w
: list of MCMC samples for hidden layer w
ll
: vector of MVN log likelihood of the outer layer
for reach Gibbs iteration
time
: computation time in seconds
Sauer, A. (2023). Deep Gaussian process surrogates for computer experiments.
*Ph.D. Dissertation, Department of Statistics, Virginia Polytechnic Institute and State University.*
Sauer, A., Gramacy, R.B., & Higdon, D. (2023). Active learning for deep
Gaussian process surrogates. *Technometrics, 65,* 4-18. arXiv:2012.08015
Sauer, A., Cooper, A., & Gramacy, R. B. (2023). Vecchia-approximated deep Gaussian
processes for computer experiments.
*Journal of Computational and Graphical Statistics, 32*(3), 824-837. arXiv:2204.02904
Barnett, S., Beesley, L. J., Booth, A. S., Gramacy, R. B., & Osthus D. (2024). Monotonic
warpings for additive and deep Gaussian processes. *In Review.* arXiv:2408.01540
# Additional examples including real-world computer experiments are available at:
# https://bitbucket.org/gramacylab/deepgp-ex/
# Booth function (inspired by the Higdon function)
f <- function(x) {
i <- which(x <= 0.58)
x[i] <- sin(pi * x[i] * 6) + cos(pi * x[i] * 12)
x[-i] <- 5 * x[-i] - 4.9
return(x)
}
# Training data
x <- seq(0, 1, length = 25)
y <- f(x)
# Testing data
xx <- seq(0, 1, length = 200)
yy <- f(xx)
plot(xx, yy, type = "l")
points(x, y, col = 2)
# Example 1: full model (nugget estimated, using continue)
fit <- fit_two_layer(x, y, nmcmc = 1000)
plot(fit)
fit <- continue(fit, 1000)
plot(fit, hidden = TRUE) # trace plots and ESS samples
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)
# Example 2: Vecchia approximated model (nugget estimated)
# (Vecchia approximation is faster for larger data sizes)
fit <- fit_two_layer(x, y, nmcmc = 2000, vecchia = TRUE, m = 10)
plot(fit, hidden = TRUE) # trace plots and ESS samples
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)
# Example 3: Vecchia approximated model, re-approximated after burn-in
fit <- fit_two_layer(x, y, nmcmc = 1000, vecchia = TRUE, m = 10)
fit <- continue(fit, 1000, re_approx = TRUE)
plot(fit, hidden = TRUE) # trace plots and ESS samples
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)
# Example 4: full model with monotonic warpings (nugget estimated)
fit <- fit_two_layer(x, y, nmcmc = 2000, monowarp = TRUE)
plot(fit, hidden = TRUE) # trace plots and ESS samples
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.