build_gauss_measure_ipTemplate: Inducing point template design for a Gaussian measure built...

View source: R/template.R

Description

Constructs a design of inducing points for a Gaussian measure whose mean is the center of the design matrix, using the local neighborhood built around that center. The output is an inducing point design centered at the origin that can be used as a template for predictions anywhere in the design space (with a local neighborhood of the same size). The inducing points are selected sequentially by optimizing the weighted Integrated Mean Squared Error (wIMSE).

Usage

build_gauss_measure_ipTemplate(X = NULL, Y = NULL, M, N, gauss_sd,
                               theta = NULL, g = 1e-4, seq_length=20,
                               ip_bounds = NULL, integral_bounds = NULL,
                               num_multistart = 20,
                               epsK = sqrt(.Machine$double.eps), epsQ = 1e-5,
                               reps = FALSE, verbose = TRUE)

Arguments

X

a matrix containing the full (large) design matrix of input locations. If using a list for reps, this entry is not used

Y

a vector of responses/dependent values with length(Y)=nrow(X). If using a list for reps, this entry is not used

M

a positive integer number of inducing points; M should be less than N

N

the positive integer number of Nearest Neighbor (NN) locations used to build a local neighborhood

gauss_sd

a vector of standard deviations for the Gaussian measure, one per input dimension, so that length(gauss_sd)=ncol(X). Note: at this time, the Gaussian measure may have only one nonzero standard deviation (i.e. the Gaussian measure is a slice)

theta

the lengthscale parameter (positive number) in a Gaussian correlation function; a (default) NULL value sets the lengthscale at the square of the 10th percentile of pairwise distances between neighborhood points (similar to darg in laGP package)

g

the nugget parameter (positive number) in a covariance

seq_length

a positive integer used to build sequences of this length in the nondegenerate dimensions for the purpose of building a local neighborhood

ip_bounds

a 2 by d matrix of the bounds used in the optimization of inducing points; the first row contains minimum values, the second row the maximum values; if not provided, the bounds of the center's local neighborhood are used

integral_bounds

a 2 by d matrix of the bounds used in the calculation of wimse; the first row contains minimum values, the second row the maximum values; if not provided, defaults to the range of each column of X

num_multistart

a scalar positive integer indicating the number of multistart points used to optimize each inducing point

epsK

a small positive number added to the diagonal of the correlation matrix of inducing points for numerical stability during inversion

epsQ

a small positive number added to the diagonal of the Q matrix (see Cole (2021)) for numerical stability during inversion

reps

a flag indicating replicate design locations in the data set. If TRUE, the unique design locations are used for the calculations along with the average response for each unique design location. Alternatively, reps can be a list from find_reps in the hetGP package. In this case, X and Y are not used.

verbose

when TRUE, prints the current number of inducing points selected during the sequential optimization process
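
The defaults described in the argument list can be illustrated with base R alone. The following is a minimal sketch, not the package's internal code: the toy matrices and the names theta_default, bounds, X_unique and Y_mean are hypothetical, and the package may compute the quantile or collapse replicates differently.

```r
set.seed(1)

## Hypothetical toy neighborhood: 50 points in 2 dimensions
Xn <- matrix(runif(100, -1, 1), ncol = 2)

## theta default as described above: the square of the 10th percentile
## of pairwise distances between neighborhood points
d <- as.numeric(dist(Xn))
theta_default <- as.numeric(quantile(d, probs = 0.1))^2

## A 2 by d bounds matrix: first row minima, second row maxima
bounds <- apply(Xn, 2, range)

## Replicates in the spirit of reps = TRUE: keep the unique design
## locations and average the responses at each one
X <- rbind(c(0, 0), c(0, 0), c(1, 0), c(1, 0), c(1, 1))
Y <- c(1, 3, 2, 4, 5)
key <- apply(X, 1, paste, collapse = "_")
X_unique <- X[!duplicated(key), , drop = FALSE]
Y_mean <- as.numeric(tapply(Y, factor(key, levels = unique(key)), mean))
```

Here bounds has the same layout expected by ip_bounds and integral_bounds, and Y_mean is ordered to match the rows of X_unique.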

Details

This function is built to deal with the special class of problems where liGP is used to predict and integrate over a degenerate Gaussian measure where only one dimension has a nonzero standard deviation. To build the wIMSE inducing point design, the function optIP.wIMSE is called with the reference point being the median of the design matrix.

For each inducing point design, the first inducing point is placed at the predictive location (i.e. the origin).
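
To make the notion of a degenerate Gaussian measure concrete, the sketch below builds a one-dimensional slice through a center point and attaches normalized Gaussian weights to it. It is only an illustration under stated assumptions: the range of two standard deviations on either side of the center, and the names nondegen, slice and w, are choices made here and are not taken from the package.

```r
## Center of the design and a measure with one nonzero standard deviation
X_center <- c(0, 0)
gauss_sd <- c(0, 0.05)
seq_length <- 20

## Index of the single nondegenerate dimension
nondegen <- which(gauss_sd > 0)
stopifnot(length(nondegen) == 1)

## Assumption: cover two standard deviations on either side of the center
offsets <- seq(-2, 2, length.out = seq_length) * gauss_sd[nondegen]

## Every row starts at the center; only the nondegenerate column varies
slice <- matrix(X_center, nrow = seq_length, ncol = length(X_center),
                byrow = TRUE)
slice[, nondegen] <- X_center[nondegen] + offsets

## Gaussian density weights along the slice, normalized to sum to one
w <- dnorm(slice[, nondegen], mean = X_center[nondegen],
           sd = gauss_sd[nondegen])
w <- w / sum(w)
```

All slice points share the center's coordinate in the degenerate dimension, which is why the measure is described as a slice.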

Value

The output is a list with the following components.

Xm.t

a matrix of M inducing points centered at the origin

Xn

a matrix of the local neighborhood at the center of the design

Xc

a matrix of the center of the design used to build the local neighborhood and inducing point template

gauss_sd

the gauss_sd used to generate the local neighborhood

time

a scalar giving the wall-clock time elapsed for (substantive parts of) the calculation
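
Because Xm.t is centered at the origin, reusing it at another predictive location amounts to a translation. A minimal sketch with a hypothetical three-point template and a hypothetical location x_new (neither comes from the package):

```r
## Hypothetical template: 3 inducing points centered at the origin,
## with the first point at the origin itself
Xm.t <- rbind(c(0, 0), c(0.1, -0.05), c(-0.08, 0.12))

## Hypothetical predictive location
x_new <- c(1.5, -0.7)

## Add x_new to every row of the template
Xm_new <- sweep(Xm.t, 2, x_new, "+")
```

The first row of Xm_new coincides with x_new, consistent with the first inducing point being placed at the predictive location.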

Author(s)

D. Austin Cole austin.cole8@vt.edu

References

D.A. Cole, R.B. Christianson, and R.B. Gramacy (2021). Locally Induced Gaussian Processes for Large-Scale Simulation Experiments. Statistics and Computing, 31(3), 1-21; preprint on arXiv:2008.12857; https://arxiv.org/abs/2008.12857

See Also

optIP.wIMSE

Examples

## "2D Toy Problem"
## Herbie's Tooth function used in Cole et al (2020);
## thanks to Lee, Gramacy, Taddy, and others who have used it before

library(liGP)

## build up a design with N=~40K locations
x <- seq(-2, 2, by=0.02)
X <- as.matrix(expand.grid(x, x))
Y <- herbtooth(X)
X_center <- apply(X, 2, median)
gauss_sd <- c(0, .05)

## build an inducing point template, first with original weighted Integrated Mean-Square Error
int_bounds <- rbind(c(-2,-2), c(2,2))
wimse.out <- build_ipTemplate(X, Y, N=100, M=10, method='wimse',
                              integral_bounds=int_bounds)
Xm.t_wimse <- wimse.out$Xm.t
Xn <- wimse.out$Xn


## then build a template based on the Gaussian measure
wimse_gauss.out <- build_gauss_measure_ipTemplate(X, Y, N=100, M=10,
                                                  gauss_sd = gauss_sd,
                                                  integral_bounds=int_bounds)
Xm.t_wimse_gauss <- wimse_gauss.out$Xm.t
Xn_gauss <- wimse_gauss.out$Xn

## plot locally optimized inducing point templates
ylim <- range(Xn_gauss[,2]) + c(-.03, .05)
plot(Xn, pch=16, cex=.5, col='grey',
     xlab = 'x1', ylab = 'x2', ylim = ylim,
     main='Locally optimized IP template based on Gaussian measure')
points(Xn_gauss, cex=.7)
points(X_center[1], X_center[2], pch=16, cex=1.5)
points(Xm.t_wimse, pch=2, lwd=2, col=3)
points(Xm.t_wimse_gauss, pch=6, lwd=2, col=2)
legend('topleft', pch = c(16, 1, 2, 6), col = c('grey', 1, 3, 2),
       legend=c('Local neighborhood (wIMSE)',
                'Local neighborhood (Gauss measure)',
                'wIMSE ip design',
                'Gaussian measure ip design'))

liGP documentation built on July 17, 2021, 9:08 a.m.