build_ipTemplate: Inducing point template design built through sequential...


View source: R/template.R

Description

Constructs an inducing point design around the center of the design matrix (its coordinatewise median) and that center's local neighborhood. The output is an inducing point design centered at the origin that can serve as a template for prediction anywhere in the design space, provided the local neighborhood there has the same size. Two criteria are available for optimizing the inducing points: "wimse" and "alc" select them sequentially using weighted Integrated Mean Squared Error and Active Learning Cohn, respectively.
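Because the template is centered at the origin, reusing it at a new predictive location amounts to translating every inducing point by that location. The sketch below illustrates this in base R; the small template matrix is made up for illustration (in practice it would be the Xm.t component returned by build_ipTemplate), and x_pred is a hypothetical prediction location.

```r
## Hypothetical M x d template centered at the origin (values are illustrative;
## in practice this would be the Xm.t component returned by build_ipTemplate)
Xm.t <- rbind(c(0, 0), c(0.1, -0.05), c(-0.08, 0.12))

## A hypothetical predictive location somewhere in the design space
x_pred <- c(0.7, -1.2)

## Shift every template row by x_pred; the first inducing point, which sits
## at the origin, lands exactly on the predictive location
Xm_local <- sweep(Xm.t, 2, x_pred, "+")
print(Xm_local)
```

The 1D example further down does the same thing with `Xm.t_wimse + X_center`; `sweep` simply generalizes the shift to d columns.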

Usage

build_ipTemplate(X = NULL, Y = NULL, M, N, theta = NULL, g = 1e-4,
                 method = c('wimse','alc'), ip_bounds = NULL,
                 integral_bounds = NULL, num_thread = 1, num_multistart = 20,
                 w_var = NULL, epsK = sqrt(.Machine$double.eps), epsQ = 1e-5,
                 reps = FALSE, verbose = TRUE)

Arguments

X

a matrix containing the full (large) design matrix of input locations. If reps is a list, X is not used.

Y

a vector of responses/dependent values with length(Y)=nrow(X). If reps is a list, Y is not used.

M

a positive integer giving the number of inducing points; M should be less than N

N

a positive integer giving the number of nearest neighbor (NN) locations used to build the local neighborhood

theta

the lengthscale parameter (positive number) in a Gaussian correlation function; the default, NULL, sets the lengthscale to the square of the 10th percentile of pairwise distances between neighborhood points (similar to darg in the laGP package)

g

the nugget parameter (positive number) in the covariance function

method

specifies the method used to build the inducing point template. In brief, wIMSE ("wimse", the default) minimizes the weighted integrated predictive variance, while ALC ("alc") minimizes the predictive variance at the center of the design

ip_bounds

a 2 by d matrix of the bounds used in the optimization of inducing points; the first row contains minimum values, the second row the maximum values; if not provided, the bounds of the center's local neighborhood are used

integral_bounds

a 2 by d matrix of the bounds used in the calculation of wimse; the first row contains minimum values, the second row the maximum values; only relevant when method="wimse"; if not provided, defaults to the range of each column of X

num_thread

a scalar positive integer indicating the number of GPUs available for calculating ALC; only relevant when method="alc"

num_multistart

a scalar positive integer indicating the number of multistart points used to optimize each inducing point with wIMSE or ALC

w_var

a scalar positive number used as the variance for the Gaussian weight in wIMSE. If NULL, theta is used.

epsK

a small positive number added to the diagonal of the correlation matrix of inducing points for numerical stability when inverting it

epsQ

a small positive number added to the diagonal of the Q matrix (see Cole et al. (2021)) for numerical stability when inverting it

reps

an indicator of replicate design locations in the data set. If TRUE, the unique design locations are used in the calculations, together with the average response at each unique location. Alternatively, reps can be a list returned by find_reps from the hetGP package; in that case, X and Y are not used.

verbose

when TRUE, prints the current number of inducing points selected during the sequential optimization process
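The reps = TRUE behavior described above (unique design locations paired with their average response) can be sketched in base R as follows. This is an illustrative assumption about what the collapsing step computes, not liGP's internal code, which may rely on hetGP's find_reps instead; the toy X and Y are made up.

```r
## Toy 2D design with replicated rows (made up for illustration)
X <- rbind(c(0, 0), c(0, 0), c(1, 0), c(1, 0), c(1, 0), c(0.5, 1))
Y <- c(1, 3, 2, 4, 6, 5)

## Build a string key per row so identical design locations share a key
## (exact string matching; fine for toy data, fragile for noisy floats)
key <- apply(X, 1, paste, collapse = "_")
first <- !duplicated(key)

## Unique design locations, in order of first appearance
X_unique <- X[first, , drop = FALSE]

## Average response at each unique location, looked up by key
Y_avg <- as.numeric(tapply(Y, key, mean)[key[first]])
print(cbind(X_unique, Y_avg))
```

Indexing the tapply result by name keeps Y_avg aligned with X_unique regardless of how the keys are sorted.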

Details

This function calls a separate subroutine for each method. For method="wimse", optIP.wIMSE is called with the reference point set to the median of the design matrix. For method="alc", optIP.ALC is called, with the predictive variance minimized at the median of the design matrix. For either method, the first inducing point is placed at the predictive location (i.e., the origin).
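The reference point and local neighborhood described above, together with the default lengthscale from the theta argument, can be sketched in base R. This is a minimal illustration assuming Euclidean nearest neighbors and the coordinatewise median as the center; the toy grid and N are made up, and liGP's actual implementation may differ in details such as tie-breaking or the quantile definition.

```r
## Toy 2D design (made up for illustration)
x <- seq(-2, 2, by = 0.1)
X <- as.matrix(expand.grid(x, x))
N <- 100  # neighborhood size

## Reference point: coordinatewise median of the design matrix
X_center <- apply(X, 2, median)

## Squared Euclidean distance from every design point to the center;
## t(X) - X_center recycles X_center down each column of t(X)
d2 <- colSums((t(X) - X_center)^2)

## Local neighborhood: the N design points closest to the center
nn_idx <- order(d2)[1:N]
Xn <- X[nn_idx, , drop = FALSE]

## Default lengthscale as described for theta: the square of the
## 10th percentile of pairwise distances between neighborhood points
theta <- as.numeric(quantile(dist(Xn), 0.1))^2

## Neighborhood expressed relative to the center, i.e. in template coordinates
Xn_centered <- sweep(Xn, 2, X_center, "-")
```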

Value

The output is a list with the following components.

Xm.t

a matrix of M inducing points centered at the origin

Xn

a matrix of the local neighborhood at the center of the design

time

a scalar giving the wall-clock time elapsed for (substantive parts of) the calculation

Author(s)

D. Austin Cole austin.cole8@vt.edu

References

D.A. Cole, R.B. Christianson, and R.B. Gramacy (2021). Locally Induced Gaussian Processes for Large-Scale Simulation Experiments. Statistics and Computing, 31(3), 1-21; preprint on arXiv:2008.12857; https://arxiv.org/abs/2008.12857

See Also

optIP.wIMSE, optIP.ALC

Examples

## "1D Toy Problem"
## Test function from Forrester et al (2008);
library(liGP)   ## provides build_ipTemplate
library(hetGP)  ## provides the f1d test function
X <- as.matrix(seq(0, 1, length=1000))
Y <- f1d(X)
int_bounds <- matrix(c(0, 1))

## Center of design space used to build inducing point templates
X_center <- median(X)


## Optimize inducing points with weighted Integrated Mean-Square Error
wimse.out <- build_ipTemplate(X, Y, N=100, M=10, method="wimse", integral_bounds=int_bounds)
Xm.t_wimse <- wimse.out$Xm.t

## now optimize inducing points using Active Learning Cohn
alc.out <- build_ipTemplate(X, Y, N=100, M=10, method="alc", integral_bounds=int_bounds)
Xm.t_alc <- alc.out$Xm.t
Xn <- alc.out$Xn ## X_center neighborhood

## plot locally optimized inducing point templates
plot(X, Y, pch=16, cex=.5, col='grey')
points(Xn, f1d(Xn), col=2)
points(Xm.t_wimse + X_center, rep(-4, 10), pch=2, col=3)
points(Xm.t_alc + X_center, rep(-5, 10), pch=3, col=4)
legend('topleft', pch = c(16, 1, 2, 3), col = c('grey', 2, 3, 4),
       legend=c('Data', 'Local neighborhood', 'wIMSE inducing point design',
                'ALC inducing point design'))


## "2D Toy Problem"
## Herbie's Tooth function used in Cole et al (2020);
## thanks to Lee, Gramacy, Taddy, and others who have used it before

## build up a design with ~40K locations (a 201 x 201 grid)
x <- seq(-2, 2, by=0.02)
X <- as.matrix(expand.grid(x, x))
Y <- herbtooth(X)
X_center <- apply(X, 2, median)

## build an inducing point template, first with weighted Integrated Mean-Square Error
int_bounds <- rbind(c(-2,-2), c(2,2))
wimse.out <- build_ipTemplate(X, Y, N=100, M=10, method="wimse", integral_bounds=int_bounds)
Xm.t_wimse <- wimse.out$Xm.t

## now optimize inducing points using Active Learning Cohn
alc.out <- build_ipTemplate(X, Y, N=100, M=10, method="alc", integral_bounds=int_bounds)
Xm.t_alc <- alc.out$Xm.t
Xn <- alc.out$Xn

## plot locally optimized inducing point templates
plot(Xn, pch=16, cex=.5, col='grey',
     xlab = 'x1', ylab = 'x2', main='Locally optimized IP templates')
points(X_center[1], X_center[2], pch=16, cex=1.5)
points(Xm.t_wimse, pch=2, lwd=2, col=3)
points(Xm.t_alc, pch =3, lwd=2, col=4)
legend('topleft', pch = c(16, 2, 3), col = c('grey', 3, 4),
       legend=c('Local neighborhood', 'wIMSE inducing point design',
                'ALC inducing point design'))

liGP documentation built on July 17, 2021, 9:08 a.m.
