optIP.wIMSE: Sequential Selection of an Inducing Point Design by...
In liGP: Locally Induced Gaussian Process Regression

Description Usage Arguments Details Value Author(s) References Examples

Optimizes the weighted integrated mean-square error (wIMSE) surface to sequentially select inducing points for a given predictive location and local neighborhood.

optIP.wIMSE(Xn, M, theta = NULL, g = 1e-4, w_mean, w_var = NULL,
            ip_bounds = NULL, integral_bounds = NULL, num_multistart = 15,
            fix_xm1 = TRUE, epsK = sqrt(.Machine$double.eps), epsQ = 1e-5,
            mult = NULL, verbose = TRUE)

`Xn`	a `matrix` of the local neighborhood; `nrow(Xn)=N`
`M`	the positive integer number of inducing points placed for each local neighborhood; `M` should be less than `N`
`theta`	the lengthscale parameter (positive number) in a Gaussian correlation function; a (default) `NULL` value sets the lengthscale at the square of the 10th percentile of pairwise distances between neighborhood points (see `darg` in `laGP` package)
`g`	the nugget parameter (positive number) in the covariance
`w_mean`	a vector of the mean (center) of the Gaussian weight; `length(w_mean)` should equal `ncol(Xn)`
`w_var`	a positive number or vector of positive numbers (length equal to `ncol(Xn)`) denoting the variance(s) in the Gaussian weight. If `NULL` (default), `theta` is used.
`ip_bounds`	a 2 by d `matrix` containing the range of possible values for inducing points; first row contains minimum values for each dimension, second row contains maximum values; if `ip_bounds` is `NULL`, defaults to range of the local neighborhood `Xn`
`integral_bounds`	a 2 by d `matrix` containing the domain bounds for the data; first row contains minimum values for each dimension, second row contains maximum values; if `integral_bounds=NULL`, defaults to range of the local neighborhood `Xn`
`num_multistart`	a positive integer indicating the number of starting locations used to optimize wIMSE for each inducing point
`fix_xm1`	an indicator of whether or not the first inducing point should be fixed at `w_mean` (`TRUE`, default) or optimized (`FALSE`)
`epsK`	a small positive number added to the diagonal of the correlation matrix, of inducing points, K, for numerically stability for inversion
`epsQ`	a small positive number added to the diagonal of the Q `matrix` (see Cole (2021)) for numerically stability for inversion
`mult`	a vector of length `nrow(X)` that contains the number of replicates for each design location in `X`
`verbose`	if `TRUE`, outputs the progress of the number of inducing points optimally placed

The function sequentially places M inducing points around the local neighborhood (Xn). Inducing points are placed based on the minimum in the wIMSE surface integrating over integral_bounds. Hyperparameters theta,g are fixed.

The output is a list with the following components:

`Xm`	a `matrix` of the locally optimal inducing point locations
`wimse`	a vector of the wIMSE progress at each inducing point selection step
`time`	a scalar giving the passage of wall-clock time elapsed for (substantive parts of) the calculation

D. Austin Cole austin.cole8@vt.edu

D.A. Cole, R.B. Christianson, and R.B. Gramacy (2021). Locally Induced Gaussian Processes for Large-Scale Simulation Experiments Statistics and Computing, 31(3), 1-21; preprint on arXiv:2008.12857; https://arxiv.org/abs/2008.12857

## a "computer experiment"

## Simple 2-d Herbie's Tooth function used in Cole et al (2020);
## thanks to Lee, Gramacy, Taddy, and others who have used it before

## Build up a design with N=~40K locations
x <- seq(-2, 2, by=0.02)
X <- as.matrix(expand.grid(x, x))
Y <- herbtooth(X)


library(laGP)

## Build a local neighborhood
Xstar <- matrix(c(.4, -1.1), nrow=1)
n <- 100; m <- 10
Xstar_to_X_dists <- distance(Xstar, X)
quant_dists <- quantile(Xstar_to_X_dists, n/nrow(X))
Xn <- X[Xstar_to_X_dists < quant_dists,]
Yn <- Y[Xstar_to_X_dists < quant_dists]

theta <- darg(NULL, Xn)$start
integral_bounds <- rbind(c(-2,-2), c(2,2))

## Optimize inducing point locations
Xm.wimse1 <- optIP.wIMSE(Xn, M=m, Xn=, theta=theta, w_mean=Xstar,
                        integral_bounds=integral_bounds)

## Use a different variance for weight
Xm.wimse2 <- optIP.wIMSE(Xn, M=m, Xn=, theta=theta, w_mean=Xstar,
                        w_var = c(theta/2, 3*theta),
                        integral_bounds=integral_bounds)

## Plot inducing point design and neighborhood
ranges <- apply(Xn, 2, range)
plot(Xn, pch = 16, cex=.5, col='grey',
     xlim=ranges[,1]+c(-.02, .02), ylim=ranges[,2]+c(-.02, .15),
     xlab='x1', ylab = 'x2',
     main='ALC-optimal Inducing Point Design')
points(Xstar[1], Xstar[2], pch=16)
points(Xm.wimse1$Xm, pch=2, col=3, lwd=2)
points(Xm.wimse2$Xm, pch=3, col=4, lwd=2)
legend('topleft', legend=c('Xstar','Neighborhood','Xm with w_var=theta',
                           'Xm with nonisotropic weight'),
       col=c(1,'grey',3,4), pch=c(16,16,2,3), lty=NA, lwd=2, ncol=2)

## View wIMSE progress
plot(1:m, log(Xm.wimse1$wimse), type='l', xlab='inducing point number',
     ylab='log wIMSE',main='wIMSE optimization progress')