loiGP: Locally Optimized Inducing Point Approximate GP Regression...


View source: R/liGP.R

Description

Facilitates localized Gaussian process inference and prediction at a large set of predictive locations by optimizing a local set of inducing points for each predictive location's local neighborhood and then calling giGP.

Usage

loiGP(XX, X = NULL, Y = NULL, M, N, g = 1e-6, theta = NULL, nu = NULL,
      method = c('wimse','alc'), integral_bounds = NULL, num_thread = 1,
      epsK = sqrt(.Machine$double.eps), epsQ = 1e-5, tol = .01, reps = FALSE)

Arguments

XX

a matrix of out-of-sample predictive locations with ncol(XX) = ncol(X); loiGP calls giGP for each row of XX, independently

X

a matrix containing the full (large) design matrix of all input locations. If reps is a list, this entry is not used.

Y

a vector of all responses/dependent values with length(Y)=nrow(X). If reps is a list, this entry is not used.

M

the positive integer number of inducing points placed for each local neighborhood; M should be less than N

N

the positive integer number of Nearest Neighbor (NN) locations used to build a local neighborhood
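To make the role of N concrete, here is a minimal base-R sketch of building a nearest-neighbor neighborhood around one predictive location. The helper `nn_neighborhood` is hypothetical (not part of liGP) and assumes plain Euclidean distance on the raw inputs; liGP's own neighborhood search may differ in details such as tie handling.

```r
# Hypothetical helper (not part of liGP): the N design points nearest,
# in Euclidean distance, to a single predictive location x (length d)
nn_neighborhood <- function(x, X, N) {
  d2 <- rowSums(sweep(X, 2, x)^2)   # squared distance from x to each row of X
  idx <- order(d2)[seq_len(N)]      # indices of the N smallest distances
  list(idx = idx, Xn = X[idx, , drop = FALSE])
}

X <- matrix(seq(0, 1, length = 1000))
nb <- nn_neighborhood(0.5, X, N = 50)
range(nb$Xn)   # a narrow window around 0.5
```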

g

an initial setting or fixed value for the nugget parameter. In order to optimize g, a list can be provided that includes:

  • start – starting value to initialize the nugget

  • min – minimum value in the allowable range for the nugget

  • max – maximum value in the allowable range for the nugget

  • ab – shape and rate parameters specifying a Gamma prior for the nugget

If ab is not provided, no prior is combined with the likelihood during optimization. If min and max are not provided, the nugget is not optimized. A NULL value generates an initial setting based on garg in the laGP package. If a single positive scalar is provided, the nugget is fixed for all predictions. Alternatively, a vector of nuggets whose length equals nrow(XX) can be provided to fix a distinct nugget for each prediction.
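The list form of g describes a bounded, optionally penalized, one-dimensional search. The sketch below illustrates that idea in base R under stated assumptions: `gp_loglik` and `fit_nugget` are hypothetical helpers, an exact GP profile likelihood with an isotropic Gaussian kernel exp(-||x - x'||^2 / theta) stands in for liGP's inducing-point likelihood, and the ab values are purely illustrative.

```r
# Assumption: exact GP with kernel exp(-||x - x'||^2 / theta) plus nugget g,
# standing in for liGP's inducing-point likelihood
gp_loglik <- function(g, X, Y, theta) {
  n <- length(Y)
  K <- exp(-as.matrix(dist(X))^2 / theta) + diag(g, n)
  KiY <- solve(K, Y)
  ldetK <- as.numeric(determinant(K, logarithm = TRUE)$modulus)
  # profile log-likelihood with the scale nu replaced by its MLE, up to a constant
  -0.5 * n * log(sum(Y * KiY) / n) - 0.5 * ldetK
}

# Hypothetical helper: maximize over [g$min, g$max], adding a
# Gamma(shape = ab[1], rate = ab[2]) log-prior only when ab is supplied
fit_nugget <- function(X, Y, theta, g) {
  obj <- function(gg) {
    lp <- gp_loglik(gg, X, Y, theta)
    if (!is.null(g$ab))
      lp <- lp + dgamma(gg, shape = g$ab[1], rate = g$ab[2], log = TRUE)
    lp
  }
  optimize(obj, lower = g$min, upper = g$max, maximum = TRUE)$maximum
}

set.seed(1)
X <- matrix(runif(40))
Y <- sin(2 * pi * X[, 1]) + rnorm(40, sd = 0.1)
g_list <- list(start = 0.01, min = 1e-6, max = 1, ab = c(1.5, 10))  # illustrative values
ghat <- fit_nugget(X, Y, theta = 0.1, g = g_list)
ghat
```

Because `optimize()` searches a bracket, the `start` entry is unused in this sketch; a gradient-based optimizer would use it as the initial value.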

theta

an initial setting or fixed value for the lengthscale parameter. A (default) NULL value generates an initial setting based on darg in the laGP package. Similarly, a list can be provided that includes:

  • start – starting value to initialize the lengthscale

  • min – minimum value in the allowable range for the lengthscale

  • max – maximum value in the allowable range for the lengthscale

  • ab – shape and rate parameters specifying a Gamma prior for the lengthscale

If ab is not provided, no prior is combined with the likelihood during optimization. If min and max are not provided, the lengthscale is not optimized. If a single positive scalar is provided, the lengthscale is fixed for all predictions. Alternatively, a vector of lengthscales whose length equals nrow(XX) can be provided to fix a distinct lengthscale for each prediction.
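A hedged sketch of what the lengthscale controls, assuming the laGP-style isotropic Gaussian correlation exp(-||x - x'||^2 / theta) (an assumption about liGP's kernel), together with a hypothetical helper `expand_param` that turns a scalar or length-nrow(XX) theta into one value per predictive location.

```r
# Assumed kernel parameterization: k(x, x') = exp(-||x - x'||^2 / theta)
gauss_corr <- function(X1, X2, theta) {
  D2 <- outer(rowSums(X1^2), rowSums(X2^2), "+") - 2 * X1 %*% t(X2)
  exp(-pmax(D2, 0) / theta)   # pmax guards against tiny negative round-off
}

# Hypothetical helper: recycle a scalar, or accept a per-prediction vector
expand_param <- function(p, nXX) {
  if (length(p) == 1) return(rep(p, nXX))
  if (length(p) == nXX) return(p)
  stop("length must be 1 or nrow(XX)")
}

thetas <- expand_param(c(0.01, 0.1, 1), nXX = 3)
# correlation between two points 0.1 apart, one value per lengthscale;
# a larger theta means slower decay with distance
r <- sapply(thetas, function(th) gauss_corr(matrix(0), matrix(0.1), th))
r
```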

nu

a positive number used to set the scale parameter; default (NULL) calculates the maximum likelihood estimator
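Assuming a zero-mean GP with Y ~ N(0, nu K) for a correlation matrix K, the maximum likelihood estimator of the scale has the closed form nu_hat = Y' K^{-1} Y / n. The snippet below computes it; it is a sketch rather than liGP's code (which works with the inducing-point approximation to K), and `nu_mle` is a hypothetical name.

```r
# MLE of the scale under Y ~ N(0, nu * K): nu_hat = Y' K^{-1} Y / n
nu_mle <- function(K, Y) drop(crossprod(Y, solve(K, Y))) / length(Y)

# With K = I the estimator reduces to the mean of the squared responses
nu_mle(diag(3), c(1, 2, 3))   # 14 / 3

# A correlated example: Gaussian kernel (theta = 0.1) plus a small nugget
X <- matrix(seq(0, 1, length = 25))
K <- exp(-as.matrix(dist(X))^2 / 0.1) + diag(1e-6, 25)
Y <- sin(2 * pi * X[, 1])
nu_hat <- nu_mle(K, Y)
```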

method

specifies the method by which the inducing point template is built. In brief, wIMSE ("wimse", default) minimizes the weighted integrated mean-square error, and ALC ("alc") minimizes the predictive variance at the predictive location.

integral_bounds

a 2 by d matrix of the domain bounds of the data (used in the calculation of wimse); the first row contains minimum values, the second row the maximum values; only relevant when method="wimse"; if not provided, defaults to the range of each column of X
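The documented default for integral_bounds (the range of each column of X, arranged as a 2 by d matrix) can be reproduced in base R with `apply()`. This only illustrates the expected layout; it is not a call into liGP.

```r
set.seed(42)
X <- cbind(runif(100), runif(100, -2, 3))   # d = 2 inputs on different scales
int_bounds <- apply(X, 2, range)            # 2 x d: row 1 minima, row 2 maxima
int_bounds

# For the one-dimensional unit interval used in the Examples:
matrix(c(0, 1))                             # 2 x 1: min 0, max 1
```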

num_thread

a scalar positive integer indicating the number of threads to use for parallel processing

epsK

a small positive number added to the diagonal of the correlation matrix of the inducing points, K, for numerical stability during inversion. It is automatically increased, if necessary, for each prediction.

epsQ

a small positive number added to the diagonal of the Q matrix (see Cole et al. (2021)) for numerical stability during inversion. It is automatically increased, if necessary, for each prediction.
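Both epsK and epsQ are described as jitter that is increased automatically when needed. One common way to do that is sketched below: retry a Cholesky factorization, multiplying the jitter by 10 after each failure. The tenfold growth factor, the retry limit, and the helper name `chol_with_jitter` are assumptions for illustration, not liGP's documented rule.

```r
# Hypothetical helper: add eps to the diagonal and grow it tenfold
# until chol() succeeds (chol() signals an error if not positive definite)
chol_with_jitter <- function(K, eps = sqrt(.Machine$double.eps), max_tries = 10) {
  for (i in seq_len(max_tries)) {
    R <- tryCatch(chol(K + diag(eps, nrow(K))), error = function(e) NULL)
    if (!is.null(R)) return(list(R = R, eps = eps))
    eps <- eps * 10
  }
  stop("matrix not positive definite even with jitter")
}

X <- matrix(seq(0, 1, length = 30))
K <- exp(-as.matrix(dist(X))^2 / 10)   # very smooth kernel: nearly singular
fac <- chol_with_jitter(K)
fac$eps                                # jitter actually used
```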

tol

a positive number serving as the tolerance for convergence of the log-likelihood when optimizing the hyperparameter(s) theta and/or g

reps

a flag indicating whether the data set contains replicate design locations. If TRUE, the unique design locations are used for the calculations along with the average response at each unique design location. Alternatively, reps can be a list returned by find_reps in the hetGP package; in this case, X and Y are not used.
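With reps = TRUE the calculations use the unique design locations and the average response at each. A base-R sketch of that preprocessing follows; `average_reps` is a hypothetical helper, and hetGP's find_reps returns a richer list than the three components built here.

```r
# Hypothetical helper: unique design locations, mean response, and
# number of replicates at each location
average_reps <- function(X, Y) {
  key <- apply(X, 1, paste, collapse = "\r")   # string key per row; exact replicates match
  ukeys <- unique(key)
  grp <- factor(key, levels = ukeys)           # keep first-appearance order
  list(X0   = X[match(ukeys, key), , drop = FALSE],
       Z0   = as.numeric(tapply(Y, grp, mean)),
       mult = as.integer(table(grp)))
}

X <- matrix(rep(c(0, 0.5, 1), times = c(2, 3, 1)))
Y <- c(1, 3, 2, 4, 6, 10)
r <- average_reps(X, Y)
r$Z0     # 2 4 10
r$mult   # 2 3 1
```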

Details

This function builds a unique inducing point design to accompany the local neighborhood of each predictive location in XX. It then invokes giGP for each row of XX with X=Xn and Y=Yn from the corresponding local neighborhood, together with the locally optimal inducing point design. For further information, see giGP.
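The per-location structure described above can be sketched in base R. This is deliberately a simplified stand-in, not loiGP: it assumes a Gaussian kernel exp(-d^2 / theta) with fixed theta and g, and it replaces the inducing-point design and the call to giGP with an exact GP fit on each N-point neighborhood (closer in spirit to laGP's local approximate GP). `local_predict` is a hypothetical name.

```r
local_predict <- function(XX, X, Y, N, theta = 0.01, g = 1e-6) {
  # assumed Gaussian correlation exp(-||a - b||^2 / theta) between rows of A and B
  corr <- function(A, B) {
    D2 <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * A %*% t(B)
    exp(-pmax(D2, 0) / theta)
  }
  res <- sapply(seq_len(nrow(XX)), function(i) {
    x <- XX[i, ]                                         # one predictive location
    idx <- order(rowSums(sweep(X, 2, x)^2))[seq_len(N)]  # its N nearest neighbors
    Xn <- X[idx, , drop = FALSE]
    Yn <- Y[idx]
    K <- corr(Xn, Xn) + diag(g, N)
    k <- corr(Xn, matrix(x, nrow = 1))
    KiY <- solve(K, Yn)
    nu <- drop(crossprod(Yn, KiY)) / N                   # MLE of the scale
    c(mean = drop(crossprod(k, KiY)),
      var = nu * (1 + g - drop(crossprod(k, solve(K, k)))))
  })
  list(mean = res["mean", ], var = res["var", ])
}

X <- matrix(seq(0, 1, length = 200))
Y <- sin(2 * pi * X[, 1])
XX <- matrix(c(0.25, 0.6))
p <- local_predict(XX, X, Y, N = 20)
cbind(truth = sin(2 * pi * XX[, 1]), mean = p$mean, var = p$var)
```

Each row of XX is handled independently, which is why this kind of loop parallelizes naturally across threads.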

Value

The output is a list with the following components:

mean

a vector of predictive means of length nrow(XX)

var

a vector of predictive variances of length nrow(XX)

nu

a vector of values of the scale parameter of length nrow(XX)

g

a full version of the g argument

theta

a full version of the theta argument

Xm

a list of inducing point designs; each entry in the list is a matrix containing M locally optimized inducing points; length(Xm)=nrow(XX)

eps

a matrix of epsK and epsQ (jitter) values used for each prediction, nrow(eps)=nrow(XX)

mle

if g and/or theta is optimized, a matrix containing the values found for these parameters and the number of required iterations, for each predictive location in XX

time

a scalar giving the passage of wall-clock time elapsed for (substantive parts of) the calculation

Author(s)

D. Austin Cole austin.cole8@vt.edu

References

D.A. Cole, R.B. Christianson, and R.B. Gramacy (2021). Locally Induced Gaussian Processes for Large-Scale Simulation Experiments. Statistics and Computing, 31(3), 1-21; preprint on arXiv:2008.12857; https://arxiv.org/abs/2008.12857

Examples

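library(liGP); library(hetGP); library(lhs)   # hetGP supplies the test function f1d
## Training data on a dense 1d grid, plus out-of-sample test locations
X <- matrix(seq(0, 1, length=1000))
Y <- f1d(X)
XX <- matrix(seq(.01, .99, length=50))
YY <- f1d(XX)

## Neighborhood size, inducing points per neighborhood, and integration
## bounds (2 x 1 matrix: minimum and maximum of the single input)
n <- 50
m <- 7
int_bounds <- matrix(c(0,1))

out <- loiGP(XX=XX, X=X, Y=Y, M=m, N=n, method='wimse',
             integral_bounds=int_bounds)

## Plot predicted mean and error
orig_par <- par(no.readonly=TRUE)   # save only the settable graphics parameters
par(mfrow=c(1,2))
plot(X, Y, type='l', lwd=4, ylim=c(-8, 16))
lines(XX, out$mean, lwd=3, lty=2, col=2)
legend('topleft', legend=c('Test Function', 'Predicted mean'),
       lty=1:2, col=1:2, lwd=2)

plot(XX, YY - out$mean, xlab='X', ylab='Error', type = 'l')
par(orig_par)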

liGP documentation built on July 17, 2021, 9:08 a.m.
