liGP.forloop: Localized Inducing Point Approximate GP Regression For Many...
In liGP: Locally Induced Gaussian Process Regression


Facilitates locally induced Gaussian process inference and prediction at a large set of predictive locations by: building local neighborhoods, shifting an inducing point template, optimizing hyperparameters, and calculating GP predictive equations.
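
The neighborhood-building step can be illustrated in base R. This is only a sketch, assuming neighborhoods are the `N` design points closest to a predictive location in Euclidean distance; `nn_indices` is a hypothetical helper, not a function exported by liGP, and the package's own implementation may differ.

```r
# Hypothetical helper (not part of liGP): indices of the N design points
# nearest to a single predictive location xx.
nn_indices <- function(xx, X, N) {
  # Squared Euclidean distance from xx to every row of X
  d2 <- rowSums(sweep(X, 2, xx)^2)
  # Keep the indices of the N smallest distances
  order(d2)[seq_len(N)]
}

set.seed(1)
X   <- matrix(runif(200), ncol = 2)  # 100 design points in 2 dimensions
xx  <- c(0.5, 0.5)                   # one predictive location
idx <- nn_indices(xx, X, N = 10)     # indices into X
Xn  <- X[idx, , drop = FALSE]        # the local neighborhood
```

The index vector `idx` plays the same role as the `Xni` component described under the returned values.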

liGP.forloop(XX, X = NULL, Y = NULL, Xm.t, N, g = 1e-6, theta = NULL,
     nu = NULL, epsK = sqrt(.Machine$double.eps), epsQ = 1e-5,
     tol = .01, reps = FALSE, Xni.return = FALSE)

`XX`	a `matrix` of out-of-sample predictive locations with `ncol(XX) = ncol(X)`
`X`	a `matrix` containing the full (large) design matrix of all input locations. If `reps` is a list, this entry is not used.
`Y`	a vector of all responses/dependent values with `length(Y)=nrow(X)`. If `reps` is a list, this entry is not used.
`Xm.t`	a `matrix` containing the `M` inducing points template with `ncol(Xm.t) = ncol(X)`. See 'Note' for more.
`N`	the positive integer number of nearest neighbor (NN) locations used to build a local neighborhood; `N` should be greater than `M`. See 'Note' for more.
`g`	an initial setting or fixed value for the nugget parameter. To optimize `g`, provide a list that includes:
    `start` – starting value to initialize the nugget
    `min` – minimum value in the allowable range for the nugget
    `max` – maximum value in the allowable range for the nugget
    `ab` – shape and rate parameters specifying a Gamma prior for the nugget
    If `ab` is not provided, no prior is combined with the likelihood during optimization. If `min` and `max` are not provided, the nugget is not optimized. A `NULL` value generates a list based on `garg` in the laGP package. If a single positive scalar is provided, the nugget is fixed for all predictions. Alternatively, a vector of nuggets whose length equals `nrow(XX)` can be provided to fix a distinct nugget for each prediction.
`theta`	an initial setting or fixed value for the lengthscale parameter. A (default) `NULL` value generates an initial setting based on `darg` in the laGP package. As with `g`, a list can be provided that includes:
    `start` – starting value to initialize the lengthscale
    `min` – minimum value in the allowable range for the lengthscale
    `max` – maximum value in the allowable range for the lengthscale
    `ab` – shape and rate parameters specifying a Gamma prior for the lengthscale
    If `ab` is not provided, no prior is combined with the likelihood during optimization. If `min` and `max` are not provided, the lengthscale is not optimized. If a single positive scalar is provided, the lengthscale is fixed for all predictions. Alternatively, a vector of lengthscales whose length equals `nrow(XX)` can be provided to fix a distinct lengthscale for each prediction.
`nu`	a positive number used to set the scale parameter; default (`NULL`) calculates the maximum likelihood estimator
`epsK`	a small positive number added to the diagonal of the correlation `matrix` of inducing points for numerical stability during inversion. It is automatically increased if necessary for each prediction.
`epsQ`	a small positive number added to the diagonal of the Q `matrix` (see Cole et al. (2021)) of inducing points for numerical stability during inversion. It is automatically increased if necessary for each prediction.
`tol`	a positive number that serves as the tolerance level for convergence of the log-likelihood when optimizing the hyperparameter(s) `theta` and `g`
`reps`	an indicator of replicate design locations in the data set. If `TRUE`, the unique design locations are used for the calculations along with the average response for each unique design location. Alternatively, `reps` can be a list from `find_reps` in the hetGP package. In this case, `X` and `Y` are not used.
`Xni.return`	A scalar logical indicating whether or not a vector of indices into `X` (or `X0` if a reps list is supplied), specifying the chosen sub-design, should be returned on output.
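
The list forms accepted by `g` and `theta` can be built with plain R lists. The sketch below only constructs the arguments; every numeric value is an illustrative assumption, not a recommended setting or a package default.

```r
# Nugget: optimized within [min, max], starting from `start`, with a
# Gamma prior whose shape and rate are given by `ab`.
# All numbers below are illustrative assumptions.
g_spec <- list(start = 1e-4, min = 1e-6, max = 1e-2, ab = c(1.5, 4))

# Lengthscale: optimized within [min, max] with no prior (no `ab` entry).
theta_spec <- list(start = 0.1, min = 0.01, max = 1)

# Fixed alternatives: a single scalar shared by all predictions ...
g_fixed <- 1e-6
# ... or one value per predictive location (n_pred stands in for nrow(XX)).
n_pred <- 5
theta_per_loc <- rep(0.1, n_pred)
```

These objects would then be passed as, for example, `g = g_spec, theta = theta_spec` in a call to `liGP.forloop`.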

The output is a list with the following components:

`mean`	a vector of predictive means of length `nrow(XX)`
`var`	a vector of predictive variances of length `nrow(XX)`
`nu`	a vector of values of the scale parameter of length `nrow(XX)`
`g`	a full version of the `g` argument
`theta`	a full version of the `theta` argument
`Xm.t`	the input for `Xm.t`
`eps`	a matrix of `epsK` and `epsQ` (jitter) values used for each prediction, `nrow(eps)=nrow(XX)`
`mle`	if `g` and/or `theta` is optimized, a `matrix` containing the values found for these parameters and the number of required iterations, for each predictive location in `XX`
`Xni`	when `Xni.return = TRUE`, this field contains a vector of indices of length `N` into `X` (or `X0`) indicating the sub-design (neighborhood) chosen. If `nrow(XX)>1`, a matrix is returned with each row matched with the corresponding row of `XX`
`time`	a scalar giving the wall-clock time elapsed for (substantive parts of) the calculation
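
As a sketch of how the `mean` and `var` components might be used, the code below forms pointwise 95% intervals. It assumes an approximately Gaussian predictive distribution, which this page does not state, and it uses a mock list in place of real `liGP.forloop` output.

```r
# Mock output standing in for the list returned by liGP.forloop;
# the numbers are made up for illustration.
out <- list(mean = c(0.2, 1.0, -0.5), var = c(0.04, 0.01, 0.09))

# Pointwise 95% intervals under a Gaussian assumption (an assumption of
# this sketch, not a documented property of the output).
z     <- qnorm(0.975)
lower <- out$mean - z * sqrt(out$var)
upper <- out$mean + z * sqrt(out$var)
intervals <- cbind(mean = out$mean, lower = lower, upper = upper)
```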

There is no general rule for selecting the neighborhood size (`N`) and the number of inducing points (`M`, the number of rows of `Xm.t`) that works for all problems. However, in lower dimensions (`dim < 9`) the following values seem to perform well: `N = 100 + 10*dim`, `M = 10*dim`.
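
The rule of thumb above can be written as a small helper. `ligp_sizes` is a hypothetical name used only for this sketch; it is not part of liGP.

```r
# Hypothetical helper (not part of liGP): suggested neighborhood size N and
# number of inducing points M from the rule of thumb for dim < 9.
ligp_sizes <- function(dim) {
  stopifnot(is.numeric(dim), length(dim) == 1, dim >= 1)
  if (dim >= 9) {
    warning("rule of thumb is only suggested for dim < 9")
  }
  list(N = 100 + 10 * dim, M = 10 * dim)
}

sizes <- ligp_sizes(2)  # N = 120, M = 20
```

The suggested values satisfy `N > M` for every positive `dim`, consistent with the requirement on `N` in the arguments.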

D. Austin Cole austin.cole8@vt.edu

D.A. Cole, R.B. Christianson, and R.B. Gramacy (2021). Locally Induced Gaussian Processes for Large-Scale Simulation Experiments. Statistics and Computing, 31(3), 1-21; preprint on arXiv:2008.12857; https://arxiv.org/abs/2008.12857

darg, garg, find_reps, makeCluster, clusterApply