mleGP: Inference for GP correlation parameters
In laGP: Local Approximate Gaussian Process Regression

mleGP

R Documentation

Inference for GP correlation parameters

Description

Maximum likelihood/a posteriori inference for (isotropic and separable) Gaussian lengthscale and nugget parameters, marginally or jointly, for Gaussian process regression

Usage

mleGP(gpi, param = c("d", "g"), tmin=sqrt(.Machine$double.eps), 
  tmax = -1, ab = c(0, 0), verb = 0)
mleGPsep(gpsepi, param=c("d", "g", "both"), tmin=rep(sqrt(.Machine$double.eps), 2),
  tmax=c(-1,1), ab=rep(0,4), maxit=100, verb=0)
mleGPsep.R(gpsepi, param=c("d", "g"), tmin=sqrt(.Machine$double.eps), 
  tmax=-1, ab=c(0,0), maxit=100, verb=0)
jmleGP(gpi, drange=c(sqrt(.Machine$double.eps),10), 
  grange=c(sqrt(.Machine$double.eps), 1), dab=c(0,0), gab=c(0,0), verb=0)
jmleGP.R(gpi, N=100, drange=c(sqrt(.Machine$double.eps),10), 
  grange=c(sqrt(.Machine$double.eps), 1), dab=c(0,0), gab=c(0,0), verb=0)
jmleGPsep(gpsepi, drange=c(sqrt(.Machine$double.eps),10), 
  grange=c(sqrt(.Machine$double.eps), 1), dab=c(0,0), gab=c(0,0), 
  maxit=100, verb=0)
jmleGPsep.R(gpsepi, N=100, drange=c(sqrt(.Machine$double.eps),10), 
  grange=c(sqrt(.Machine$double.eps), 1), dab=c(0,0), gab=c(0,0), 
  maxit=100, mleGPsep=mleGPsep.R, verb=0)

Arguments

`gpi`	a C-side GP object identifier (positive integer); e.g., as returned by `newGP`
`gpsepi`	similar to `gpi` but indicating a separable GP object, as returned by `newGPsep`
`N`	for `jmleGP.R`, the maximum number of times the pair of margins should be iterated over before determining failed convergence; note that (at this time) `jmleGP` uses a hard-coded `N=100` in its C implementation
`param`	for `mleGP`, indicating whether to work on the lengthscale (`d`) or nugget (`g`) margin
`tmin`	for `mleGP`, smallest value considered for the parameter (`param`)
`tmax`	for `mleGP`, largest value considered for the parameter (`param`); a setting of `-1` for lengthscales, the default, causes `ncol(X)^2` to be used
`drange`	for `jmleGP`, these are `c(tmin, tmax)` values for the lengthscale parameter; the default values are reasonable for 1-d inputs in the unit interval
`grange`	for `jmleGP`, these are `c(tmin, tmax)` values for the nugget parameter; the default values are reasonable for responses with a range of one
`ab`	for `mleGP`, a non-negative 2-vector describing shape and rate parameters to a Gamma prior for the parameter (`param`); a zero-setting for either value results in no-prior being used (MLE inference); otherwise MAP inference is performed
`maxit`	for `mleGPsep` this is passed as `control=list(trace=maxit)` to `optim`'s L-BFGS-B method for optimizing the likelihood/posterior of a separable GP representation; this argument is not used for isotropic GP versions, nor for optimizing the nugget
`dab`	for `jmleGP`, this is `ab` for the lengthscale parameter
`gab`	for `jmleGP`, this is `ab` for the nugget parameter
`mleGPsep`	function for internal MLE calculation of the separable lengthscale parameter; one of either `mleGPsep.R` based on `method="L-BFGS-B"` using `optim`; or `mleGPsep` using the `C` entry point `lbfgsb`. Both options use a `C` backend for the nugget
`verb`	a verbosity argument indicating how much information about the optimization steps should be printed to the screen; `verb <= 0` is quiet; for `jmleGP`, a `verb - 1` value is passed to the `mleGP` or `mleGPsep` subroutine(s)

Details

mleGP and mleGPsep perform marginal (or profile) inference for the specified param, either the lengthscale or the nugget. mleGPsep can perform simultaneous lengthscale and nugget inference via a common gradient with param = "both". More details are provided below.

For the lengthscale, mleGP uses a Newton-like scheme with analytic first and second derivatives (more below) to find the scalar parameter for the isotropic Gaussian correlation function, with hard-coded 100-max iterations threshold and a sqrt(.Machine$double.eps) tolerance for determining convergence; mleGPsep.R uses L-BFGS-B via optim for the vectorized parameter of the separable Gaussian correlation, with a user-supplied maximum number of iterations (maxit) passed to optim. When maxit is reached the output conv = 1 is returned, subsequent identical calls to mleGPsep.R can be used to continue with further iterations. mleGPsep is similar, but uses the C entry point lbfgsb.

For the nugget, both mleGP and mleGPsep utilize a (nearly identical) Newton-like scheme leveraging first and second derivatives.

jmleGP and jmleGPsep provide joint inference by iterating over the marginals of lengthscale and nugget. The jmleGP.R function is an R-only wrapper around mleGP (which is primarily in C), whereas jmleGP is primarily in C but with reduced output and with hard-coded N=100. The same is true for jmleGPsep.

mleGPsep provides a param = "both" alternative to jmleGPsep leveraging a common gradient. It can be helpful to supply a larger maxit argument compared to jmleGPsep as the latter may do up to 100 outer iterations, cycling over lengthscale and nugget. mleGPsep usually requires many fewer total iterations, unless one of the lengthscale or nugget is already converged. In anticipation of param = "both" the mleGPsep function has longer default values for its bounds and prior arguments. These longer arguments are ignored when param != "both". At this time mleGP does not have a param = "both" option.

All methods are initialized at the value of the parameter(s) currently stored by the C-side object referenced by gpi or gpsepi. It is highly recommended that sensible range values (tmin, tmax or drange, grange) be provided. The defaults provided are too loose for most applications. As illustrated in the examples below, the darg and garg functions can be used to set appropriate ranges from the distributions of inputs and output data respectively.

The Newton-like method implemented for the isotropic lengthscale and for the nugget offers very fast convergence to local maxima, but sometimes it fails to converge (for any of the usual reasons). The implementation detects this, and in such cases it invokes a Brent_fmin call instead - this is the method behind the optimize function.

Note that the gpi or gpsepi object(s) must have been allocated with dK=TRUE; alternatively, one can call buildKGP or buildKGPsep - however, this is not in the NAMESPACE at this time

Value

A self-explanatory data.frame is returned containing the values inferred and the number of iterations used. The jmleGP.R and jmleGPsep.R functions will also show progress details (the values obtained after each iteration over the marginals).

However, the most important “output” is the modified GP object which retains the setting of the parameters reported on output as a side effect.

mleGPsep and jmleGPsep provide an output field/column called conv indicating convergence (when 0), or alternately a value agreeing with a non-convergence code provided on output by optim

Author(s)

Robert B. Gramacy rbg@vt.edu

References

For standard GP inference, refer to any graduate text, e.g., Rasmussen & Williams Gaussian Processes for Machine Learning, or

Gramacy, R. B. (2020) Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences. Boca Raton, Florida: Chapman Hall/CRC. (See Chapter 5.) https://bobby.gramacy.com/surrogates/

Examples

## a simple example with estimated nugget
if(require("MASS")) {

  ## motorcycle data and predictive locations
  X <- matrix(mcycle[,1], ncol=1)
  Z <- mcycle[,2]

  ## get sensible ranges
  d <- darg(NULL, X)
  g <- garg(list(mle=TRUE), Z)

  ## initialize the model
  gpi <- newGP(X, Z, d=d$start, g=g$start, dK=TRUE)

  ## separate marginal inference (not necessary - for illustration only)
  print(mleGP(gpi, "d", d$min, d$max))
  print(mleGP(gpi, "g", g$min, g$max))

  ## joint inference (could skip straight to here)
  print(jmleGP(gpi, drange=c(d$min, d$max), grange=c(g$min, g$max)))

  ## plot the resulting predictive surface
  N <- 100
  XX <- matrix(seq(min(X), max(X), length=N), ncol=1)
  p <- predGP(gpi, XX, lite=TRUE)
  plot(X, Z, main="stationary GP fit to motorcycle data")
  lines(XX, p$mean, lwd=2)
  lines(XX, p$mean+1.96*sqrt(p$s2*p$df/(p$df-2)), col=2, lty=2)
  lines(XX, p$mean-1.96*sqrt(p$s2*p$df/(p$df-2)), col=2, lty=2)

  ## clean up
  deleteGP(gpi)
}

## 
## with a separable correlation function 
##

## 2D Example: GoldPrice Function, mimicking GP_fit package
f <- function(x) 
{
  x1 <- 4*x[,1] - 2
  x2 <- 4*x[,2] - 2;
  t1 <- 1 + (x1 + x2 + 1)^2*(19 - 14*x1 + 3*x1^2 - 14*x2 + 6*x1*x2 + 3*x2^2);
  t2 <- 30 + (2*x1 -3*x2)^2*(18 - 32*x1 + 12*x1^2 + 48*x2 - 36*x1*x2 + 27*x2^2);
  y <- t1*t2;
  return(y)
}

## build design
library(tgp)
n <- 50 ## change to 100 or 1000 for more interesting demo
B <- rbind(c(0,1), c(0,1))
X <- dopt.gp(n, Xcand=lhs(10*n, B))$XX
## this differs from GP_fit in that we use the log response
Y <- log(f(X))

## get sensible ranges
d <- darg(NULL, X)
g <- garg(list(mle=TRUE), Y)

## build GP and jointly optimize via profile mehtods
gpisep <- newGPsep(X, Y, d=rep(d$start, 2), g=g$start, dK=TRUE)
jmleGPsep(gpisep, drange=c(d$min, d$max), grange=c(g$min, g$max))

## clean up
deleteGPsep(gpisep)

## alternatively, we can optimize via a combined gradient
gpisep <- newGPsep(X, Y, d=rep(d$start, 2), g=g$start, dK=TRUE)
mleGPsep(gpisep, param="both", tmin=c(d$min, g$min), tmax=c(d$max, g$max))
deleteGPsep(gpisep)

laGP documentation built on March 31, 2023, 9:46 p.m.