# mleGP: Inference for GP correlation parameters In laGP: Local Approximate Gaussian Process Regression

## Description

Maximum likelihood/a posteriori inference for (isotropic and separable) Gaussian lengthscale and nugget parameters, marginally or jointly, for Gaussian process regression

## Usage

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16``` ```mleGP(gpi, param = c("d", "g"), tmin=sqrt(.Machine\$double.eps), tmax = -1, ab = c(0, 0), verb = 0) mleGPsep(gpsepi, param=c("d", "g", "both"), tmin=rep(sqrt(.Machine\$double.eps), 2), tmax=c(-1,1), ab=rep(0,4), maxit=100, verb=0) mleGPsep.R(gpsepi, param=c("d", "g"), tmin=sqrt(.Machine\$double.eps), tmax=-1, ab=c(0,0), maxit=100, verb=0) jmleGP(gpi, drange=c(sqrt(.Machine\$double.eps),10), grange=c(sqrt(.Machine\$double.eps), 1), dab=c(0,0), gab=c(0,0), verb=0) jmleGP.R(gpi, N=100, drange=c(sqrt(.Machine\$double.eps),10), grange=c(sqrt(.Machine\$double.eps), 1), dab=c(0,0), gab=c(0,0), verb=0) jmleGPsep(gpsepi, drange=c(sqrt(.Machine\$double.eps),10), grange=c(sqrt(.Machine\$double.eps), 1), dab=c(0,0), gab=c(0,0), maxit=100, verb=0) jmleGPsep.R(gpsepi, N=100, drange=c(sqrt(.Machine\$double.eps),10), grange=c(sqrt(.Machine\$double.eps), 1), dab=c(0,0), gab=c(0,0), maxit=100, mleGPsep=mleGPsep.R, verb=0) ```

## Arguments

 `gpi` a C-side GP object identifier (positive integer); e.g., as returned by `newGP` `gpsepi` similar to `gpi` but indicating a separable GP object, as returned by `newGPsep` `N` for `jmleGP.R`, the maximum number of times the pair of margins should be iterated over before determining failed convergence; note that (at this time) `jmleGP` uses a hard-coded `N=100` in its C implementation `param` for `mleGP`, indicating whether to work on the lengthscale (`d`) or nugget (`g`) margin `tmin` for `mleGP`, smallest value considered for the parameter (`param`) `tmax` for `mleGP`, largest value considered for the parameter (`param`) `drange` for `jmleGP`, these are `c(tmin, tmax)` values for the lengthscale parameter; the default values are reasonable for 1-d inputs in the unit interval `grange` for `jmleGP`, these are `c(tmin, tmax)` values for the nugget parameter; the default values are reasonable for responses with a range of one `ab` for `mleGP`, a non-negative 2-vector describing shape and rate parameters to a Gamma prior for the parameter (`param`); a zero-setting for either value results in no-prior being used (MLE inference); otherwise MAP inference is performed `maxit` for `mleGPsep` this is passed as `control=list(trace=maxit)` to `optim`'s L-BFGS-B method for optimizing the likelihood/posterior of a separable GP representation; this argument is not used for isotropic GP versions, nor for optimizing the nugget `dab` for `jmleGP`, this is `ab` for the lengthscale parameter `gab` for `jmleGP`, this is `ab` for the nugget parameter `mleGPsep` function for internal MLE calculation of the separable lengthscale parameter; one of either `mleGPsep.R` based on `method="L-BFGS-B"` using `optim`; or `mleGPsep` using the `C` entry point `lbfgsb`. Both options use a `C` backend for the nugget `verb` a verbosity argument indicating how much information about the optimization steps should be printed to the screen; `verb <= 0` is quiet; for `jmleGP`, a `verb - 1` value is passed to the `mleGP` or `mleGPsep` subroutine(s)

## Details

`mleGP` and `mleGPsep` perform marginal (or profile) inference for the specified `param`, either the lengthscale or the nugget. `mleGPsep` can perform simultaneous lengthscale and nugget inference via a common gradient with `param = "both"`. More details are provided below.

For the lengthscale, `mleGP` uses a Newton-like scheme with analytic first and second derivatives (more below) to find the scalar parameter for the isotropic Gaussian correlation function, with hard-coded 100-max iterations threshold and a `sqrt(.Machine\$double.eps)` tolerance for determining convergence; `mleGPsep.R` uses L-BFGS-B via `optim` for the vectorized parameter of the separable Gaussian correlation, with a user-supplied maximum number of iterations (`maxit`) passed to `optim`. When `maxit` is reached the output `conv = 1` is returned, subsequent identical calls to `mleGPsep.R` can be used to continue with further iterations. `mleGPsep` is similar, but uses the `C` entry point `lbfgsb`.

For the nugget, both `mleGP` and `mleGPsep` utilize a (nearly identical) Newton-like scheme leveraging first and second derivatives.

`jmleGP` and `jmleGPsep` provide joint inference by iterating over the marginals of lengthscale and nugget. The `jmleGP.R` function is an R-only wrapper around `mleGP` (which is primarily in C), whereas `jmleGP` is primarily in C but with reduced output and with hard-coded `N=100`. The same is true for `jmleGPsep`.

`mleGPsep` provides a `param = "both"` alternative to `jmleGPsep` leveraging a common gradient. It can be helpful to supply a larger `maxit` argument compared to `jmleGPsep` as the latter may do up to 100 outer iterations, cycling over lengthscale and nugget. `mleGPsep` usually requires many fewer total iterations, unless one of the lengthscale or nugget is already converged. In anticipation of `param = "both"` the `mleGPsep` function has longer default values for its bounds and prior arguments. These longer arguments are ignored when `param != "both"`. At this time `mleGP` does not have a `param = "both"` option.

All methods are initialized at the value of the parameter(s) currently stored by the C-side object referenced by `gpi` or `gpsepi`. It is highly recommended that sensible range values (`tmin`, `tmax` or `drange`, `grange`) be provided. The defaults provided are too loose for most applications. As illustrated in the examples below, the `darg` and `garg` functions can be used to set appropriate ranges from the distributions of inputs and output data respectively.

The Newton-like method implemented for the isotropic lengthscale and for the nugget offers very fast convergence to local maxima, but sometimes it fails to converge (for any of the usual reasons). The implementation detects this, and in such cases it invokes a `Brent_fmin` call instead - this is the method behind the `optimize` function.

Note that the `gpi` or `gpsepi` object(s) must have been allocated with `dK=TRUE`; alternatively, one can call `buildKGP` or `buildKGPsep` - however, this is not in the NAMESPACE at this time

## Value

A self-explanatory `data.frame` is returned containing the values inferred and the number of iterations used. The `jmleGP.R` and `jmleGPsep.R` functions will also show progress details (the values obtained after each iteration over the marginals).

However, the most important “output” is the modified GP object which retains the setting of the parameters reported on output as a side effect.

`mleGPsep` and `jmleGPsep` provide an output field/column called `conv` indicating convergence (when 0), or alternately a value agreeing with a non-convergence code provided on output by `optim`

## Author(s)

Robert B. Gramacy [email protected]

## References

For standard GP inference, refer to any graduate text, e.g., Rasmussen & Williams Gaussian Processes for Machine Learning;

`vignette("laGP")`, `newGP`, `laGP`, `llikGP`, `optimize`
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71``` ```## a simple example with estimated nugget library(MASS) ## motorcycle data and predictive locations X <- matrix(mcycle[,1], ncol=1) Z <- mcycle[,2] ## get sensible ranges d <- darg(NULL, X) g <- garg(list(mle=TRUE), Z) ## initialize the model gpi <- newGP(X, Z, d=d\$start, g=g\$start, dK=TRUE) ## separate marginal inference (not necessary - for illustration only) print(mleGP(gpi, "d", d\$min, d\$max)) print(mleGP(gpi, "g", g\$min, g\$max)) ## joint inference (could skip straight to here) print(jmleGP(gpi, drange=c(d\$min, d\$max), grange=c(g\$min, g\$max))) ## plot the resulting predictive surface N <- 100 XX <- matrix(seq(min(X), max(X), length=N), ncol=1) p <- predGP(gpi, XX, lite=TRUE) plot(X, Z, main="stationary GP fit to motorcycle data") lines(XX, p\$mean, lwd=2) lines(XX, p\$mean+1.96*sqrt(p\$s2*p\$df/(p\$df-2)), col=2, lty=2) lines(XX, p\$mean-1.96*sqrt(p\$s2*p\$df/(p\$df-2)), col=2, lty=2) ## clean up deleteGP(gpi) ## ## with a separable correlation function ## ## 2D Example: GoldPrice Function, mimicking GP_fit package f <- function(x) { x1 <- 4*x[,1] - 2 x2 <- 4*x[,2] - 2; t1 <- 1 + (x1 + x2 + 1)^2*(19 - 14*x1 + 3*x1^2 - 14*x2 + 6*x1*x2 + 3*x2^2); t2 <- 30 + (2*x1 -3*x2)^2*(18 - 32*x1 + 12*x1^2 + 48*x2 - 36*x1*x2 + 27*x2^2); y <- t1*t2; return(y) } ## build design library(tgp) n <- 50 ## change to 100 or 1000 for more interesting demo B <- rbind(c(0,1), c(0,1)) X <- dopt.gp(n, Xcand=lhs(10*n, B))\$XX ## this differs from GP_fit in that we use the log response Y <- log(f(X)) ## get sensible ranges d <- darg(NULL, X) g <- garg(list(mle=TRUE), Y) ## build GP and jointly optimize via profile mehtods gpisep <- newGPsep(X, Y, d=rep(d\$start, 2), g=g\$start, dK=TRUE) jmleGPsep(gpisep, drange=c(d\$min, d\$max), grange=c(g\$min, g\$max)) ## clean up deleteGPsep(gpisep) ## alternatively, we can optimize via a combined gradient gpisep <- newGPsep(X, Y, d=rep(d\$start, 2), g=g\$start, dK=TRUE) mleGPsep(gpisep, param="both", tmin=c(d\$min, g\$min), tmax=c(d\$max, d\$max)) deleteGPsep(gpisep) ```