mleGP: Inference for GP correlation parameters In laGP: Local Approximate Gaussian Process Regression

Description

Maximum likelihood/a posteriori inference for (isotropic and separable) Gaussian lengthscale and nugget parameters, marginally or jointly, for Gaussian process regression

Usage

 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 mleGP(gpi, param = c("d", "g"), tmin=sqrt(.Machine\$double.eps), tmax = -1, ab = c(0, 0), verb = 0) mleGPsep(gpsepi, param=c("d", "g", "both"), tmin=rep(sqrt(.Machine\$double.eps), 2), tmax=c(-1,1), ab=rep(0,4), maxit=100, verb=0) mleGPsep.R(gpsepi, param=c("d", "g"), tmin=sqrt(.Machine\$double.eps), tmax=-1, ab=c(0,0), maxit=100, verb=0) jmleGP(gpi, drange=c(sqrt(.Machine\$double.eps),10), grange=c(sqrt(.Machine\$double.eps), 1), dab=c(0,0), gab=c(0,0), verb=0) jmleGP.R(gpi, N=100, drange=c(sqrt(.Machine\$double.eps),10), grange=c(sqrt(.Machine\$double.eps), 1), dab=c(0,0), gab=c(0,0), verb=0) jmleGPsep(gpsepi, drange=c(sqrt(.Machine\$double.eps),10), grange=c(sqrt(.Machine\$double.eps), 1), dab=c(0,0), gab=c(0,0), maxit=100, verb=0) jmleGPsep.R(gpsepi, N=100, drange=c(sqrt(.Machine\$double.eps),10), grange=c(sqrt(.Machine\$double.eps), 1), dab=c(0,0), gab=c(0,0), maxit=100, mleGPsep=mleGPsep.R, verb=0)

Arguments

 gpi a C-side GP object identifier (positive integer); e.g., as returned by newGP gpsepi similar to gpi but indicating a separable GP object, as returned by newGPsep N for jmleGP.R, the maximum number of times the pair of margins should be iterated over before determining failed convergence; note that (at this time) jmleGP uses a hard-coded N=100 in its C implementation param for mleGP, indicating whether to work on the lengthscale (d) or nugget (g) margin tmin for mleGP, smallest value considered for the parameter (param) tmax for mleGP, largest value considered for the parameter (param) drange for jmleGP, these are c(tmin, tmax) values for the lengthscale parameter; the default values are reasonable for 1-d inputs in the unit interval grange for jmleGP, these are c(tmin, tmax) values for the nugget parameter; the default values are reasonable for responses with a range of one ab for mleGP, a non-negative 2-vector describing shape and rate parameters to a Gamma prior for the parameter (param); a zero-setting for either value results in no-prior being used (MLE inference); otherwise MAP inference is performed maxit for mleGPsep this is passed as control=list(trace=maxit) to optim's L-BFGS-B method for optimizing the likelihood/posterior of a separable GP representation; this argument is not used for isotropic GP versions, nor for optimizing the nugget dab for jmleGP, this is ab for the lengthscale parameter gab for jmleGP, this is ab for the nugget parameter mleGPsep function for internal MLE calculation of the separable lengthscale parameter; one of either mleGPsep.R based on method="L-BFGS-B" using optim; or mleGPsep using the C entry point lbfgsb. Both options use a C backend for the nugget verb a verbosity argument indicating how much information about the optimization steps should be printed to the screen; verb <= 0 is quiet; for jmleGP, a verb - 1 value is passed to the mleGP or mleGPsep subroutine(s)

Details

mleGP and mleGPsep perform marginal (or profile) inference for the specified param, either the lengthscale or the nugget. mleGPsep can perform simultaneous lengthscale and nugget inference via a common gradient with param = "both". More details are provided below.

For the lengthscale, mleGP uses a Newton-like scheme with analytic first and second derivatives (more below) to find the scalar parameter for the isotropic Gaussian correlation function, with hard-coded 100-max iterations threshold and a sqrt(.Machine\$double.eps) tolerance for determining convergence; mleGPsep.R uses L-BFGS-B via optim for the vectorized parameter of the separable Gaussian correlation, with a user-supplied maximum number of iterations (maxit) passed to optim. When maxit is reached the output conv = 1 is returned, subsequent identical calls to mleGPsep.R can be used to continue with further iterations. mleGPsep is similar, but uses the C entry point lbfgsb.

For the nugget, both mleGP and mleGPsep utilize a (nearly identical) Newton-like scheme leveraging first and second derivatives.

jmleGP and jmleGPsep provide joint inference by iterating over the marginals of lengthscale and nugget. The jmleGP.R function is an R-only wrapper around mleGP (which is primarily in C), whereas jmleGP is primarily in C but with reduced output and with hard-coded N=100. The same is true for jmleGPsep.

mleGPsep provides a param = "both" alternative to jmleGPsep leveraging a common gradient. It can be helpful to supply a larger maxit argument compared to jmleGPsep as the latter may do up to 100 outer iterations, cycling over lengthscale and nugget. mleGPsep usually requires many fewer total iterations, unless one of the lengthscale or nugget is already converged. In anticipation of param = "both" the mleGPsep function has longer default values for its bounds and prior arguments. These longer arguments are ignored when param != "both". At this time mleGP does not have a param = "both" option.

All methods are initialized at the value of the parameter(s) currently stored by the C-side object referenced by gpi or gpsepi. It is highly recommended that sensible range values (tmin, tmax or drange, grange) be provided. The defaults provided are too loose for most applications. As illustrated in the examples below, the darg and garg functions can be used to set appropriate ranges from the distributions of inputs and output data respectively.

The Newton-like method implemented for the isotropic lengthscale and for the nugget offers very fast convergence to local maxima, but sometimes it fails to converge (for any of the usual reasons). The implementation detects this, and in such cases it invokes a Brent_fmin call instead - this is the method behind the optimize function.

Note that the gpi or gpsepi object(s) must have been allocated with dK=TRUE; alternatively, one can call buildKGP or buildKGPsep - however, this is not in the NAMESPACE at this time

Value

A self-explanatory data.frame is returned containing the values inferred and the number of iterations used. The jmleGP.R and jmleGPsep.R functions will also show progress details (the values obtained after each iteration over the marginals).

However, the most important “output” is the modified GP object which retains the setting of the parameters reported on output as a side effect.

mleGPsep and jmleGPsep provide an output field/column called conv indicating convergence (when 0), or alternately a value agreeing with a non-convergence code provided on output by optim

Author(s)

Robert B. Gramacy rbg@vt.edu

References

For standard GP inference, refer to any graduate text, e.g., Rasmussen & Williams Gaussian Processes for Machine Learning;