# mlegp: maximum likelihood estimation of Gaussian process parameters

### Description

Finds maximum likelihood estimates of Gaussian process parameters for a vector (or matrix) of one (or more) responses. For multiple responses, the user chooses between fitting independent Gaussian processes to the separate responses or fitting independent Gaussian processes to principle component weights obtained through singular value decomposition of the output. The latter is useful for functional output or data rich situations.

### Usage

1 2 3 4 5 6 | ```
mlegp(X, Z, constantMean = 1, nugget = NULL, nugget.known = 0,
min.nugget = 0, param.names = NULL, gp.names = NULL,
PC.UD = NULL, PC.num = NULL, PC.percent = NULL,
simplex.ntries = 5, simplex.maxiter = 500, simplex.reltol = 1e-8,
BFGS.maxiter = 500, BFGS.tol = 0.01, BFGS.h = 1e-10, seed = 0,
verbose = 1, parallel = FALSE)
``` |

### Arguments

`X` |
the design matrix |

`Z` |
vector or matrix of observations; corresponding to the rows of |

`constantMean` |
a value of 1 indicates that each Gaussian process will have a constant mean; otherwise the mean function will be a linear regression in |

`nugget` |
if nugget.known is 1, a fixed value to use for the nugget or a vector corresponding to the fixed diagonal nugget matrix; otherwise, either a positive initial value for the nugget term which will be estimated, or a vector corresponding to the diagonal nugget matrix up to a multiplicative constant. If |

`nugget.known` |
1 if a plug-in estimate of the nugget will be used; 0 otherwise |

`min.nugget` |
minimum value of the nugget term; 0 by default |

`param.names` |
a vector of parameter names, corresponding to the columns of X; parameter names are ‘p1’, ‘p2’, ... by default |

`gp.names` |
a vector of GP names, corresponding to the GPs fit to each column of |

`PC.UD` |
the UD matrix if |

`PC.num` |
the number of principle component weights to keep in the singular value decomposition of |

`PC.percent` |
if not |

`simplex.ntries` |
the number of simplexes to run |

`simplex.maxiter` |
maximum number of evaluations / simplex |

`simplex.reltol` |
relative tolerance for simplex method, defaulting to 1e-16 |

`BFGS.maxiter` |
maximum number of iterations for BFGS method |

`BFGS.tol` |
stopping condition for BFGS method is when norm(gradient) < BFGS.tol * max(1, norm(x)), where x is the parameter vector and norm is the Euclidian norm |

`BFGS.h` |
derivatives are approximated as [f(x+BFGS.h) - f(x)] / BFGS.h) |

`seed` |
the random number seed |

`verbose` |
a value of '1' or '2' will result in status updates being printed; a value of '2' results in more information |

`parallel` |
if TRUE will fit GPs in parallel to each column of Z, or each set of PC weights; See details |

### Details

This function calls the C function fitGPFromR which in turn calls fitGP (both in the file fit_gp.h) to fit each Gaussian process.

Separate Gaussian processes are fit to the observations in each column of `Z`

. Maximum likelihood estimates for correlation and nugget parameters are found through numerical methods (i.e., the Nelder-Mead Simplex and the L-BFGS method), while maximum likelihood estimates of the mean regression parameters and overall variance are calculated in closed form (given the correlation and (scaled) nugget parameters). Multiple simplexes are run, and estimates from the best simplex are used as initial values to the gradient (L-BFGS) method.

Gaussian processes are fit to principle component weights by utilizing the singular value decomposition (SVD) of `Z`

, Z = UDVprime. Columns of `Z`

should correspond to a single k-dimensional observation (e.g., functional output of a computer model, evaluated at a particular input)

In the complete SVD, `Z`

is k x m, and r = min(k,m), `U`

is k x r, `D`

is r x r, containing the singular values along the diagonal, and `Vprime`

is r x m. The output `Z`

is approximated by keeping l < r singular values, keeping a UD matrix of dimension k x l, and the `Vprime`

matrix of dimension l x m. Each column of `Vprime`

now contains l principle component weights, which can be used to reconstruct the functional output.

If nugget.known = 1, nugget = NULL, and replicate observations are present, the nugget will be fixed at its best linear unbiased estimate (a weighted average of sample variances). For each column of Z, a GP will be fit to a collection of sample means rather than all observations. This is the recommended approach as it is more accurate and computationally more efficient.

Parallel support is provided through the package `snowfall`

which allows multiple GPs to be fit in parallel. The user must set up the cluster using `sfInit`

and call `sfLibrary(mlegp)`

to load the library onto the slave nodes. Note: GP fitting is not recommended when the number of observations are large (> 100), in which case sequential GP fitting is faster.

### Value

an object of class `gp.list`

if Z has more than 1 column, otherwise an object of class `gp`

### Note

The random number seed is 0 by default, but should be randomly set by the user

In some situations, especially for noiseless data, it may be desirable to force a nugget term in order to make the variance-covariance matrix of the Gaussian process more stable; this can be done by setting the argument `min.nugget`

.

If fitting multiple Gaussian processes, the arguments `min.nugget`

and `nugget`

apply to all Gaussian processes being fit.

In some cases, the variance-covariance matrix is stable in C but not stable in R. When this happens, this function will attempt to impose a minimum value for the nugget term, and this will be reported. However, the user is encouraged to refit the GP and manually setting the argument `min.nugget`

in `mlegp`

.

When fitting Gaussian processes to principle component weights, a minimum of two principle component weights must be used.

### Author(s)

Garrett M. Dancik dancikg@easternct.edu

### References

Santner, T.J. Williams, B.J., Notz, W., 2003. The Design and Analysis of Computer Experiments (New York: Springer).

Heitmann, K., Higdon, D., Nakhleh, C., Habib, S., 2006. Cosmic Calibration. The Astrophysical Journal, 646, 2, L1-L4.

http://www1.easternct.edu/dancikg/

### See Also

`createGP`

for details of the `gp`

object; `gp.list`

for details of the `gp.list`

object; `mlegp-svd-functions`

for details on fitting Gaussian processes to high-dimensional data using principle component weights; the L-BFGS method uses C code written by Naoaki Okazaki (http://www.chokkan.org/software/liblbfgs)

### Examples

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 | ```
###### fit a single Gaussian process ######
x = -5:5; y1 = sin(x) + rnorm(length(x),sd=.1)
fit1 = mlegp(x, y1)
## summary and diagnostic plots ##
summary(fit1)
plot(fit1)
###### fit a single Gaussian process when replciates are present ######
x = kronecker(-5:5, rep(1,3))
y = x + rnorm(length(x))
## recommended approach: GP fit to sample means; nugget calcualted from sample variances ##
fit1 = mlegp(x,y, nugget.known = 1)
## original approach: GP fit to all observations; look for MLE of nugget ##
fit2 = mlegp(x,y)
###### fit multiple Gaussian processes to multiple observations ######
x = -5:5
y1 = sin(x) + rnorm(length(x),sd=.1)
y2 = sin(x) + 2*x + rnorm(length(x), sd = .1)
fitMulti = mlegp(x, cbind(y1,y2))
## summary and diagnostic plots ##
summary(fitMulti)
plot(fitMulti)
###### fit multiple Gaussian processes using principle component weights ######
## generate functional output ##
x = seq(-5,5,by=.2)
p = 1:50
y = matrix(0,length(p), length(x))
for (i in p) {
y[i,] = sin(x) + i + rnorm(length(x), sd = .01)
}
## we now have 10 functional observations (each of length 100) ##
for (i in p) {
plot(x,y[i,], type = "l", col = i, ylim = c(min(y), max(y)))
par(new=TRUE)
}
## fit GPs to the two most important principle component weights ##
numPCs = 2
fitPC = mlegp(p, t(y), PC.num = numPCs)
plot(fitPC) ## diagnostics
## reconstruct the output Y = UDV'
Vprime = matrix(0,numPCs,length(p))
Vprime[1,] = predict(fitPC[[1]])
Vprime[2,] = predict(fitPC[[2]])
predY = fitPC$UD%*%Vprime
m1 = min(y[39,], predY[,39])
m2 = max(y[39,], predY[,39])
plot(x, y[39,], type="l", lty = 1, ylim = c(m1,m2), ylab = "original y" )
par(new=TRUE)
plot(x, predY[,39], type = "p", col = "red", ylim = c(m1,m2), ylab = "predicted y" )
## Not run:
### fit GPs in parallel ###
library(snowfall)
sfInit(parallel = TRUE, cpus = 2, slaveOutfile = "slave.out")
sfLibrary(mlegp)
fitPC = mlegp(p, t(y), PC.num = 2, parallel = TRUE)
## End(Not run)
``` |