# GP_fit: Gaussian Process Model fitting In GPfit: Gaussian Processes Modeling

## Description

For an (n x d) design matrix, `X`, and the corresponding (n x 1) simulator output `Y`, this function fits the GP model and returns the parameter estimates. The optimization routine assumes that the inputs are scaled to the unit hypercube [0,1]^d.

## Usage

 ```1 2 3``` ```GP_fit(X, Y, control = c(200 * d, 80 * d, 2 * d), nug_thres = 20, trace = FALSE, maxit = 100, corr = list(type = "exponential", power = 1.95), optim_start = NULL) ```

## Arguments

 `X` the (`n x d`) design matrix `Y` the (`n x 1`) vector of simulator outputs. `control` a vector of parameters used in the search for optimal beta (search grid size, percent, number of clusters). See ‘Details’. `nug_thres` a parameter used in computing the nugget. See ‘Details’. `trace` logical, if `TRUE`, will provide information on the `optim` runs `maxit` the maximum number of iterations within `optim`, defaults to 100 `corr` a list of parameters for the specifing the correlation to be used. See `corr_matrix`. `optim_start` a matrix of potentially likely starting values for correlation hyperparameters for the `optim` runs, i.e., initial guess of the d-vector `beta`

## Details

This function fits the following GP model, y(x) = μ + Z(x), x in [0,1]^d, where Z(x) is a GP with mean 0, Var(Z(x_i)) = σ^2, and Cov(Z(x_i),Z(x_j)) = σ^2*R_{ij}. Entries in covariance matrix R are determined by `corr` and parameterized by `beta`, a `d`-vector of parameters. For computational stability R^-1 is replaced with R_{δ_{lb}}^{-1}, where R_{δ{lb}} = R + δ_{lb}I and δ_{lb} is the nugget parameter described in Ranjan et al. (2011).

The parameter estimate `beta` is obtained by minimizing the deviance using a multi-start gradient based search (L-BFGS-B) algorithm. The starting points are selected using the k-means clustering algorithm on a large maximin LHD for values of `beta`, after discarding `beta` vectors with high deviance. The `control` parameter determines the quality of the starting points of the L-BFGS-B algoritm.

`control` is a vector of three tunable parameters used in the deviance optimization algorithm. The default values correspond to choosing 2*d clusters (using k-means clustering algorithm) based on 80*d best points (smallest deviance, refer to `GP_deviance`) from a 200*d - point random maximin LHD in `beta`. One can change these values to balance the trade-off between computational cost and robustness of likelihood optimization (or prediction accuracy). For details see MacDonald et al. (2015).

The `nug_thres` parameter is outlined in Ranjan et al. (2011) and is used in finding the lower bound on the nugget (δ_{lb}).

## Value

an object of class `GP` containing parameter estimates `beta` and `sig2`, nugget parameter `delta`, the data (`X` and `Y`), and a specification of the correlation structure used.

## Author(s)

Blake MacDonald, Hugh Chipman, Pritam Ranjan

## References

MacDonald, K.B., Ranjan, P. and Chipman, H. (2015). GPfit: An R Package for Fitting a Gaussian Process Model to Deterministic Simulator Outputs. Journal of Statistical Software, 64(12), 1-23. http://www.jstatsoft.org/v64/i12/

Ranjan, P., Haynes, R., and Karsten, R. (2011). A Computationally Stable Approach to Gaussian Process Interpolation of Deterministic Computer Simulation Data, Technometrics, 53(4), 366 - 378.

`plot` for plotting in 1 and 2 dimensions;
`predict` for predicting the response and error surfaces;
`optim` for information on the L-BFGS-B procedure;
`GP_deviance` for computing the deviance.
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50``` ```## 1D Example 1 n = 5 d = 1 computer_simulator <- function(x){ x = 2 * x + 0.5 y = sin(10 * pi * x) / (2 * x) + (x - 1)^4 return(y) } set.seed(3) library(lhs) x = maximinLHS(n, d) y = computer_simulator(x) GPmodel = GP_fit(x, y) print(GPmodel) ## 1D Example 2 n = 7 d = 1 computer_simulator <- function(x) { y <- log(x + 0.1) + sin(5 * pi * x) return(y) } set.seed(1) library(lhs) x = maximinLHS(n, d) y = computer_simulator(x) GPmodel = GP_fit(x, y) print(GPmodel, digits = 4) ## 2D Example: GoldPrice Function computer_simulator <- function(x) { x1 = 4 * x[, 1] - 2 x2 = 4 * x[, 2] - 2 t1 = 1 + (x1 + x2 + 1)^2 * (19 - 14 * x1 + 3 * x1^2 - 14 * x2 + 6 * x1 *x2 + 3 * x2^2) t2 = 30 + (2 * x1 - 3 * x2)^2 * (18 - 32 * x1 + 12 * x1^2 + 48 * x2 - 36 * x1 * x2 + 27 * x2^2) y = t1 * t2 return(y) } n = 30 d = 2 set.seed(1) library(lhs) x = maximinLHS(n, d) y = computer_simulator(x) GPmodel = GP_fit(x, y) print(GPmodel) ```