GP_fit: Gaussian Process Model fitting In GPfit: Gaussian Processes Modeling

Description

For an (n x d) design matrix, X, and the corresponding (n x 1) simulator output Y, this function fits the GP model and returns the parameter estimates. The optimization routine assumes that the inputs are scaled to the unit hypercube [0,1]^d.

Usage

 1 2 3 GP_fit(X, Y, control = c(200 * d, 80 * d, 2 * d), nug_thres = 20, trace = FALSE, maxit = 100, corr = list(type = "exponential", power = 1.95), optim_start = NULL)

Arguments

 X the (n x d) design matrix Y the (n x 1) vector of simulator outputs. control a vector of parameters used in the search for optimal beta (search grid size, percent, number of clusters). See ‘Details’. nug_thres a parameter used in computing the nugget. See ‘Details’. trace logical, if TRUE, will provide information on the optim runs maxit the maximum number of iterations within optim, defaults to 100 corr a list of parameters for the specifing the correlation to be used. See corr_matrix. optim_start a matrix of potentially likely starting values for correlation hyperparameters for the optim runs, i.e., initial guess of the d-vector beta

Details

This function fits the following GP model, y(x) = μ + Z(x), x in [0,1]^d, where Z(x) is a GP with mean 0, Var(Z(x_i)) = σ^2, and Cov(Z(x_i),Z(x_j)) = σ^2*R_{ij}. Entries in covariance matrix R are determined by corr and parameterized by beta, a d-vector of parameters. For computational stability R^-1 is replaced with R_{δ_{lb}}^{-1}, where R_{δ{lb}} = R + δ_{lb}I and δ_{lb} is the nugget parameter described in Ranjan et al. (2011).

The parameter estimate beta is obtained by minimizing the deviance using a multi-start gradient based search (L-BFGS-B) algorithm. The starting points are selected using the k-means clustering algorithm on a large maximin LHD for values of beta, after discarding beta vectors with high deviance. The control parameter determines the quality of the starting points of the L-BFGS-B algoritm.

control is a vector of three tunable parameters used in the deviance optimization algorithm. The default values correspond to choosing 2*d clusters (using k-means clustering algorithm) based on 80*d best points (smallest deviance, refer to GP_deviance) from a 200*d - point random maximin LHD in beta. One can change these values to balance the trade-off between computational cost and robustness of likelihood optimization (or prediction accuracy). For details see MacDonald et al. (2015).

The nug_thres parameter is outlined in Ranjan et al. (2011) and is used in finding the lower bound on the nugget (δ_{lb}).

Value

an object of class GP containing parameter estimates beta and sig2, nugget parameter delta, the data (X and Y), and a specification of the correlation structure used.

Author(s)

Blake MacDonald, Hugh Chipman, Pritam Ranjan

References

MacDonald, K.B., Ranjan, P. and Chipman, H. (2015). GPfit: An R Package for Fitting a Gaussian Process Model to Deterministic Simulator Outputs. Journal of Statistical Software, 64(12), 1-23. http://www.jstatsoft.org/v64/i12/

Ranjan, P., Haynes, R., and Karsten, R. (2011). A Computationally Stable Approach to Gaussian Process Interpolation of Deterministic Computer Simulation Data, Technometrics, 53(4), 366 - 378.