mlegp: maximum likelihood estimation of Gaussian process parameters

Share:

Description

Finds maximum likelihood estimates of Gaussian process parameters for a vector (or matrix) of one (or more) responses. For multiple responses, the user chooses between fitting independent Gaussian processes to the separate responses or fitting independent Gaussian processes to principle component weights obtained through singular value decomposition of the output. The latter is useful for functional output or data rich situations.

Usage

1
2
3
4
5
6
mlegp(X, Z, constantMean = 1, nugget = NULL, nugget.known = 0, 
        min.nugget = 0, param.names = NULL, gp.names = NULL, 
	PC.UD = NULL, PC.num = NULL, PC.percent = NULL, 
	simplex.ntries = 5, simplex.maxiter = 500, simplex.reltol = 1e-8,  
	BFGS.maxiter = 500, BFGS.tol = 0.01, BFGS.h = 1e-10, seed = 0, 
        verbose = 1, parallel = FALSE)

Arguments

X

the design matrix

Z

vector or matrix of observations; corresponding to the rows of X

constantMean

a value of 1 indicates that each Gaussian process will have a constant mean; otherwise the mean function will be a linear regression in X, plus an intercept term

nugget

if nugget.known is 1, a fixed value to use for the nugget or a vector corresponding to the fixed diagonal nugget matrix; otherwise, either a positive initial value for the nugget term which will be estimated, or a vector corresponding to the diagonal nugget matrix up to a multiplicative constant. If NULL (the default), mlegp estimates a nugget term only if there are replicates in the design matrix, see details

nugget.known

1 if a plug-in estimate of the nugget will be used; 0 otherwise

min.nugget

minimum value of the nugget term; 0 by default

param.names

a vector of parameter names, corresponding to the columns of X; parameter names are ‘p1’, ‘p2’, ... by default

gp.names

a vector of GP names, corresponding to the GPs fit to each column of Z or each PC weight

PC.UD

the UD matrix if Z is a matrix of principle component weights; see mlegp-svd-functions

PC.num

the number of principle component weights to keep in the singular value decomposition of Z

PC.percent

if not NULL the number of principle component weights kept is the minimum number that accounts for PC.percent of the total variance of the matrix Z

simplex.ntries

the number of simplexes to run

simplex.maxiter

maximum number of evaluations / simplex

simplex.reltol

relative tolerance for simplex method, defaulting to 1e-16

BFGS.maxiter

maximum number of iterations for BFGS method

BFGS.tol

stopping condition for BFGS method is when norm(gradient) < BFGS.tol * max(1, norm(x)), where x is the parameter vector and norm is the Euclidian norm

BFGS.h

derivatives are approximated as [f(x+BFGS.h) - f(x)] / BFGS.h)

seed

the random number seed

verbose

a value of '1' or '2' will result in status updates being printed; a value of '2' results in more information

parallel

if TRUE will fit GPs in parallel to each column of Z, or each set of PC weights; See details

Details

This function calls the C function fitGPFromR which in turn calls fitGP (both in the file fit_gp.h) to fit each Gaussian process.

Separate Gaussian processes are fit to the observations in each column of Z. Maximum likelihood estimates for correlation and nugget parameters are found through numerical methods (i.e., the Nelder-Mead Simplex and the L-BFGS method), while maximum likelihood estimates of the mean regression parameters and overall variance are calculated in closed form (given the correlation and (scaled) nugget parameters). Multiple simplexes are run, and estimates from the best simplex are used as initial values to the gradient (L-BFGS) method.

Gaussian processes are fit to principle component weights by utilizing the singular value decomposition (SVD) of Z, Z = UDVprime. Columns of Z should correspond to a single k-dimensional observation (e.g., functional output of a computer model, evaluated at a particular input)

In the complete SVD, Z is k x m, and r = min(k,m), U is k x r, D is r x r, containing the singular values along the diagonal, and Vprime is r x m. The output Z is approximated by keeping l < r singular values, keeping a UD matrix of dimension k x l, and the Vprime matrix of dimension l x m. Each column of Vprime now contains l principle component weights, which can be used to reconstruct the functional output.

If nugget.known = 1, nugget = NULL, and replicate observations are present, the nugget will be fixed at its best linear unbiased estimate (a weighted average of sample variances). For each column of Z, a GP will be fit to a collection of sample means rather than all observations. This is the recommended approach as it is more accurate and computationally more efficient.

Parallel support is provided through the package snowfall which allows multiple GPs to be fit in parallel. The user must set up the cluster using sfInit and call sfLibrary(mlegp) to load the library onto the slave nodes. Note: GP fitting is not recommended when the number of observations are large (> 100), in which case sequential GP fitting is faster.

Value

an object of class gp.list if Z has more than 1 column, otherwise an object of class gp

Note

The random number seed is 0 by default, but should be randomly set by the user

In some situations, especially for noiseless data, it may be desirable to force a nugget term in order to make the variance-covariance matrix of the Gaussian process more stable; this can be done by setting the argument min.nugget.

If fitting multiple Gaussian processes, the arguments min.nugget and nugget apply to all Gaussian processes being fit.

In some cases, the variance-covariance matrix is stable in C but not stable in R. When this happens, this function will attempt to impose a minimum value for the nugget term, and this will be reported. However, the user is encouraged to refit the GP and manually setting the argument min.nugget in mlegp.

When fitting Gaussian processes to principle component weights, a minimum of two principle component weights must be used.

Author(s)

Garrett M. Dancik dancikg@easternct.edu

References

Santner, T.J. Williams, B.J., Notz, W., 2003. The Design and Analysis of Computer Experiments (New York: Springer).

Heitmann, K., Higdon, D., Nakhleh, C., Habib, S., 2006. Cosmic Calibration. The Astrophysical Journal, 646, 2, L1-L4.

http://www1.easternct.edu/dancikg/

See Also

createGP for details of the gp object; gp.list for details of the gp.list object; mlegp-svd-functions for details on fitting Gaussian processes to high-dimensional data using principle component weights; the L-BFGS method uses C code written by Naoaki Okazaki (http://www.chokkan.org/software/liblbfgs)

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
###### fit a single Gaussian process ######
x = -5:5; y1 = sin(x) + rnorm(length(x),sd=.1)
fit1 = mlegp(x, y1)

## summary and diagnostic plots ##
summary(fit1)
plot(fit1)

###### fit a single Gaussian process when replciates are present ######
x = kronecker(-5:5, rep(1,3))
y = x + rnorm(length(x))

## recommended approach: GP fit to sample means; nugget calcualted from sample variances ##
fit1 = mlegp(x,y, nugget.known = 1)

## original approach: GP fit to all observations; look for MLE of nugget ##
fit2 = mlegp(x,y)


###### fit multiple Gaussian processes to multiple observations ######
x = -5:5 
y1 = sin(x) + rnorm(length(x),sd=.1)
y2 = sin(x) + 2*x + rnorm(length(x), sd = .1)
fitMulti = mlegp(x, cbind(y1,y2))

## summary and diagnostic plots ##
summary(fitMulti)
plot(fitMulti)


###### fit multiple Gaussian processes using principle component weights ######

## generate functional output ##
x = seq(-5,5,by=.2)
p = 1:50
y = matrix(0,length(p), length(x))
for (i in p) {
	y[i,] = sin(x) + i + rnorm(length(x), sd  = .01)
}

## we now have 10 functional observations (each of length 100) ##
for (i in p) {
	plot(x,y[i,], type = "l", col = i, ylim = c(min(y), max(y)))
	par(new=TRUE)
}

## fit GPs to the two most important principle component weights ##
numPCs = 2
fitPC = mlegp(p, t(y), PC.num = numPCs)
plot(fitPC) ## diagnostics

## reconstruct the output Y = UDV'
Vprime = matrix(0,numPCs,length(p))
Vprime[1,] = predict(fitPC[[1]])
Vprime[2,] = predict(fitPC[[2]])

predY = fitPC$UD%*%Vprime
m1 = min(y[39,], predY[,39])
m2 = max(y[39,], predY[,39])

plot(x, y[39,], type="l", lty = 1, ylim = c(m1,m2), ylab = "original y" )
par(new=TRUE)
plot(x, predY[,39], type = "p", col = "red", ylim = c(m1,m2), ylab = "predicted y" )

## Not run: 
### fit GPs in parallel ###
library(snowfall)
sfInit(parallel = TRUE, cpus = 2, slaveOutfile = "slave.out")
sfLibrary(mlegp)
fitPC = mlegp(p, t(y), PC.num = 2, parallel = TRUE)

## End(Not run)