emulator: Function to fit an emulator to ensemble model output

Description Usage Arguments Details Value Note Limitations and Caveats Note References See Also Examples

Description

Fits a Gaussian Process emulator to regularly spaced time-series (or 1D spatial) model output. This emulator can later be used to interpolate the model output across model parameter settings. The emulator is fit by maximizing the emulator likelihood using a PORT local optimization routine. The emulator has a flexible structure which can be chosen by the user (see Details).

Usage

1
2
emulator(mpars, moutput, par.reg, time.reg, kappa0, zeta0, myrel.tol =
NULL, twice = FALSE, fix.betas = TRUE)

Arguments

mpars

Model ensemble parameter settings. A list with components:

$par

Matrix of ensemble parameter settings. [row, column] = [parameter index, run index]. Columns should correspond to columns of the model output matrix moutput$out.

$parnames

Parameter names for rows of $par (character vector)

$parunits

Units for parameters in rows of $par (character vector)

moutput

Model output. A list with components:

$t

Time vector for rows of $out (vector)

$tunits

Time units (character)

$out

Model output matrix [row, col] = [time, run index]. The run indices should match with the mpars$par argument.

$outname

name (character)

$outunits

units (character

par.reg

Logical vector specifying which parameters to include into the regression matrix. Elements refer to rows of mpars$par. Note that if nothing is included into the regression matrix, the mean is still optimized.

time.reg

Logical specifying whether to include time into the regression matrix.

kappa0

Initial guess for the parameter covariance scaling factor. Larger values lead to higher parameter covariance.

zeta0

Initial guess for the nugget.

myrel.tol

Relative tolerance for optimization. If you want to use the defaults, assign it to NULL. The optimization stops if it thinks the true likelihood is within this fraction of current likelihood. The higher the number the sooner the optimization stops, but the emulator is 'worse'.

twice

Logical. If TRUE, the optimization is done twice with different initial parameter values, and then the best of the two is chosen as final emulator. Default is FALSE.

fix.betas

Logical. If TRUE, beta regression coefficients are fixed, and if FALSE, they are estimated. Default is TRUE.

Details

The emulator parameters are optimized using a local optimization method, to within the specified relative tolerance. The optimization maximizes emulator likelihood. Optionally, if twice=TRUE, the optimization is done twice with different initial parameter settings, and the emulator with the highest likelihood is selected. Depending on the value of fix.betas, the regression beta parameters can either be fixed at their multiple linear regression estimates, or estimated together with other emulator parameters. Mean structure is flexible, and user can choose which parameter and time dimensions to use as regressors via the time.reg and par.reg parameters

Value

Emulator returns an object of class "emul" and is a list with components (see also Reference in the References Section).

$Theta.mat

Theta matrix. [row, col] = [run number, parameter number]

$t.vec

Time vector

$Y.mat

Data matrix Y

$X.mat

Regression matrix

$beta.vec

A vector of regression parameters

$kappa

Parameter covariance scaling factor

$phi.vec

A vector of range parameters phi

$zeta

Nugget

$n

Length of time dimension

$rho

Time AR(1) parameter

$p

Number of model runs in the ensemble

$vecC

Data matrix Y minus the mean vector

$par.reg

Logical vector specifying which parameters to include in the regression matrix

$time.reg

Logical, specifies whether to include time into the regression matrix

$Sigma.mats

List of two precomputed matrices that may be useful: $Sigma.t.mat and $Sigma.theta.mat (see References)

$Sigma.theta.inv.mat

Inverse of $Sigma.mats$Sigma.theta.mat

Note

Evaluation of this function (especially for large datasets and many parameters) might take a long time.

Limitations and Caveats

  1. The emulator will not work well for 'jagged' model response surfaces (high nugget)

  2. The emulator is restricted to output at regular intervals in time and space

  3. The code has not been tested under conditions of extreme high / extreme low input parameter range, output time(space) coordinates range, and output range (an example would be model output ranging from -1E20 to 1E20, etc.). In such cases it is recommended to re-scale the time(space) coordinates vector, the input parameters, and/or the model output.

  4. The emulator will not work on scalar model output – it requires multivariate data

  5. The emulator assumes a separable covariance function, and stationarity of the covariance part of the Gaussian process.

  6. The optimization of the emulator parameters degrades dramatically (and increases in time) as a function of number of free parameters. Hence, the emulator might be of limited use for large parameter ensembles

  7. The emulator authors are not responsible for any code errors and/or bugs

Note

If the final emulator performs poorly, try adjusting the settings, especially the initial kappa0 and zeta0 values.

References

R. Olson and W. Chang (2013): Mathematical framework for a separable Gaussian Process Emulator. Tech. Rep., available from
www.scrimhub.org/resources/stilt/Olson_and_Chang_2013_Stilt_Emulator_Technical_Report.pdf.

See Also

emul.lik, optimize.emul

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
# Fit an emulator to the example 1D parameter ensemble data
# Do not use any covariates, use standard settings
data(Data.1D.model)
data(Data.1D.par)
emul.1D <- emulator(Data.1D.par, Data.1D.model, FALSE, FALSE, 1, 1)

# Take a look at the range and regression parameters
cat("Range parameters are:", emul.1D$phi.vec, "\n")
cat("Regression parameters are:", emul.1D$beta.vec, "\n")

# Predict using the emulator at Theta*=3
pred.1D <- predict(emul.1D, 3)

# Plot the prediction
plot(emul.1D$t.vec, pred.1D$mean, xlab="Year", ylab="Sample Output")


## Not run: 
    # Fit an emulator to the UVic ESCM 3-parameter ensemble temperature
    # output Use time and aerosol scaling covariates (parameter 3), run
    # the optimization twice with relative tolerance of 0.1, keep
    # regression parameters at their multiple linear regression
    # estimates data(Data.UVic.model) data(Data.UVic.par) UVic.emul <-
    # emulator(mpars=Data.UVic.par, moutput=Data.UVic.model,
    # par.reg=c(FALSE, FALSE, TRUE), time.reg=TRUE, kappa0=1, zeta0=1,
    # myrel.tol=0.1, twice=TRUE, fix.betas=TRUE)
   
## End(Not run)

stilt documentation built on May 2, 2019, 1:10 p.m.