Generate Emulators from Data


Given data from simulator runs, generates a set of Emulator objects, one for each output.


Required. A data.frame containing parameter and output values


Required. A character vector of output names


Required if input_names is not given. A named list of input parameter ranges


Required if ranges is not given. The names of the parameters


Selects between deterministic, variance, covariance, and multistate emulation


A collection of user-determined priors (see description)


To what polynomial order should regression surfaces be fitted?


Should uncertainty in the regression coefficients be included?


The name of the correlation structure(s) to fit, if not exp_sq


Should the returned emulators be Bayes linear adjusted?


Any known internal or external discrepancies of the model


Should status updates be provided?


If TRUE, removes output values that are NA


If TRUE, modifies ranges to a conservative minimum enclosing hyperrectangle


If provided, outputs are checked for consistent over/underestimation


Internal - distinguishes deterministic from hierarchical emulators


User-specified options for emulating covariance matrices


Any additional parameters for custom correlators or additional verbosity options


Many of the parameters that can be passed to this function are optional: the minimal working example requires input_data, output_names, and one of ranges or input_names. If ranges is supplied, the input names are inferred from that list, data.frame, or data.matrix; if only input_names is supplied, the range of each input is assumed to be [-1, 1].
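As a minimal sketch of the fallback described above, default ranges of [-1, 1] can be built from a vector of input names in base R. The helper default_ranges is hypothetical and only illustrates the resulting named-list structure; it is not part of the package.

```r
# Hypothetical helper: assign the range [-1, 1] to every named input,
# mirroring the fallback used when only input_names is supplied
default_ranges <- function(input_names) {
  setNames(rep(list(c(-1, 1)), length(input_names)), input_names)
}

default_ranges(c("aSI", "aIR", "aSR"))
```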

The ranges can be provided in several ways: as a named list of length-2 numeric vectors (giving the lower and upper bounds for each parameter); as a data.frame with 2 columns and one row per parameter; or as a data.matrix structured in the same way as the data.frame. If the ranges are provided as a data.frame or data.matrix, the row.names of the object must be set to the parameter names; a warning is given if they are not.
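The three forms carry the same information. The base-R sketch below converts a two-column data.frame or matrix with row names into the named-list form; the helper as_range_list is hypothetical and only illustrates the correspondence, not how the package converts ranges internally.

```r
# Hypothetical helper: turn a 2-column data.frame/matrix of bounds
# (one row per parameter, row names = parameter names) into a named list
as_range_list <- function(r) {
  if (is.list(r) && !is.data.frame(r)) return(r)  # already a named list
  r <- as.matrix(r)
  if (is.null(rownames(r))) warning("ranges should have row names")
  setNames(lapply(seq_len(nrow(r)), function(i) unname(r[i, ])), rownames(r))
}

ranges_df <- data.frame(lower = c(0.1, 0, 0), upper = c(0.8, 0.5, 0.05),
                        row.names = c("aSI", "aIR", "aSR"))
as_range_list(ranges_df)
```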

If only input_data, output_names and ranges are provided, the emulators are fitted as follows. The basis functions and associated regression coefficients are determined by linear regression up to quadratic order, including cross-terms. These regression parameters are treated as 'known'.
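To illustrate what "linear regression up to quadratic order, including cross-terms" means, here is a base-R sketch on synthetic data. It fits the full quadratic surface with lm and poly(..., raw = TRUE); the package's own fitting routine (including any term selection or rescaling of inputs) may differ, and the data are made up.

```r
set.seed(42)
# Synthetic training data with three inputs (names borrowed from the SIR example)
n <- 50
train <- data.frame(aSI = runif(n, 0.1, 0.8),
                    aIR = runif(n, 0, 0.5),
                    aSR = runif(n, 0, 0.05))
train$nS <- 500 - 300 * train$aSI + 200 * train$aIR * train$aSI + rnorm(n, sd = 5)

# Full quadratic surface: intercept, linear terms, squares and pairwise cross-terms
quad_fit <- lm(nS ~ poly(aSI, aIR, aSR, degree = 2, raw = TRUE), data = train)
length(coef(quad_fit))  # 1 + 3 linear + 3 squared + 3 cross-terms = 10
```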

The correlation function c(x, x') is assumed to be squared-exponential (exp_sq), and a corresponding Correlator object is created. The hyperparameters of the correlation structure (the variance, correlation length, and nugget term) are determined by constrained maximum likelihood estimation.
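As a rough sketch of this step, the code below assumes one common parameterisation of the squared-exponential correlation with a nugget, sigma^2 [(1 - delta) exp(-||x - x'||^2 / theta^2) + delta I(x = x')], and maximises a zero-mean Gaussian likelihood over the correlation length theta within a fixed interval. This is a simplification: sigma and the nugget are held fixed here, the nugget is applied on the diagonal only, and the package's actual likelihood and constraints are not reproduced.

```r
# Squared-exponential correlation matrix with a nugget on the diagonal
# (assumed parameterisation, for illustration only)
exp_sq_matrix <- function(X, theta, nug = 0) {
  d2 <- as.matrix(dist(X))^2
  (1 - nug) * exp(-d2 / theta^2) + nug * diag(nrow(X))
}

# Negative log-likelihood of a zero-mean Gaussian process with fixed sigma and nugget
neg_log_lik <- function(theta, X, y, sigma, nug) {
  K <- sigma^2 * exp_sq_matrix(X, theta, nug)
  R <- chol(K)                                  # K = t(R) %*% R
  alpha <- backsolve(R, forwardsolve(t(R), y))  # alpha = K^{-1} y
  0.5 * sum(y * alpha) + sum(log(diag(R))) + 0.5 * length(y) * log(2 * pi)
}

set.seed(1)
X <- matrix(runif(20), ncol = 1)
y <- sin(4 * X[, 1])

# Constrained search: theta is restricted to the interval [0.1, 2]
fit <- optimize(neg_log_lik, interval = c(0.1, 2), X = X, y = y, sigma = 1, nug = 0.01)
fit$minimum  # estimated correlation length
```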

The maximum polynomial order of the regression surface is controlled by order. The regression coefficients can be treated as uncertain by setting beta.var = TRUE, in which case their values can change during hyperparameter estimation. The hyperparameter search can be overridden by specifying a range for each hyperparameter using hp_range.
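For intuition about what order controls: with p inputs, a full polynomial basis of total degree at most k contains choose(p + k, k) terms, the constant included. The plain arithmetic below (not a package call) shows how quickly this grows for the three-parameter SIR example; the count is an upper bound on the candidate terms, and the fitted emulators may retain fewer.

```r
# Number of monomials of total degree <= k in p variables, constant included
n_basis_terms <- function(p, k) choose(p + k, k)

sapply(1:3, function(k) n_basis_terms(p = 3, k = k))  # 4 10 20
```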

If there are expert beliefs about the structure of the emulators, this information can be supplied directly via the specified_priors argument. It can contain regression coefficient values (beta), regression functions (func), correlation structures (u), hyperparameter values (hyper_p), and nugget term values (delta).
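The regression part of a prior built from func and beta is the weighted sum of basis functions, beta^T h(x). A minimal sketch using the first output's basis functions and coefficients from the custom-beta example; the evaluation point x is arbitrary, regression_mean is a hypothetical helper, and any internal rescaling of inputs by the package is not modelled.

```r
# Basis functions h(x) and coefficients beta for one output
# (copied from the custom-beta example; x below is an illustrative point)
h <- c(function(x) 1, function(x) x[[1]], function(x) x[[2]])
beta <- c(550, -400, 250)

# Hypothetical helper: evaluate beta^T h(x)
regression_mean <- function(x, funcs, beta) {
  sum(beta * vapply(funcs, function(f) f(x), numeric(1)))
}

regression_mean(c(0.5, 0.2, 0.02), h, beta)  # 550 - 200 + 50 = 400
```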

Some rudimentary data handling is available, but it is no substitute for sense-checking the input data directly. If na.rm = TRUE, rows of training data that contain NA values are removed; if check.ranges = TRUE, the input parameter ranges used for emulator training are redefined to a conservative minimum enclosing hyperrectangle of the training points. The latter is common practice in later waves of emulation, where it maximises the predictive power of the emulators, but it should only be used if the training set is believed to be truly representative of, and to span, the full space of interest.
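Both options can be pictured in base R. The sketch below drops incomplete rows with complete.cases and then shrinks each range to the box spanned by the remaining training points, padded by a margin. The helper shrink_ranges and its 5% margin are assumptions for illustration; the package's own padding rule is not reproduced here.

```r
# Toy training data with a missing output value (names are illustrative)
train <- data.frame(a = c(0.2, 0.4, 0.3, 0.6),
                    b = c(1, 2, 3, 4),
                    y = c(1.5, NA, 2.1, 3.3))
ranges <- list(a = c(0, 1), b = c(0, 10))

# na.rm-style cleaning: drop rows containing any NA
train <- train[complete.cases(train), ]

# Hypothetical helper: shrink each range to the box spanned by the data,
# padded by a margin (5% here is an assumption), never exceeding the original range
shrink_ranges <- function(data, ranges, margin = 0.05) {
  out <- lapply(names(ranges), function(nm) {
    r <- range(data[[nm]])
    pad <- margin * diff(r)
    c(max(ranges[[nm]][1], r[1] - pad), min(ranges[[nm]][2], r[2] + pad))
  })
  setNames(out, names(ranges))
}

shrink_ranges(train, ranges)
```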

Several classes of emulator can be created with this function, depending on the nature of the model. The emulator_type argument accepts the following options:


"variance": Create emulators for the mean and variance surfaces of each stochastic output


"covariance": Create emulators for the mean surface, and a covariance matrix for the variance surface


"multistate": Create sets of emulators per output for multistate stochastic systems


"default": Deterministic emulators with no covariance structure

The "default" behaviour applies if the emulator_type argument is not supplied or does not match any of the above options. If the data appear to be stochastic but the default behaviour is used, a warning is given and only the first model result for each parameter set is used in training.
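One way to picture "only the first model result for each parameter set" is to drop repeated parameter rows with duplicated(). This base-R illustration uses made-up data and is not the package's code.

```r
# Toy stochastic data: two parameter sets, each run several times
runs <- data.frame(lambda = c(0.02, 0.02, 0.02, 0.05, 0.05),
                   mu     = c(0.10, 0.10, 0.10, 0.06, 0.06),
                   Y      = c(12, 15, 9, 30, 27))

# Keep the first realisation for each distinct (lambda, mu) pair
first_runs <- runs[!duplicated(runs[, c("lambda", "mu")]), ]
first_runs$Y  # 12 30

# Repeated parameter sets are one symptom of stochastic output
any(duplicated(runs[, c("lambda", "mu")]))  # TRUE
</imports>
```
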

See the examples below for usage of this function, including the behaviour of optional arguments.


An appropriately structured list of Emulator objects


# Deterministic: use the SIRSample training dataset as an example.
ranges <- list(aSI = c(0.1, 0.8), aIR = c(0, 0.5), aSR = c(0, 0.05))
out_vars <- c('nS', 'nI', 'nR')
ems_linear <- emulator_from_data(SIRSample$training, out_vars, ranges, order = 1)
ems_linear # Printout of the key information.

# Stochastic: use the BirthDeath training dataset
v_ems <- emulator_from_data(BirthDeath$training, c("Y"),
 list(lambda = c(0, 0.08), mu = c(0.04, 0.13)), emulator_type = 'variance')

# If different specifications are wanted for the variance and expectation emulators,
# supply a list with entries 'variance' and 'expectation', e.g. for corr_name
v_ems_corr <- emulator_from_data(BirthDeath$training, c("Y"),
 list(lambda = c(0, 0.08), mu = c(0.04, 0.13)), emulator_type = 'variance',
 corr_name = list(variance = "matern", expectation = "exp_sq"))

  # The examples below have long runtimes
  ems_quad <- emulator_from_data(SIRSample$training, out_vars, ranges)
  ems_quad # Now includes quadratic terms
  ems_cub <- emulator_from_data(SIRSample$training, out_vars, ranges, order = 3)
  ems_cub # Up to cubic order in the parameters

  ems_unadjusted <- emulator_from_data(SIRSample$training, out_vars, ranges, adjusted = FALSE)
  ems_unadjusted # Looks the same as ems_quad, but the emulators are not Bayes Linear adjusted

  # Reproduce the linear case, but with slightly adjusted beta values
  basis_f <- list(
   c(function(x) 1, function(x) x[[1]], function(x) x[[2]]),
   c(function(x) 1, function(x) x[[1]], function(x) x[[2]]),
   c(function(x) 1, function(x) x[[1]], function(x) x[[3]])
  )
  beta_val <- list(
   list(mu = c(550, -400, 250)),
   list(mu = c(200, 200, -300)),
   list(mu = c(200, 200, -50))
  )
  ems_custom_beta <- emulator_from_data(SIRSample$training, out_vars, ranges,
   specified_priors = list(func = basis_f, beta = beta_val))
  # Custom correlation functions
  corr_structs <- list(
   list(sigma = 83, corr = Correlator$new('exp_sq', list(theta = 0.5), nug = 0.1)),
   list(sigma = 95, corr = Correlator$new('exp_sq', list(theta = 0.4), nug = 0.25)),
   list(sigma = 164, corr = Correlator$new('matern', list(theta = 0.2, nu = 1.5), nug = 0.45))
  )
  ems_custom_u <- emulator_from_data(SIRSample$training, out_vars, ranges,
   specified_priors = list(u = corr_structs))
  # Allowing the function to choose hyperparameters for 'non-standard' correlation functions
  ems_matern <- emulator_from_data(SIRSample$training, out_vars, ranges, corr_name = 'matern')
  # Providing hyperparameters directly
  matern_hp <- list(
   list(theta = 0.8, nu = 1.5),
   list(theta = 0.6, nu = 2.5),
   list(theta = 1.2, nu = 0.5)
  )
  ems_matern2 <- emulator_from_data(SIRSample$training, out_vars, ranges, corr_name = 'matern',
   specified_priors = list(hyper_p = matern_hp))
  # "Custom" correlation function with user-specified ranges: gamma exponential
  # Any named, defined correlation function can be passed. See Correlator documentation
  ems_gamma <- emulator_from_data(SIRSample$training, out_vars, ranges, corr_name = 'gamma_exp',
   specified_priors = list(hyper_p = list(gamma = c(0.01, 2), theta = c(1/3, 2))))

  # Multistate emulation: use the stochastic SIR dataset
  SIR_names <- c("I10", "I25", "I50", "R10", "R25", "R50")
  b_ems <- emulator_from_data(SIR_stochastic$training, SIR_names,
   ranges, emulator_type = 'multistate')

  # Covariance emulation, with specified non-zero matrix elements
  which_cov <- matrix(rep(TRUE, 16), nrow = 4)
  which_cov[2,3] <- which_cov[3,2] <- which_cov[1,4] <- which_cov[4,1] <- FALSE
  c_ems <- emulator_from_data(SIR_stochastic$training, SIR_names[-c(3,6)], ranges,
   emulator_type = 'covariance', covariance_opts = list(matrix = which_cov))

