generate_new_design: Generate Proposal Points
In hmer: History Matching and Emulation Package

generate_new_design

R Documentation

Generate Proposal Points

Description

Given a set of trained emulators, this finds the next set of points that will be informative for a subsequent wave of emulation or, in the event that the current wave is the last desired, a set of points that optimally span the parameter region of interest. There are a number of different methods that can be utilised, alone or in combination with one another, to generate the points.

Usage

generate_new_design(
  ems,
  n_points,
  z,
  method = "default",
  cutoff = 3,
  plausible_set,
  verbose = interactive(),
  opts = NULL,
  ...
)

Arguments

`ems`	A list of `Emulator` objects, trained on previous design points.
`n_points`	The desired number of points to propose.
`z`	The targets to match to.
`method`	Which sampling method(s) to use; the available options are described in Details.
`cutoff`	The value of the cutoff to use to assess suitability.
`plausible_set`	An optional set of known non-implausible points, to avoid LHD sampling.
`verbose`	Should progress statements be printed to the console?
`opts`	A named list of options, as described in 'Arguments within `opts`'.
`...`	Any parameters to pass via chaining to individual sampling functions (e.g. `distro` for importance sampling or `ordering` for collecting emulators).

Details

If the method argument contains 'lhs', a Latin hypercube is generated and non-implausible points from this design are retained. If more points are accepted than the next design requires, then points are subselected using a maximin criterion.
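As an illustration of the 'lhs' step, the following base R sketch draws a Latin hypercube on the unit square, keeps the points that pass a stand-in acceptance test, and greedily picks a well-spread subset. The helpers `lhs_unit`, `toy_accept` and `maximin_subset` are hypothetical and are not part of hmer; the package's own implementation may differ.

```r
# Hypothetical helpers for illustration only; not part of hmer.
set.seed(1)

# Latin hypercube on [0,1]^d: exactly one point per stratum in each dimension.
lhs_unit <- function(n, d) {
  sapply(seq_len(d), function(j) (sample(n) - runif(n)) / n)
}

# Stand-in for an implausibility check: accept points inside a disc.
toy_accept <- function(x) rowSums((x - 0.5)^2) <= 0.3^2

# Greedy maximin: repeatedly add the point farthest from those already chosen.
maximin_subset <- function(x, k) {
  d <- as.matrix(dist(x))
  chosen <- which.max(rowSums(d))
  while (length(chosen) < k) {
    min_d <- apply(d[, chosen, drop = FALSE], 1, min)
    min_d[chosen] <- -Inf  # never pick the same point twice
    chosen <- c(chosen, which.max(min_d))
  }
  x[chosen, , drop = FALSE]
}

candidates <- lhs_unit(500, 2)
accepted <- candidates[toy_accept(candidates), , drop = FALSE]
design <- maximin_subset(accepted, 20)
```

The greedy selection is only an approximation to a true maximin design, but it shows why oversampling the hypercube first gives the subselection step something to work with.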

If method contains 'line', then line sampling is performed. Given an already established collection of non-implausible points, rays are drawn between pairs of points (selected so as to maximise the distance between them) and more points are sampled along the rays. Points thus sampled are retained if they lie near a boundary of the non-implausible space, or on the boundary of the parameter region of interest.
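The idea behind line sampling can be sketched in base R as follows. This is a simplified illustration for a single pair of points, with a toy acceptance test in place of emulator implausibility; `toy_accept` and `line_boundary_points` are hypothetical names, and the selection of maximally distant pairs is omitted.

```r
# Hypothetical helpers for illustration only; not part of hmer.

# Stand-in for an implausibility check: accept points inside a disc.
toy_accept <- function(x) rowSums((x - 0.5)^2) <= 0.3^2

# Sample along the line through p1 and p2, restricted to the unit square,
# and keep accepted points that sit next to a rejected point (or next to
# the edge of the parameter range).
line_boundary_points <- function(p1, p2, ppl = 51) {
  steps <- seq(-2, 3, length.out = ppl)  # extend beyond both end points
  pts <- t(sapply(steps, function(s) p1 + s * (p2 - p1)))
  in_range <- apply(pts >= 0 & pts <= 1, 1, all)
  pts <- pts[in_range, , drop = FALSE]
  ok <- toy_accept(pts)
  n <- length(ok)
  left_bad <- c(TRUE, !ok[-n])   # previous point rejected or out of range
  right_bad <- c(!ok[-1], TRUE)  # next point rejected or out of range
  pts[ok & (left_bad | right_bad), , drop = FALSE]
}

boundary <- line_boundary_points(c(0.4, 0.5), c(0.6, 0.5))
```

Here the two retained points lie close to where the line crosses the edge of the accepted disc, which is the kind of boundary information line sampling is meant to recover.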

If method contains 'importance', importance sampling is performed. Given a collection of non-implausible points, a mixture distribution of either multivariate normal or uniform ellipsoid proposals around the current non-implausible set is constructed. The optimal standard deviation (in the normal case) or radius (in the ellipsoid case) is determined using a burn-in phase, and points are proposed until the desired number of points has been found.

If method contains 'slice', then slice sampling is performed. Given a single known non-implausible point, a minimum enclosing hyperrectangle (perhaps after transforming the space) is determined and points are sampled for each dimension of the parameter space uniformly, shrinking the minimum enclosing hyperrectangle as appropriate. This method is akin to a Gibbs sampler.
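A minimal base R sketch of this coordinate-wise scheme is given below, assuming a toy acceptance test and the unit square as the enclosing hyperrectangle. The names `toy_accept` and `slice_sample` are hypothetical, and no transformation of the space is applied.

```r
# Hypothetical helpers for illustration only; not part of hmer.
set.seed(3)

# Stand-in for an implausibility check on a single point.
toy_accept <- function(p) sum((p - 0.5)^2) <= 0.3^2

slice_sample <- function(start, n, lower, upper) {
  out <- matrix(NA_real_, nrow = n, ncol = length(start))
  current <- start
  for (i in seq_len(n)) {
    # Update one coordinate at a time, as in a Gibbs sampler.
    for (j in seq_along(current)) {
      lo <- lower[j]
      hi <- upper[j]
      repeat {
        candidate <- current
        candidate[j] <- runif(1, lo, hi)
        if (toy_accept(candidate)) {
          current <- candidate
          break
        }
        # Rejected: shrink the interval towards the current point.
        if (candidate[j] < current[j]) lo <- candidate[j] else hi <- candidate[j]
      }
    }
    out[i, ] <- current
  }
  out
}

pts <- slice_sample(c(0.5, 0.5), 50, lower = c(0, 0), upper = c(1, 1))
```

Because the interval always contains the current (accepted) coordinate, each inner loop is guaranteed to find an acceptable value eventually.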

If method contains 'optical', then optical depth sampling is used. Given a set of non-implausible points, an approximation of the one-dimensional marginal distributions for each parameter can be determined. From these derived marginals, points are sampled and subject to rejection as in the LHD sampling.

For any sampling strategy, the parameters ems, n_points, and z must be provided. All methods rely on a means of assessing point suitability, which we refer to as an implausibility measure. By default, this uses nth-maximum implausibility as provided by nth_implausible; a user-defined method can be provided instead by supplying the function to opts[["accept_measure"]]. Any such function must take at least five arguments: the emulators, the points, the targets, a cutoff, and a ... argument to ensure compatibility with the default behaviour of the point proposal method. Note that, in accordance with the default functionality of nth_implausible, if emulating more than 10 outputs and an explicit opts$nth argument is not provided, then second-max implausibility is used as the measure.
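To make the difference between maximum and second-maximum implausibility concrete, the following base R sketch applies both to a small hand-written matrix (rows are points, columns are outputs). The matrix values and the helper `nth_max` are hypothetical; in practice the implausibilities would come from the emulators via nth_implausible.

```r
# Toy implausibilities: one row per point, one column per output.
imps <- rbind(
  c(1.2, 2.8, 0.5),
  c(3.5, 2.9, 1.0),
  c(3.2, 3.1, 0.4)
)

# Hypothetical helper: the nth-largest implausibility for each point.
nth_max <- function(imps, n = 1) {
  apply(imps, 1, function(row) sort(row, decreasing = TRUE)[n])
}

cutoff <- 3
max_ok <- nth_max(imps, 1) <= cutoff     # TRUE FALSE FALSE
second_ok <- nth_max(imps, 2) <= cutoff  # TRUE TRUE FALSE
```

The second point is rejected under maximum implausibility because one output misses, but is accepted under second-maximum implausibility, which tolerates a single poorly matched output.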

The option opts[["seek"]] determines how many points should be chosen that have a higher probability of matching targets, as opposed to not missing targets. Due to the danger of such an approach, this value should not be too high and should be used sparingly at early waves; even at later waves, it is inadvisable to seek more than 10% of the output points using this metric. The value can be provided as either a proportion of the desired points (in the range [0,1]) or a fixed number of points; the default is seek = 0.
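The two ways of specifying seek can be illustrated with a small helper. This is only a sketch of one plausible interpretation, in which values up to 1 are read as a proportion and larger values as a count; `seek_count` is a hypothetical function and the package may resolve the value differently.

```r
# Hypothetical helper; an assumption about how seek might be interpreted.
seek_count <- function(seek, n_points) {
  if (seek <= 1) {
    round(seek * n_points)  # proportion of the desired points
  } else {
    min(seek, n_points)     # fixed number, capped at n_points
  }
}

seek_count(0.1, 100)  # 10
seek_count(5, 100)    # 5
seek_count(0, 100)    # 0
```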

The default behaviour is as follows. A set of initial points is generated from a large LHD; line sampling is performed to find the boundaries of the space; then importance sampling is used to fill out the space. The proposed set of points is thinned and both line and importance sampling are applied again; this resampling behaviour is controlled by opts[["resample"]], where resample = n indicates that the proposal will be thinned and resampled n times (resulting in n+1 proposal stages).

In regions where the non-implausible space at a given cutoff value is very hard to find, the point proposal will start at a higher cutoff where it can find a space-filling design. Given such a design at a higher cutoff, it can subselect to a lower cutoff by demanding that some percentage of the proposed points are retained, and then repeat. This approach terminates if the 'ladder' of cutoffs reaches the desired cutoff, or if the process asymptotes at a particular higher cutoff. The options ladder_tolerance and cutoff_tolerance determine, respectively, the minimum improvement required in consecutive cutoffs for the process not to be considered to be asymptoting, and the level of closeness to the desired cutoff at which we are prepared to stop. For instance, setting ladder_tolerance to 0.1 and cutoff_tolerance to 0.01, with a cutoff of 3, will terminate the process if two consecutive cutoffs proposed are within 0.1 of each other, or when the points proposed all have implausibility less than 3.01.
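The two stopping conditions in the example above can be written as a short check. This is a sketch of the logic as described in the text, not the package's internal code; `ladder_should_stop` is a hypothetical function.

```r
# Hypothetical helper expressing the two termination conditions.
ladder_should_stop <- function(previous, current, target,
                               ladder_tolerance, cutoff_tolerance) {
  reached <- current < target + cutoff_tolerance           # close enough to target
  stalled <- abs(previous - current) < ladder_tolerance    # no meaningful progress
  reached || stalled
}

# Using ladder_tolerance = 0.1, cutoff_tolerance = 0.01 and a target cutoff of 3:
ladder_should_stop(4.5, 4.0, 3, 0.1, 0.01)    # FALSE: still improving
ladder_should_stop(4.0, 3.95, 3, 0.1, 0.01)   # TRUE: consecutive cutoffs within 0.1
ladder_should_stop(3.4, 3.005, 3, 0.1, 0.01)  # TRUE: below 3.01
```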

These methods may work slowly, or not at all, if the target space is extremely small in comparison with the initial not-yet-ruled-out (NROY) space; they may also fail to give a representative sample if the target space is formed of disconnected regions of different volumes.

Value

A data.frame containing the set of new points upon which to run the model.

Arguments within `opts`

accept_measure: A custom implausibility measure to be used.
cluster: Whether to try to apply emulator clustering.
cutoff_tolerance: Tolerance for an obtained cutoff to be similar enough to that desired.
ladder_tolerance: Tolerance with which to determine if the process is asymptoting.
nth: The level of nth implausibility to apply, if using the default implausibility.
resample: How many times to perform the resampling step once points are found.
seek: How many 'good' points should be sought: either as an integer or a ratio.
to_file: If output is to be written to file periodically, the file location.
points.factor (LHS, Cluster LHS): How many more points than desired to sample.
pca_lhs (LHS): Whether to apply PCA to the space before proposing.
n_lines (Line): How many lines to draw.
ppl (Line): The number of points to sample per line.
imp_distro (Importance): The distribution to propose around points.
imp_scale (Importance): The radius, or standard deviation, of proposed distributions.
pca_slice (Slice): Whether to apply PCA to the space before slice sampling.
seek_distro (Seek): The distribution to apply when looking for 'good' points.

Examples

  # Excessive runtime
  # A simple example that uses a number of the native and ... parameter options.
  pts <- generate_new_design(SIREmulators$ems, 100, SIREmulators$targets,
    distro = 'sphere', opts = list(resample = 0))
  # Non-default methods
  pts_slice <- generate_new_design(SIREmulators$ems, 100, SIREmulators$targets,
    method = 'slice')
  ## Example using custom measure functionality
  custom_measure <- function(ems, x, z, cutoff, ...) {
    # Raw implausibilities for every output, sorted largest first for each point
    imps_df <- nth_implausible(ems, x, z, get_raw = TRUE)
    sorted_imps <- t(apply(imps_df, 1, sort, decreasing = TRUE))
    # Maximum must pass the cutoff; second maximum must pass a stricter one
    imps1 <- sorted_imps[,1] <= cutoff
    imps2 <- sorted_imps[,2] <= cutoff - 0.5
    # Additionally restrict the first parameter to be at most 0.4
    constraint <- apply(x, 1, function(y) y[[1]] <= 0.4)
    return(imps1 & imps2 & constraint)
  }
  pts_custom <- generate_new_design(SIREmulators$ems, 100, SIREmulators$targets,
    opts = list(accept_measure = custom_measure))

hmer documentation built on June 22, 2024, 9:22 a.m.