generate_new_design | R Documentation |
Given a set of trained emulators, this finds the next set of points that will be informative for a subsequent wave of emulation or, in the event that the current wave is the last desired, a set of points that optimally span the parameter region of interest. There are a number of different methods that can be utilised, alone or in combination with one another, to generate the points.
generate_new_design(
  ems,
  n_points,
  z,
  method = "default",
  cutoff = 3,
  plausible_set,
  verbose = interactive(),
  thin = TRUE,
  opts = NULL,
  ...
)
ems |
A list of |
n_points |
The desired number of points to propose. |
z |
The targets to match to. |
method |
Which methods to use. |
cutoff |
The value of the cutoff to use to assess suitability. |
plausible_set |
An optional set of known non-implausible points, to avoid LHD sampling. |
verbose |
Should progress statements be printed to the console? |
thin |
Should maximin sampling be applied as part of the proposal stage? |
opts |
A named list of opts as described. |
... |
Any parameters to pass via chaining to individual sampling functions (eg |
If the method argument contains 'lhs', a Latin hypercube is generated and
non-implausible points from this design are retained. If more points are accepted
than the next design requires, then points are subselected using a maximin argument.
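The idea of Latin hypercube rejection followed by maximin subselection can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: `toy_implausibility`, `latin_hypercube`, and `maximin_thin` are hypothetical helpers, and the toy measure stands in for a real emulator-based implausibility.

```r
# Toy stand-in (an assumption) for an emulator-based implausibility:
# scaled distance from the centre of the unit square.
toy_implausibility <- function(pts) {
  sqrt(rowSums((pts - 0.5)^2)) / 0.1
}

# Simple Latin hypercube on [0, 1]^d: one point per stratum in each dimension.
latin_hypercube <- function(n, d) {
  sapply(seq_len(d), function(j) (sample(n) - runif(n)) / n)
}

# Greedy maximin subselection: repeatedly add the candidate whose minimum
# distance to the points already chosen is largest.
maximin_thin <- function(pts, n_keep) {
  dists <- as.matrix(dist(pts))
  chosen <- 1
  while (length(chosen) < n_keep) {
    min_d <- apply(dists[, chosen, drop = FALSE], 1, min)
    min_d[chosen] <- -Inf
    chosen <- c(chosen, which.max(min_d))
  }
  pts[chosen, , drop = FALSE]
}

set.seed(1)
design <- latin_hypercube(2000, 2)
# Retain only the points that pass the cutoff of 3.
accepted <- design[toy_implausibility(design) <= 3, , drop = FALSE]
# More points were accepted than needed, so subselect 50 of them.
new_points <- maximin_thin(accepted, 50)
```

The greedy maximin step is one simple way to spread the retained points out; other space-filling criteria would serve the same purpose.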
If method contains 'line', then line sampling is performed. Given an
already established collection of non-implausible points, rays are drawn between
pairs of points (selected so as to maximise the distance between them) and more
points are sampled along the rays. Points thus sampled are retained if they lie
near a boundary of the non-implausible space, or on the boundary of the parameter
region of interest.
If method contains 'importance', importance sampling is performed.
Given a collection of non-implausible points, a mixture distribution of either
multivariate normal or uniform ellipsoid proposals around the current non-implausible
set is constructed. The optimal standard deviation (in the normal case) or radius
(in the ellipsoid case) is determined using a burn-in phase, and points are
proposed until the desired number of points has been found.
If method contains 'slice', then slice sampling is performed. Given
a single known non-implausible point, a minimum enclosing hyperrectangle (perhaps
after transforming the space) is determined and points are sampled for each dimension
of the parameter space uniformly, shrinking the minimum enclosing hyperrectangle as
appropriate. This method is akin to a Gibbs sampler.
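A minimal base R sketch of the coordinate-wise sampling with shrinking bounds described above. It assumes a toy acceptance rule (`is_acceptable`, a disc in the unit square) in place of an emulator-based measure, and `slice_sweep` is a hypothetical helper rather than a package function.

```r
# Toy acceptance test (an assumption): non-implausible points lie in a disc.
is_acceptable <- function(p) sum((p - 0.5)^2) <= 0.3^2

# One sweep over the dimensions, starting from a known acceptable point.
# For each dimension, propose uniformly within the current bounds; on
# rejection, shrink the bound on the side of the rejected proposal and retry.
slice_sweep <- function(point, lower, upper, accept) {
  for (j in seq_along(point)) {
    lo <- lower[j]
    hi <- upper[j]
    repeat {
      proposal <- point
      proposal[j] <- runif(1, lo, hi)
      if (accept(proposal)) {
        point <- proposal
        break
      }
      if (proposal[j] < point[j]) lo <- proposal[j] else hi <- proposal[j]
    }
  }
  point
}

set.seed(2)
current <- c(0.5, 0.5)                      # single known acceptable point
samples <- matrix(NA_real_, nrow = 200, ncol = 2)
for (i in seq_len(nrow(samples))) {
  current <- slice_sweep(current, c(0, 0), c(1, 1), is_acceptable)
  samples[i, ] <- current
}
```

Because the bounds always contain the current acceptable point, each inner loop terminates, and every stored sample satisfies the acceptance rule.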
If method contains 'optical', then optical depth sampling is used.
Given a set of non-implausible points, an approximation of the one-dimensional
marginal distributions for each parameter can be determined. From these derived
marginals, points are sampled and subject to rejection as in the LHD sampling.
For any sampling strategy, the parameters ems, n_points, and z
must be provided. All methods rely on a means of assessing point suitability, which
we refer to as an implausibility measure. By default, this uses nth-maximum implausibility
as provided by nth_implausible; a user-defined method can be provided
instead by supplying the function as opts[["accept_measure"]]. Any
such function must take at least five arguments: the emulators, the points, the
targets, and a cutoff, as well as a ... argument to ensure compatibility with
the default behaviour of the point proposal method. Note that, in accordance with
the default functionality of nth_implausible, if emulating more than
10 outputs and an explicit opts$nth argument is not provided, then second-max
implausibility is used as the measure.
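To make the nth-maximum idea concrete, here is a small base R sketch. The matrix `imps` is made-up data (one row per point, one column per output), and `nth_max` is a hypothetical helper, not the package's nth_implausible.

```r
# Hypothetical implausibilities: one row per point, one column per output.
imps <- rbind(
  c(1.2, 2.8, 3.5),
  c(0.4, 1.1, 2.9),
  c(4.0, 4.2, 0.3)
)

# nth-largest implausibility across outputs, for each point.
nth_max <- function(imp_matrix, n = 1) {
  apply(imp_matrix, 1, function(row) sort(row, decreasing = TRUE)[n])
}

cutoff <- 3
accept_max <- nth_max(imps, 1) <= cutoff      # maximum implausibility
accept_second <- nth_max(imps, 2) <= cutoff   # second-maximum implausibility
```

Here the first point fails under the maximum (its largest value is 3.5) but passes under the second-maximum (2.8), which shows how the second-max measure tolerates a single poorly matched output.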
The option opts[["seek"]] determines how many points should be chosen that
have a higher probability of matching targets, as opposed to not missing targets. Due
to the danger of such an approach, this value should not be too high and should be used
sparingly at early waves; even at later waves, it is inadvisable to seek more than 10% of
the output points using this metric. The default is seek = 0; the value can be provided
as either a proportion of the points desired (in the range [0,1]) or a fixed number of points.
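The two ways of specifying seek can be illustrated with a hypothetical helper, `seek_count`, which converts either form into a number of points. How the package itself resolves the value (for instance the boundary case seek = 1, or rounding) is an assumption here.

```r
# Hypothetical helper: interpret `seek` as a proportion when it lies in
# [0, 1], otherwise as a fixed count capped at the number of points desired.
seek_count <- function(seek, n_points) {
  stopifnot(seek >= 0)
  if (seek <= 1) round(seek * n_points) else min(round(seek), n_points)
}

n_default <- seek_count(0, 100)      # default: no 'good' points sought
n_from_ratio <- seek_count(0.1, 100) # 10% of the 100 points
n_from_count <- seek_count(5, 100)   # a fixed number of points
```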
The default behaviour is as follows. A set of initial points is generated from a
large LHD; line sampling is performed to find the boundaries of the space; then importance
sampling is used to fill out the space. The proposed set of points is thinned and
both line and importance sampling are applied again; this resampling behaviour is
controlled by opts[["resample"]], where resample = n indicates that
the proposal will be thinned and resampled from n times (resulting in n+1
proposal stages).
In regions where the non-implausible space at a given cutoff value is very hard to find,
the point proposal will start at a higher cutoff where it can find a space-filling design.
Given such a design at a higher cutoff, it can subselect to a lower cutoff by demanding
some percentage of the proposed points are retained and repeat. This approach terminates
if the 'ladder' of cutoffs reaches the desired cutoff, or if the process asymptotes at
a particular higher cutoff. The options ladder_tolerance and cutoff_tolerance
determine the minimum improvement required in consecutive cutoffs for the process to not
be considered to be asymptoting and the level of closeness to the desired cutoff at which
we are prepared to stop, respectively. For instance, setting ladder_tolerance to
0.1 and cutoff_tolerance to 0.01, with a cutoff of 3, will terminate the process
if two consecutive cutoffs proposed are within 0.1 of each other, or when the points proposed
all have implausibility less than 3.01.
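The stopping rule for the cutoff ladder can be written as a small decision function. This is a sketch of the logic as described above, not the package's code: `ladder_status` is a hypothetical helper, its default tolerances are simply the values from the worked example, and the treatment of exact boundary cases is an assumption.

```r
# Hypothetical helper: decide whether the cutoff ladder should stop.
#   "reached"     - the current cutoff is close enough to the target
#   "asymptoting" - consecutive cutoffs improved by less than ladder_tolerance
#   "continue"    - keep stepping down the ladder
ladder_status <- function(previous_cutoff, current_cutoff, target_cutoff,
                          ladder_tolerance = 0.1, cutoff_tolerance = 0.01) {
  if (current_cutoff <= target_cutoff + cutoff_tolerance) {
    "reached"
  } else if (previous_cutoff - current_cutoff < ladder_tolerance) {
    "asymptoting"
  } else {
    "continue"
  }
}

status_progress <- ladder_status(5.0, 4.2, 3)    # large improvement, far from 3
status_reached <- ladder_status(3.4, 3.005, 3)   # below 3.01
status_stalled <- ladder_status(4.2, 4.15, 3)    # improved by only 0.05
```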
These methods may work slowly, or not at all, if the target space is extremely small in comparison with the initial not-yet-ruled-out (NROY) space; they may also fail to give a representative sample if the target space is formed of disconnected regions of different volumes.
A data.frame containing the set of new points upon which to run the model.
opts
A custom implausibility measure to be used.
Whether to try to apply emulator clustering.
Tolerance for an obtained cutoff to be similar enough to that desired.
Tolerance with which to determine if the process is asymptoting.
The level of nth implausibility to apply, if using the default implausibility.
How many times to perform the resampling step once points are found.
How many 'good' points should be sought: either as an integer or a ratio.
If output is to be written to file periodically, the file location.
How many more points than desired to sample.
Whether to apply PCA to the space before proposing.
How many lines to draw.
The number of points to sample per line.
The distribution to propose around points.
The radius, or standard deviation, of proposed distributions.
Whether to apply PCA to the space before slice sampling.
The distribution to apply when looking for 'good' points.
# Excessive runtime
# A simple example that uses a number of the native and ... parameter options.
pts <- generate_new_design(SIREmulators$ems, 100, SIREmulators$targets,
                           distro = 'sphere', opts = list(resample = 0))
# Non-default methods
pts_slice <- generate_new_design(SIREmulators$ems, 100, SIREmulators$targets,
                                 method = 'slice')
## Example using custom measure functionality
custom_measure <- function(ems, x, z, cutoff, ...) {
  # Raw implausibilities: one row per point, one column per output
  imps_df <- nth_implausible(ems, x, z, get_raw = TRUE)
  # Sort each row so column 1 holds the maximum, column 2 the second maximum
  sorted_imps <- t(apply(imps_df, 1, sort, decreasing = TRUE))
  imps1 <- sorted_imps[,1] <= cutoff
  imps2 <- sorted_imps[,2] <= cutoff - 0.5
  # Additional constraint on the first parameter
  constraint <- apply(x, 1, function(y) y[[1]] <= 0.4)
  return(imps1 & imps2 & constraint)
}
pts_custom <- generate_new_design(SIREmulators$ems, 100, SIREmulators$targets,
                                  opts = list(accept_measure = custom_measure))