design {dgpsi} R Documentation
This function implements sequential design and active learning for a (D)GP emulator or a bundle of (D)GP emulators, supporting an array of popular methods as well as user-specified approaches. It can also be used as a wrapper for Bayesian optimization methods.
design(
  object,
  N,
  x_cand,
  y_cand,
  n_sample,
  n_cand,
  limits,
  f,
  reps,
  freq,
  x_test,
  y_test,
  reset,
  target,
  method,
  batch_size,
  eval,
  verb,
  autosave,
  new_wave,
  M_val,
  cores,
  ...
)
## S3 method for class 'gp'
design(
  object,
  N,
  x_cand = NULL,
  y_cand = NULL,
  n_sample = 200,
  n_cand = lifecycle::deprecated(),
  limits = NULL,
  f = NULL,
  reps = 1,
  freq = c(1, 1),
  x_test = NULL,
  y_test = NULL,
  reset = FALSE,
  target = NULL,
  method = vigf,
  batch_size = 1,
  eval = NULL,
  verb = TRUE,
  autosave = list(),
  new_wave = TRUE,
  M_val = 50,
  cores = 1,
  ...
)
## S3 method for class 'dgp'
design(
  object,
  N,
  x_cand = NULL,
  y_cand = NULL,
  n_sample = 200,
  n_cand = lifecycle::deprecated(),
  limits = NULL,
  f = NULL,
  reps = 1,
  freq = c(1, 1),
  x_test = NULL,
  y_test = NULL,
  reset = FALSE,
  target = NULL,
  method = vigf,
  batch_size = 1,
  eval = NULL,
  verb = TRUE,
  autosave = list(),
  new_wave = TRUE,
  M_val = 50,
  cores = 1,
  train_N = NULL,
  refit_cores = 1,
  pruning = TRUE,
  control = list(),
  ...
)
## S3 method for class 'bundle'
design(
  object,
  N,
  x_cand = NULL,
  y_cand = NULL,
  n_sample = 200,
  n_cand = lifecycle::deprecated(),
  limits = NULL,
  f = NULL,
  reps = 1,
  freq = c(1, 1),
  x_test = NULL,
  y_test = NULL,
  reset = FALSE,
  target = NULL,
  method = vigf,
  batch_size = 1,
  eval = NULL,
  verb = TRUE,
  autosave = list(),
  new_wave = TRUE,
  M_val = 50,
  cores = 1,
  train_N = NULL,
  refit_cores = 1,
  ...
)
See further examples and tutorials at https://mingdeyu.github.io/dgpsi-R/.
An updated object is returned with a slot called design that contains:
- S slots, named wave1, wave2, ..., waveS, that contain information on the S waves of sequential design that have been applied to the emulator. Each slot contains the following elements:
  - N, an integer giving the number of iterations implemented in the corresponding wave;
  - rmse, a matrix providing the evaluation metric values for emulators constructed during the corresponding wave, when eval = NULL. Each row of the matrix represents an iteration:
    - for an object of class gp, the matrix contains a single column of RMSE values;
    - for an object of class dgp without a categorical likelihood, each row contains mean/median squared errors corresponding to different output dimensions;
    - for an object of class dgp with a categorical likelihood, the matrix contains a single column of log-loss values;
    - for an object of class bundle, each row contains either mean/median squared errors or log-loss values for the emulators in the bundle.
  - metric, a matrix providing the values of custom evaluation metrics, as computed by the user-supplied eval function, for emulators constructed during the corresponding wave;
  - freq, an integer giving the frequency at which emulator validations are implemented during the corresponding wave;
  - enrichment, a vector of size N giving the number of new design points added after each step of the sequential design (if object is an instance of the gp or dgp class), or a matrix giving the number of new design points added to each emulator in a bundle after each step of the sequential design (if object is an instance of the bundle class).
  If target is not NULL, the following additional elements are also included:
  - target, the target value(s) of the evaluation metric, computed by eval or by the built-in function, at which the sequential design is stopped;
  - reached, which indicates whether the target was reached at the end of the sequential design:
    - a bool if object is an instance of the gp or dgp class;
    - a vector of bools if object is an instance of the bundle class, with its length:
      - equal to the number of emulators in the bundle when eval = NULL;
      - equal to the length of the output from eval when a custom eval function is provided.
- a slot called type that gives the type of validation:
  - either LOO ('loo') or OOS ('oos') if eval = NULL (see validate() for more information about LOO and OOS);
  - 'customized' if a customized R function is provided to eval.
- two slots called x_test and y_test that contain the data points for the OOS validation if the type slot is 'oos'.
- a slot called exclusion, included if y_cand = NULL, x_cand is supplied, and the supplied f returns NAs during the sequential design. It records the design positions that produced NAs via f, and the sequential design uses this information to avoid revisiting the same locations in later runs of design().
See Note section below for further information.
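The exclusion mechanism can be pictured as filtering the candidate set x_cand before the next location is chosen. The helper drop_excluded() below is hypothetical and not part of dgpsi; it assumes the exclusion record is a matrix with the same columns as x_cand and simply drops candidate rows that match a recorded row, which may differ from how design() handles this internally.

```r
# Hypothetical helper (not part of dgpsi): remove candidate locations that
# previously produced NA outputs from f, so they are not selected again.
# Assumes `exclusion` is a matrix with the same columns as `x_cand`.
drop_excluded <- function(x_cand, exclusion, tol = 1e-10) {
  if (is.null(exclusion) || nrow(exclusion) == 0) return(x_cand)
  keep <- apply(x_cand, 1, function(row) {
    # a candidate is excluded if it matches any recorded row within `tol`
    !any(apply(exclusion, 1, function(ex) all(abs(row - ex) <= tol)))
  })
  x_cand[keep, , drop = FALSE]
}

# usage: five 2D candidates, one of which previously returned NA
x_cand <- rbind(c(0.1, 0.2), c(0.3, 0.4), c(0.5, 0.6), c(0.7, 0.8), c(0.9, 1.0))
exclusion <- rbind(c(0.5, 0.6))
drop_excluded(x_cand, exclusion)  # 4 rows remain
```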
Validation of an emulator is forced after the final step of a sequential design even if N is not a multiple of the second element in freq.
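As a worked illustration of this rule, the hypothetical helper validation_steps() below lists the iterations at which validation would occur within one wave. It assumes that validation is triggered every freq[2] iterations, as the wording above implies, plus the forced validation after step N; it is not a dgpsi function.

```r
# Hypothetical helper (not part of dgpsi): iterations of a wave at which the
# emulator is validated, assuming validation runs every `every` iterations
# (the second element of freq) and is always forced after step N.
validation_steps <- function(N, every) {
  steps <- which(seq_len(N) %% every == 0)  # multiples of `every` up to N
  sort(unique(c(steps, N)))                 # force validation after step N
}

validation_steps(10, 3)  # 3 6 9 10
validation_steps(10, 5)  # 5 10
```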
Any loo or oos slot that already exists in object will be removed, and a new slot called loo or oos will be created in the returned object, depending on whether x_test and y_test are provided. The new slot gives the validation information of the emulator constructed in the final step of the sequential design. See validate() for more information about the slots loo and oos.
If object has previously been used by design() for sequential design, the information of the current wave will replace that of the old waves in the returned object, unless one of the following holds:
- the validation type (LOO or OOS, depending on whether x_test and y_test are supplied) of the current wave is the same as the validation types (shown in the type slot of the design slot of object) of the previous waves, and, if the validation type is OOS, x_test and y_test in the current wave are identical to those in the previous waves;
- both the current and previous waves supply customized evaluation functions to eval. Users need to ensure that the customized evaluation functions are consistent across waves; otherwise, the trace plot of RMSEs produced by draw() will show values of different evaluation metrics in different waves.
In either of these cases, the information of the current wave will be added to the design slot of the returned object under the name waveS, where S is the index of the current wave.
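Since successive waves accumulate in the design slot, a short summary table can help when comparing them. The helper summarise_waves() below is hypothetical (not a dgpsi function) and reads only the N and enrichment elements described in the returned-value list; it is run here on a hand-built list that mimics that layout instead of a fitted emulator, so the values are illustrative only.

```r
# Hypothetical helper (not part of dgpsi): tabulate the waves stored in the
# design slot of an emulator returned by design(), e.g. summarise_waves(m$design).
summarise_waves <- function(design_slot) {
  # keep only the wave1, wave2, ... elements; skip type, x_test, y_test, etc.
  waves <- design_slot[grepl("^wave[0-9]+$", names(design_slot))]
  data.frame(
    wave = names(waves),
    iterations = vapply(waves, function(w) as.numeric(w$N), numeric(1)),
    # enrichment is a vector (gp/dgp) or a matrix (bundle); sum() totals both
    points_added = vapply(waves, function(w) as.numeric(sum(w$enrichment)), numeric(1)),
    row.names = NULL
  )
}

# a mock design slot with two waves of 10 and 5 single-point steps
mock_design <- list(
  wave1 = list(N = 10, freq = 1, enrichment = rep(1, 10)),
  wave2 = list(N = 5, freq = 1, enrichment = rep(1, 5)),
  type = "oos"
)
summarise_waves(mock_design)
```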
If object is an instance of the gp class and eval = NULL, the matrix in the rmse slot is single-columned. If object is an instance of the dgp or bundle class and eval = NULL, the matrix in the rmse slot can have multiple columns that correspond to different output dimensions or different emulators in the bundle.
If object is an instance of the gp class and eval = NULL, target needs to be a single value giving the RMSE threshold. If object is an instance of the dgp or bundle class and eval = NULL, target can be a vector of values giving the thresholds of the evaluation metrics for different output dimensions or different emulators. If a single value is provided, it will be used as the threshold for all output dimensions (if object is an instance of the dgp class) or all emulators (if object is an instance of the bundle class). If a customized function is supplied to eval and target is given as a vector, the user needs to ensure that the length of target equals the length of the output from eval.
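The recycling rule for target can be sketched as follows. below_target() is a hypothetical helper, not part of dgpsi: it expands a single target to one threshold per column of a metric matrix (one column per output dimension or emulator) and compares the latest row against the thresholds, using the convention that lower values are better. How design() combines these per-column comparisons into the reached element is not shown here.

```r
# Hypothetical helper (not part of dgpsi): compare the most recent row of a
# metric matrix (one column per output dimension or emulator) with `target`.
below_target <- function(metric, target) {
  latest <- metric[nrow(metric), ]
  if (length(target) == 1) {
    target <- rep(target, length(latest))  # one value applies to all columns
  }
  if (length(target) != length(latest)) {
    stop("length of target must be 1 or equal to the number of metric columns")
  }
  latest <= target  # lower metric values mean better emulation performance
}

# a 3-iteration trace for an emulator with two output dimensions
rmse_trace <- rbind(c(0.30, 0.50), c(0.12, 0.20), c(0.04, 0.09))
below_target(rmse_trace, 0.05)           # TRUE FALSE
below_target(rmse_trace, c(0.05, 0.10))  # TRUE TRUE
```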
When defining f, it is important to ensure that:
- the column order of the first argument of f is consistent with the training input used for the emulator;
- the column order of the output matrix of f is consistent with the order of the emulator output dimensions (if object is an instance of the dgp class) or the order of the emulators placed in object (if object is an instance of the bundle class).
The output matrix produced by f may include NAs. This is especially beneficial because it allows the sequential design process to continue without interruption, even if f encounters errors or returns NA outputs at certain input locations identified by the sequential design. Users should ensure that any errors within f are handled by returning NAs appropriately.
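One way to follow this advice is to run the simulator row by row inside tryCatch() and return a row of NAs whenever a run fails. In the sketch below, run_simulator() is a hypothetical stand-in for a real model with two outputs that errors when the first input exceeds 0.9; only the wrapper f() follows the matrix-in, matrix-out shape described above.

```r
# Hypothetical single-run simulator: returns two outputs and fails when the
# first input exceeds 0.9 (standing in for a crash of an external code).
run_simulator <- function(x) {
  if (x[1] > 0.9) stop("simulator failed to converge")
  c(sin(2 * pi * x[1]), cos(2 * pi * x[2]))
}

# Matrix-in, matrix-out wrapper suitable for passing to design() as `f`.
# Input columns follow the emulator's training input order; output columns
# follow the emulator's output dimensions. Failed runs become NA rows.
f <- function(x) {
  n_out <- 2
  out <- t(apply(x, 1, function(row) {
    tryCatch(run_simulator(row), error = function(e) rep(NA_real_, n_out))
  }))
  matrix(out, nrow = nrow(x), ncol = n_out)
}

x <- rbind(c(0.25, 0.5), c(0.95, 0.1))
f(x)  # second row is NA because the simulator failed there
```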
When defining eval, the output metric needs to be positive if draw() is used with log = TRUE. In addition, if target is set, a lower metric value must indicate better emulation performance.
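A custom eval that satisfies both conditions is sketched below, using a normalised RMSE on a hold-out set (positive, and lower is better). Several details are assumptions rather than facts stated on this page: that design() passes the emulator as the first argument of eval, that extra arguments (the hypothetical x_hold and y_hold) can be forwarded through the ... argument of design(), and that predict() returns the emulator with predictive means in results$mean. The mock_emulator class exists only so the sketch runs without a fitted dgpsi emulator.

```r
# Normalised RMSE per output column: RMSE divided by the standard deviation of
# the hold-out outputs, so values are positive and lower is better.
nrmse <- function(pred, y) {
  pred <- as.matrix(pred)
  y <- as.matrix(y)
  sqrt(colMeans((pred - y)^2)) / apply(y, 2, sd)
}

# Custom eval for design(); x_hold and y_hold are hypothetical names assumed to
# be forwarded through design(..., x_hold = ..., y_hold = ...).
holdout_eval <- function(object, x_hold, y_hold) {
  pred <- predict(object, x_hold)  # assumed: returns object with $results$mean
  nrmse(pred$results$mean, y_hold)
}

# Mock emulator so the sketch runs without dgpsi: it predicts 0.9 * x.
predict.mock_emulator <- function(object, x, ...) {
  object$results <- list(mean = 0.9 * x)
  object
}
mock <- structure(list(), class = "mock_emulator")
x_hold <- matrix(seq(0.1, 1, length.out = 10))
holdout_eval(mock, x_hold, y_hold = x_hold)
```

With a real emulator, a call might then look like design(m, N = 10, limits = lim, f = f, eval = holdout_eval, x_hold = validate_x, y_hold = validate_y), again assuming the argument forwarding described above.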
## Not run:
# load packages and the Python env
library(lhs)
library(dgpsi)
# construct a 2D non-stationary function that takes a matrix as the input
f <- function(x) {
  sin(1/((0.7*x[,1,drop=FALSE]+0.3)*(0.7*x[,2,drop=FALSE]+0.3)))
}
# generate the initial design
X <- maximinLHS(5,2)
Y <- f(X)
# generate the validation data
validate_x <- maximinLHS(30,2)
validate_y <- f(validate_x)
# train a two-layered DGP emulator with the initial design
m <- dgp(X, Y)
# specify the ranges of the input dimensions
lim_1 <- c(0, 1)
lim_2 <- c(0, 1)
lim <- rbind(lim_1, lim_2)
# 1st wave of the sequential design with 10 steps
m <- design(m, N=10, limits = lim, f = f, x_test = validate_x, y_test = validate_y)
# 2nd wave of the sequential design with 10 steps
m <- design(m, N=10, limits = lim, f = f, x_test = validate_x, y_test = validate_y)
# 3rd wave of the sequential design with 10 steps
m <- design(m, N=10, limits = lim, f = f, x_test = validate_x, y_test = validate_y)
# draw the design created by the sequential design
draw(m,'design')
# inspect the trace of RMSEs during the sequential design
draw(m,'rmse')
# reduce the number of imputations for faster OOS
m_faster <- set_imp(m, 5)
# plot the OOS validation with the faster DGP emulator
plot(m_faster, x_test = validate_x, y_test = validate_y)
## End(Not run)