View source: R/1_Xfgpm_Class.R
fgpm_factory (R Documentation)
This function performs a smart exploration of the solution space of possible structural configurations of a funGp model and selects a high-quality configuration from it. funGp currently relies on an ant colony based algorithm for this task. The algorithm defines the solution space from the levels of each structural parameter currently available in the fgpm function and then explores it. More details on the algorithm are given in the technical report by Betancourt, Bachoc, Klein and Gamboa (2020) listed in the references. Future versions of funGp may improve the current algorithm or add alternative solution methods.
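For intuition only, here is a minimal base-R sketch of one step of a generic Ant Colony System (ACS), the family of rules that the setup entries q0, rho.l, rho.g and tao0 used in the examples suggest. funGp's actual rules are those of the technical report and may differ. An ant picks the level of one structural parameter (here the kernel type) by exploiting the level with the most pheromone with probability q0, and otherwise samples a level with probability proportional to pheromone. The chosen level's pheromone is then pulled back toward tao0 (local update), and the level used by the best configuration is reinforced (global update). The helper functions and the reward value are hypothetical.

```r
# Hypothetical helpers illustrating a generic Ant Colony System (ACS) step for
# one structural parameter (the kernel type). This is NOT funGp's internal code.

# Pseudo-random proportional rule: exploit with probability q0, else explore.
acs_choose <- function(tau, q0) {
  if (runif(1) < q0) {
    names(tau)[which.max(tau)]                           # exploitation
  } else {
    sample(names(tau), size = 1, prob = tau / sum(tau))  # biased exploration
  }
}

# Local update: pull the pheromone of the chosen level back toward tao0.
acs_local_update <- function(tau, level, rho.l, tao0) {
  tau[level] <- (1 - rho.l) * tau[level] + rho.l * tao0
  tau
}

# Global update: reinforce the level used by the best configuration so far.
# The reward (e.g. a Q2 score) is an assumption made for illustration.
acs_global_update <- function(tau, level, rho.g, reward) {
  tau[level] <- (1 - rho.g) * tau[level] + rho.g * reward
  tau
}

set.seed(1)
tau <- c(gauss = 0.15, matern5_2 = 0.15, matern3_2 = 0.15)  # all start at tao0
choice <- acs_choose(tau, q0 = 0.85)
tau <- acs_local_update(tau, choice, rho.l = 0.2, tao0 = 0.15)
tau <- acs_global_update(tau, "matern5_2", rho.g = 0.08, reward = 0.9)
print(choice)
print(tau)
```

Because all levels start with the same pheromone, the first choice carries no preference. Differences only build up as global updates reward the levels used by good configurations.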
fgpm_factory(
sIn = NULL,
fIn = NULL,
sOut = NULL,
ind.vl = NULL,
ctraints = list(),
setup = list(),
time.lim = Inf,
nugget = 1e-08,
n.starts = 1,
n.presample = 20,
par.clust = NULL,
trace = TRUE,
pbars = interactive()
)
sIn |
An optional matrix of scalar input values to train the model. Each column must match an input variable and each row a training point. Either scalar input coordinates (sIn), functional input coordinates (fIn), or both must be provided. |
fIn |
An optional list of functional input values to train the model. Each element of the list must be a matrix containing the set of curves corresponding to one functional input. Either scalar input coordinates (sIn), functional input coordinates (fIn), or both must be provided. |
sOut |
A vector (or 1-column matrix) containing the values of the scalar output at the specified input points. |
ind.vl |
An optional numerical matrix specifying which points in the three structures above should be used for training and which for validation. If provided, the optimization is conducted in terms of the hold-out coefficient of determination Q², obtained by training the model on a subset of the points and then estimating the prediction error on the remaining points. In that case, each column of ind.vl is interpreted as one validation set, and multiple columns imply replicates. In the simplest case, ind.vl is a one-column matrix or simply an array, meaning that a single replicate is used for each model configuration explored. If not provided, the optimization is conducted in terms of the leave-one-out cross-validation Q², which, for a total of n observations, comes from training the model n times, each time using n-1 points for training and the remaining one for validation. This procedure is typically costly because of the large number of hyperparameter optimizations involved; however, fgpm_factory implements the virtual equations introduced by Dubrule (1983) for Gaussian processes, which require a single hyperparameter optimization. See the references below for more details. |
ctraints |
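An optional list specifying the constraints of the structural optimization problem. The entries used in the custom solution space example below are s_keepOn, f_keepOn, f_disTypes, f_fixDims, f_maxDims, f_basTypes and kerTypes. |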
setup |
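An optional list indicating the values of some parameters of the structural optimization algorithm. The ant colony optimization algorithm currently available accepts entries such as n.iter, n.pop, tao0, dop.s, dop.f, delta.f, dispr.f, q0, rho.l, u.gbest, n.ibest and rho.g, as illustrated in the custom heuristic parameters example below. |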
time.lim |
An optional number specifying a time limit in seconds to be used as stopping condition for the structural optimization. |
nugget |
An optional variance value standing for the homogeneous nugget effect. A tiny nugget might help to overcome numerical problems related to the ill-conditioning of the covariance matrix. Default is 1e-8. |
n.starts |
An optional integer indicating the number of initial points to use for the optimization of the hyperparameters. A parallel processing cluster can be exploited in order to speed up the evaluation of multiple initial points. More details in the description of the argument par.clust below. Default is 1. |
n.presample |
An optional integer indicating the number of points to be tested in order to select the
n.starts initial points. The n.presample points will be randomly sampled from the hyper-rectangle defined by: |
par.clust |
An optional parallel processing cluster created with the makeCluster function of the parallel package (see the parallelization example below). |
trace |
An optional boolean indicating whether funGp-native control messages should be printed to the console. Default is TRUE. For complementary control over the display of funGp-native progress bars, see the pbars argument. |
pbars |
An optional boolean indicating whether progress bars should be displayed. Default is interactive(), i.e., TRUE in interactive sessions and FALSE otherwise. |
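A minimal base-R sketch of the hold-out criterion described under ind.vl follows. It assumes (our reading of the description) that each column of ind.vl holds the row indices of the validation points of one replicate, and it uses the common definition Q2 = 1 - sum((y - yhat)^2) / sum((y - mean(y))^2), computed on the validation points. The helpers q2() and make_ind_vl() are hypothetical, and a straight-line fit stands in for the funGp model only to keep the example self-contained.

```r
# Hypothetical helper: hold-out coefficient of determination Q2
q2 <- function(y, yhat) {
  1 - sum((y - yhat)^2) / sum((y - mean(y))^2)
}

# Hypothetical helper: n.rep random validation sets of size n.vl, one per
# column (assumed ind.vl layout: row indices of the validation points)
make_ind_vl <- function(n, n.vl, n.rep) {
  replicate(n.rep, sort(sample.int(n, n.vl)))
}

set.seed(100)
n <- 32
x <- seq(0, 1, length.out = n)
y <- sin(2 * pi * x)
ind.vl <- make_ind_vl(n, n.vl = 8, n.rep = 3)   # 8 x 3 matrix, 3 replicates

# Q2 per replicate: train on the points outside the column, validate on it.
# A least-squares line replaces the GP model purely for illustration.
q2.rep <- apply(ind.vl, 2, function(vl) {
  fit <- lm(y ~ x, data = data.frame(x = x[-vl], y = y[-vl]))
  q2(y[vl], predict(fit, newdata = data.frame(x = x[vl])))
})
mean(q2.rep)

## Not run:
# with sIn, fIn and sOut built as in the examples below (n.tr = 32):
# xm.vl <- fgpm_factory(sIn = sIn, fIn = fIn, sOut = sOut, ind.vl = ind.vl)
## End(Not run)
```

Averaging over several columns reduces the dependence of the score on one particular train/validation split, at the cost of one model fit per replicate.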
An object of class Xfgpm containing the data structures linked to the structural optimization
of a funGp model. Its main component is an object of class fgpm corresponding to the
optimized model, accessible through the @model slot of the Xfgpm object.
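Regarding the nugget argument described above, the following base-R sketch shows why a tiny nugget helps. It builds a Gaussian-kernel covariance matrix on closely spaced one-dimensional points (a hypothetical setting, not a funGp model) and compares its 2-norm condition number with and without a homogeneous nugget of 1e-8 added to the diagonal. The helpers gauss_cov() and cond2() are illustrative only.

```r
# Squared-exponential (Gaussian) covariance with unit variance (hypothetical)
gauss_cov <- function(x, ls) {
  exp(-outer(x, x, "-")^2 / (2 * ls^2))
}

# 2-norm condition number from the singular values
cond2 <- function(M) {
  s <- svd(M, nu = 0, nv = 0)$d
  max(s) / min(s)
}

x <- seq(0, 1, length.out = 30)
K <- gauss_cov(x, ls = 0.5)            # strongly correlated points
K.nug <- K + diag(1e-8, length(x))     # homogeneous nugget on the diagonal

cond2(K)       # huge (possibly Inf): K is numerically singular
cond2(K.nug)   # much smaller: the smallest eigenvalue is lifted to about 1e-8

# The Cholesky factorization succeeds with the nugget; chol(K) may instead
# fail with a "not positive definite" error.
R.nug <- chol(K.nug)
```

The price of the nugget is a small bias: the model no longer interpolates the training outputs exactly, which is why its default value is tiny.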
José Betancourt, François Bachoc, Thierry Klein and Jérémy Rohmer
Betancourt, J., Bachoc, F., Klein, T., Idier, D., Rohmer, J., and Deville, Y. (2024), "funGp: An R Package for Gaussian Process Regression with Scalar and Functional Inputs". Journal of Statistical Software, 109, 5, 1–51. doi:10.18637/jss.v109.i05
Betancourt, J., Bachoc, F., Klein, T., Idier, D., Pedreros, R., and Rohmer, J. (2020), "Gaussian process metamodeling of functional-input code for coastal flood hazard assessment". Reliability Engineering & System Safety, 198, 106870. doi:10.1016/j.ress.2020.106870 [HAL]
Betancourt, J., Bachoc, F., Klein, T., and Gamboa, F. (2020), Technical Report: "Ant Colony Based Model Selection for Functional-Input Gaussian Process Regression. Ref. D3.b (WP3.2)". RISCOPE project. [HAL]
Betancourt, J., Bachoc, F., and Klein, T. (2020), R Package Manual: "Gaussian Process Regression for Scalar and Functional Inputs with funGp - The in-depth tour". RISCOPE project. [HAL]
Dubrule, O. (1983), "Cross validation of kriging in a unique neighborhood". Journal of the International Association for Mathematical Geology, 15, 687-699. [MG]
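The ind.vl description states that, without ind.vl, fgpm_factory relies on the virtual leave-one-out equations of Dubrule (1983), so that a single hyperparameter optimization suffices. In the simplified case of simple kriging with a known covariance matrix K (zero mean, fixed hyperparameters; funGp's exact formulas may differ), the leave-one-out residual at point i is e_i = (K^-1 y)_i / (K^-1)_ii. The base-R sketch below computes these residuals from a single matrix inverse, checks them against brute-force refitting, and turns them into a leave-one-out Q2. The covariance settings and data are hypothetical.

```r
# Gaussian covariance plus a small homogeneous nugget (hypothetical settings)
cov_mat <- function(x, ls, nugget = 1e-8) {
  exp(-outer(x, x, "-")^2 / (2 * ls^2)) + diag(nugget, length(x))
}

n <- 12
x <- seq(0, 1, length.out = n)
y <- sin(6 * x)
K <- cov_mat(x, ls = 0.1)

# Virtual LOO (simple kriging, known covariance): one inverse for all n folds
Kinv <- solve(K)
e.virtual <- as.vector(Kinv %*% y) / diag(Kinv)

# Brute force: refit n times, each time leaving one point out
e.brute <- vapply(seq_len(n), function(i) {
  pred <- K[i, -i] %*% solve(K[-i, -i], y[-i])
  y[i] - as.numeric(pred)
}, numeric(1))

max(abs(e.virtual - e.brute))   # agreement up to rounding error

# Leave-one-out Q2 computed from the virtual residuals
q2.loo <- 1 - sum(e.virtual^2) / sum((y - mean(y))^2)
q2.loo
```

The virtual version needs one n x n inversion instead of n inversions of size (n-1) x (n-1), which is what makes leave-one-out scoring affordable inside the structural search.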
* plot,Xfgpm-method with which = "evolution" for visualizing the evolution of the ACO algorithm, or with which = "diag" for a diagnostic plot;
* get_active_in for post-processing of input data structures following a fgpm_factory call;
* predict,fgpm-method for predictions based on a funGp model;
* simulate,fgpm-method for simulations based on a funGp model;
* update,fgpm-method for post-creation updates on a funGp model.
# generating input and output data
set.seed(100)
n.tr <- 32
x1 <- x2 <- x3 <- x4 <- x5 <- seq(0,1,length = n.tr^(1/5))
sIn <- expand.grid(x1 = x1, x2 = x2, x3 = x3, x4 = x4, x5 = x5)
fIn <- list(f1 = matrix(runif(n.tr * 10), ncol = 10),
f2 = matrix(runif(n.tr * 22), ncol = 22))
sOut <- fgp_BB7(sIn, fIn, n.tr)
# optimizing the model structure with fgpm_factory (~12 seconds)
## Not run:
xm <- fgpm_factory(sIn = sIn, fIn = fIn, sOut = sOut)
## End(Not run)
# assessing the quality of the model
# in the absolute and also w.r.t. the other explored models
plot(xm, which = "diag")
# checking the evolution of the algorithm
plot(xm, which = "evol")
# Summary of the tested configurations
summary(xm)
# checking the log of crashed iterations
print(xm@log.crashes)
# building the model with the default fgpm arguments to compare
set.seed(100)
n.tr <- 32
x1 <- x2 <- x3 <- x4 <- x5 <- seq(0,1,length = n.tr^(1/5))
sIn <- expand.grid(x1 = x1, x2 = x2, x3 = x3, x4 = x4, x5 = x5)
fIn <- list(f1 = matrix(runif(n.tr * 10), ncol = 10),
f2 = matrix(runif(n.tr * 22), ncol = 22))
sOut <- fgp_BB7(sIn, fIn, n.tr)
m1 <- fgpm(sIn = sIn, fIn = fIn, sOut = sOut)
plot(m1) # plotting the model
# improving performance with more iterations_______________________________________________
# call to fgpm_factory (~22 seconds)
## Not run:
xm25 <- fgpm_factory(sIn = sIn, fIn = fIn, sOut = sOut,
setup = list(n.iter = 25))
## End(Not run)
# assessing evolution and quality
plot(xm25, which = "evol")
plot(xm25, which = "diag")
# custom solution space____________________________________________________________________
myctr <- list(s_keepOn = c(1,2), # keep scalar inputs x1 and x2 always active
f_keepOn = c(2), # keep f2 always active
f_disTypes = list("2" = c("L2_byindex")), # only use L2_byindex distance for f2
f_fixDims = matrix(c(2,4), ncol = 1), # f2 projected in dimension 4
f_maxDims = matrix(c(1,5), ncol = 1), # f1 projected in dimension max 5
f_basTypes = list("1" = c("B-splines")), # only use B-splines projection for f1
kerTypes = c("matern5_2", "gauss")) # test only Matern 5/2 and Gaussian kernels
#
# call to fgpm_factory (~12 seconds)
## Not run:
xmc <- fgpm_factory(sIn = sIn, fIn = fIn, sOut = sOut, ctraints = myctr)
## End(Not run)
# assessing evolution and quality
plot(xmc, which = "evol")
plot(xmc, which = "diag")
# verifying constraints with the log of some successfully built models
summary(xmc)
# custom heuristic parameters______________________________________________________________
mysup <- list(n.iter = 30, n.pop = 12, tao0 = .15, dop.s = 1.2,
dop.f = 1.3, delta.f = 4, dispr.f = 1.1, q0 = .85,
rho.l = .2, u.gbest = TRUE, n.ibest = 2, rho.g = .08)
# call to fgpm_factory (~20 seconds)
## Not run:
xmh <- fgpm_factory(sIn = sIn, fIn = fIn, sOut = sOut, setup = mysup)
## End(Not run)
# verifying heuristic setup through the details of the Xfgpm object
unlist(xmh@details$param)
# stopping condition based on time_________________________________________________________
mysup <- list(n.iter = 2000)
mytlim <- 60
# call to fgpm_factory (~60 seconds)
## Not run:
xms <- fgpm_factory(sIn = sIn, fIn = fIn, sOut = sOut,
setup = mysup, time.lim = mytlim)
## End(Not run)
summary(xms)
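The time-based example above sets n.iter very high so that time.lim becomes the binding stopping condition. The base-R sketch below isolates that logic: a loop that stops at whichever budget runs out first, iterations or seconds. The function run_until() is hypothetical and is not part of funGp; it only illustrates how the two criteria can combine.

```r
# Hypothetical sketch: stop after n.iter iterations or time.lim seconds,
# whichever comes first (not funGp's internal loop)
run_until <- function(n.iter, time.lim, step) {
  t0 <- Sys.time()
  it <- 0
  while (it < n.iter &&
         as.numeric(difftime(Sys.time(), t0, units = "secs")) < time.lim) {
    step(it)
    it <- it + 1
  }
  it  # number of completed iterations
}

run_until(n.iter = 5, time.lim = Inf, step = function(i) NULL)  # 5
run_until(n.iter = 2000, time.lim = 0.2, step = function(i) Sys.sleep(0.05))
```

Since the time check happens between iterations, an iteration that starts just before the limit still runs to completion, so the total run time can slightly exceed time.lim.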
## Not run:
# parallelization in the model factory_____________________________________________________
# generating input and output data
set.seed(100)
n.tr <- 243
sIn <- expand.grid(x1 = seq(0,1,length = n.tr^(1/5)), x2 = seq(0,1,length = n.tr^(1/5)),
x3 = seq(0,1,length = n.tr^(1/5)), x4 = seq(0,1,length = n.tr^(1/5)),
x5 = seq(0,1,length = n.tr^(1/5)))
fIn <- list(f1 = matrix(runif(n.tr*10), ncol = 10), f2 = matrix(runif(n.tr*22), ncol = 22))
sOut <- fgp_BB7(sIn, fIn, n.tr)
# calling fgpm_factory in parallel
cl <- parallel::makeCluster(2)
xm.par <- fgpm_factory(sIn = sIn, fIn = fIn, sOut = sOut, par.clust = cl) # (~260 seconds)
parallel::stopCluster(cl)
# NOTE: in order to provide progress bars for monitoring time-consuming processes
# run in parallel, funGp relies on the doFuture and future packages. Parallel processes
# suddenly interrupted by the user tend to leave corrupt connections. This problem
# originates outside funGp, which limits our control over it. In the initial (unpublished)
# version of the funGp manual, we provide a temporary solution to the issue, and we remain
# attentive in case a more elegant way to handle it, or a way to suppress it, appears.
#
# funGp original (unpublished) manual: https://hal.science/hal-02536624
## End(Not run)
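The note above concerns connections left behind when parallel runs are interrupted. funGp's own workaround is described in the manual linked above. Independently of it, a general base-R habit is to tie stopCluster() to on.exit(), so that a cluster is released whenever the function that created it exits, including through an error. The sketch uses only the parallel package shipped with R; with_cluster() is a hypothetical helper, and it does not by itself address the doFuture/future issue.

```r
library(parallel)

# Hypothetical helper: run f over xs on a temporary PSOCK cluster that is
# always stopped when the function exits, normally or through an error.
with_cluster <- function(n.workers, xs, f) {
  cl <- makeCluster(n.workers)
  on.exit(stopCluster(cl), add = TRUE)
  parLapply(cl, xs, f)
}

squares <- with_cluster(2, 1:4, function(i) i^2)
print(unlist(squares))
```

The same pattern applies to a cluster passed to fgpm_factory through par.clust: create it inside a function, register stopCluster() with on.exit(), and return only the resulting object.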