library(hdMBO)
We'll first set up some generic design spaces:
## Only setting values to 0 to establish des. pars as numeric.
d.pars1D <- list(x = 0)
d.pars2D <- list(x = 0, y = 0)
d.pars3D <- list(x = 0, y = 0, z = 0)
d.pars15D <- as.list(rep(0, 15))
names(d.pars15D) <- paste0("d", 1:15)

## ex:
d.pars2D
And also some noisy quadratic functions. My naming convention is to end function names with ".Dx.D", where the first number denotes the dimension of the design space and the second denotes the dimension of the outcome space. These functions are mostly for demonstration, so they aren't too badly behaved. I will examine method performance on difficult functions later.
NOTE: The method is designed to minimize objective function values. Recall that maximizing an outcome is equivalent to minimizing its negation, if needed for your problem.
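If your problem is naturally a maximization, a thin wrapper is enough. A minimal sketch, where myRewardFn is a hypothetical stand-in for your own maximization objective:

## myRewardFn is hypothetical -- substitute your own maximization objective.
negObj = function(des.pars, target.fns = NULL) {
  -myRewardFn(des.pars)
}
## RunMBO then minimizes negObj, which maximizes myRewardFn.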
## One-dimensional outcomes:
noisyParabola1Dx1D = function(des.pars, target.fns = NULL) {
  a = rnorm(1, mean = des.pars[[1]]^2, sd = 2)
  return(a)
}

noisyParabola2Dx1D = function(des.pars, target.fns = NULL) {
  a = rnorm(1, mean = des.pars[[1]]^2 + des.pars[[2]]^2, sd = 10)
  return(a)
}

# High-dimensional example:
noisyParabola15Dx1D = function(des.pars, target.fns = NULL) {
  b = rnorm(1, mean = sum(unlist(des.pars)[1:8]^2) - sum(unlist(des.pars)[9:15]^2), sd = 1)
  return(c(b))
}

## Multi-dimensional outcomes:
noisyParabola3Dx2D = function(des.pars, target.fns = NULL) {
  a = rnorm(1, mean = des.pars[[1]]^2 + des.pars[[2]]^2, sd = 2)
  b = rnorm(1, mean = -des.pars[[1]]^2 + des.pars[[2]] - des.pars[[3]], sd = 3)
  return(c(a, b))
}

noisyParabola2Dx2D = function(des.pars, target.fns = NULL) {
  a = rnorm(1, mean = des.pars[[1]]^2 + des.pars[[2]]^2, sd = 2)
  b = rnorm(1, mean = -des.pars[[1]]^2 + des.pars[[2]], sd = 3)
  return(c(a, b))
}

# High-dimensional example:
noisyParabola15Dx2D = function(des.pars, target.fns = NULL) {
  a = rnorm(1, mean = sum(unlist(des.pars)^2), sd = 2)
  b = rnorm(1, mean = sum(unlist(des.pars)[1:8]^2) - sum(unlist(des.pars)[9:15]^2), sd = 1)
  return(c(a, b))
}
Note that the objective functions expect a numeric list of design parameters.
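For example, a single noisy evaluation of the 2D objective at the origin:

noisyParabola2Dx1D(list(x = 0, y = 0))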
The main workhorse function is RunMBO(). It is set up to automatically run properly for single-objective vs. multi-objective and high-dimensional vs. low-dimensional (search space) problems. Notably, I've coded the MBO methods under the assumption that the objective function is noisy. I would still expect the optimization to work reasonably well for noiseless functions. Since our ABM models are inherently stochastic, we won't need to consider the noiseless case for the project.
Here is a quick example run:
set.seed(2020)

## This function returns a list of my default hyperparameters
mbo.hyperparams = SetDefaultMBOHyperPars()

## Here is an example of updating hyperparams as necessary:
mbo.hyperparams$progress.upd.settings$save.filedir = "demo_results/"
mbo.hyperparams$progress.upd.settings$filename.tag = "first_ex"
mbo.hyperparams$progress.upd.settings$save.time = FALSE  # include the time in the saved file names?

## RunMBO general usage:
mbo.test_first.ex = RunMBO(d.pars = d.pars1D,
                           bb.fn = noisyParabola1Dx1D,
                           hyper.pars = mbo.hyperparams)
The function messages track the high-level optimization progress and also notify you when intermediate progress files are saved to our project "output" subdirectory. The optimization progress plots are saved in the "plots" subdirectory.
When the final optimization is complete, the intermediate files are cleared from the "output" subdirectory, and final results files and progress plots are saved.
Let's examine what a "results.mbo" object looks like:
names(mbo.test_first.ex)
names(mbo.test_first.ex$outcomes)
mbo.test_first.ex$solution
mbo.test_first.ex$solution$designs
In the single solution case, RunMBO will report a single solution according to the "best predicted" methodology. You will also notice the following solution outputs:
We would hope that the prediction and the evaluation values are close; otherwise, our posterior model probably needs more data.
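A quick way to check this from the results object (the exact component names appear when printing mbo.test_first.ex$solution above; the comparison below is schematic):

## The printed solution above shows both the predicted and the evaluated
## outcome; a large relative gap suggests the posterior needs more data.
names(mbo.test_first.ex$solution)
# schematic check: abs(pred - eval) / abs(eval) should be small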
As part of the saving structure, note that we can completely recover the results.mbo object from the following file location:
rm(mbo.test_first.ex)
load("output/opt_results/demos/first_ex_FINAL.RData")
mbo.test_first.ex = results.mbo
mbo.test_first.ex$solution$designs
I can also examine intermediate plots during optimization or final plots of results:
### Histogram of Designs
knitr::include_graphics("output/opt_results/demos/plots/first_ex_FINAL_DesParsHist.png")
knitr::include_graphics("output/opt_results/demos/plots/first_ex_FINAL_MinObjVals.png")
For purposes of quicker knitting, I store demonstration results in the demos folder, and I don't rerun the MBO calls when knitting this document.
Finally, we confirm that setting a seed before the optimization run will reproduce results:
set.seed(2020)
mbo.hyperparams = SetDefaultMBOHyperPars()
mbo.hyperparams$progress.upd.settings$save.filedir = "demo_results/"
mbo.hyperparams$progress.upd.settings$filename.tag = "reprod_first_ex"
mbo.hyperparams$progress.upd.settings$save.time = FALSE  # include the time in the saved file names?
mbo.test_reprod.first.ex = RunMBO(d.pars = d.pars1D,
                                  bb.fn = noisyParabola1Dx1D,
                                  hyper.pars = mbo.hyperparams)
rm(results.mbo, mbo.test_first.ex)
load("output/opt_results/demos/reprod_first_ex_FINAL.RData")
mbo.test_reprod.first.ex = results.mbo
mbo.test_reprod.first.ex$solution$designs
All of the control settings and hyperparameters for RunMBO() are contained in the following object:
mbo.hyperparams = SetDefaultMBOHyperPars()
names(mbo.hyperparams)
There are several hyperparameters that you should consider updating for each run. Their current settings are generally much lower than necessary, so that we can produce quick demonstrations.
I would recommend increasing the default setting here to something like 20 or 50, simply to prevent clutter in the output folder and to let you examine meaningful updates in the optimization run.
The default of 4L is probably fine, but you may want to increase this value to encourage global exploration of the space for higher-dimensional problems.
As we will discuss later with warm-starting, it isn't the end of the world if you run too few iterations. Sadly, there is no apparent rule of thumb for the number of iterations (or function evaluations).
In the single-objective case, for dimensions up to 10, BO tends to stop improving after around 200 iterations (200 function evaluations as well).
In the high-dimensional case, it continues to improve after 500 iterations (500 function evaluations).
The multi-objective case is less clear. For bi-objective benchmarking in the mlrMBO package's arXiv paper, each algorithm is permitted 44*(search dimension) function evaluations, where d = 5 (i.e., 220 evaluations).
High-dimensional multi-objective Bayesian optimization is largely unexplored.
I would encourage you to examine the intermediate results / plots of your optimization runs, and heuristically decide when the Pareto front or minimum achieved value is no longer improving substantially.
If we want to take full advantage of parallelizing objective function evaluations, we may want to propose several design points per iteration. There is an established method for the single-objective case, and the Pareto front outputs in the multi-objective case translate well to multipoint proposals. For the high-dimensional case, the single-objective extension was intuitive; I developed a slight variant for the high-dimensional multi-objective case.
For low-dimensional problems, 4-8 proposals is a reasonable choice. More proposals could improve performance for multi-objective and high-dimensional problems.
nSampleAvg: How large a sample average of the objective function should be taken at each design? Ultimately, Bayesian optimization (or any optimization method) deteriorates in performance when the objective function is noisy, so it can improve performance to take sample averages (again using parallelization; see the sketch after this list). We should run some tests to get a sense of the variance of the ABM model at several distinct design choices.
finalEvals: How large a sample average of the objective function should be taken at the selected design (relevant to single-objective optimization)? The user may prefer to take many observations at the "best predicted" point to ensure a confident final measurement of the expected outcome.
designParamBound: What number should define the bounds of the numeric design space? The default is 10, since this leaves plenty of action in the resulting testing probabilities for each demographic.
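A minimal sketch of updating these three settings, assuming they sit at the top level of the hyperparameter list like iterations and doPostProcessing do elsewhere in this document:

## Assumed to live at the top level of the hyperparameter list:
mbo.hyperparams$nSampleAvg = 5         # average 5 noisy evaluations per design
mbo.hyperparams$finalEvals = 25        # extra evaluations at the selected design
mbo.hyperparams$designParamBound = 10  # bound on the numeric design space (default)

## Why averaging helps: the sd of a size-n sample mean shrinks by a factor of sqrt(n).
mean(replicate(5, noisyParabola2Dx1D(list(x = 0, y = 0))))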
RunMBO() is built to parallelize:

* Sample averages of objective function evaluations
* Multi-point proposal objective function evaluations
* Multi-objective post-processing via GP simulations
Parallelization runs using the packages parallel, foreach, iterators, snow, and doSNOW. The function parallelEval() shows the implementation.
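As a rough illustration of the pattern, here is a minimal sketch of generic foreach/doSNOW usage (not the actual parallelEval() code; design.list and bb.fn stand in for a list of designs and the objective function):

library(parallel)
library(foreach)
library(iterators)
library(doSNOW)

## Generic foreach/doSNOW pattern -- not the package's actual code.
cl <- makeCluster(max(1, detectCores() - 1))
registerDoSNOW(cl)
evals <- foreach(d = iter(design.list), .combine = rbind) %dopar% {
  bb.fn(d)  # one objective evaluation per worker task
}
stopCluster(cl)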
Here are the relevant hyperparameters to turn off or adjust parallelization:
As part of the MBO pipeline, we fit a cheap surrogate model at each iteration to "guess" the solution space. For our continuous design space, we use Gaussian Process (Kriging) models, as these are well covered in the BO literature. If it becomes more crucial to consider a mixed discrete-continuous design space, we can push toward building a Random Forest surrogate model.
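For a standalone sense of the surrogate, here is a minimal Kriging fit with DiceKriging's km() on one of the toy objectives (illustrative only; RunMBO's internal call and settings may differ):

library(DiceKriging)

## Fit a GP to 30 noisy observations of the 1D toy objective:
X <- data.frame(x = runif(30, -10, 10))
y <- sapply(X$x, function(v) noisyParabola1Dx1D(list(x = v)))
gp <- km(design = X, response = y, covtype = "matern5_2", nugget.estim = TRUE)

## Posterior mean at the true minimizer x = 0:
predict(gp, newdata = data.frame(x = 0), type = "SK")$mean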
Here are the relevant Gaussian Process (GP) controls:
When $D>10$, the maximization of our acquisition function (AF) becomes a more expensive and more difficult problem.
I implement a variant of the method from the paper "High Dimensional Bayesian Optimization using Dropout" for our problem. The general idea is to randomly select $d$ of the $D$ parameters at each iteration, optimize the AF for this reduced problem, and heuristically fill in the remaining $D-d$ parameters to create each design.
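A toy sketch of one dropout iteration (optimize.af.over() and best.design.so.far are hypothetical placeholders for the AF optimizer and the incumbent design):

D <- 15; d <- 5
active <- sample(D, d)  # coordinates optimized this iteration

## Heuristic fill-in: copy the D - d inactive coordinates from the best
## design so far (one of the fill-in strategies in the dropout paper).
new.design <- best.design.so.far                # hypothetical incumbent design
new.design[active] <- optimize.af.over(active)  # hypothetical AF optimizer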
I'll go over the method in more detail in the methods presentation. Since high-dimensional BO is still an open problem, I considered other approaches in order to benchmark, but these are not coded yet.
Here are the relevant hyperparameters:
max.d: What is the highest-dimension search space for which we are willing to perform the standard low-dimensional BO method? The default value is 10.
high.dim.method: Which high-dimensional BO method should be used? Currently the only valid option is "dropout". (See the example after this list.)
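For example (assuming these sit at the top level of the hyperparameter list, like the other settings updated in this document; check names(mbo.hyperparams) for the actual entries):

## Assumed paths -- verify against names(mbo.hyperparams):
mbo.hyperparams$max.d = 12  # use standard BO up through D = 12
mbo.hyperparams$high.dim.method = "dropout"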
For the MSPOT algorithm for multi-objective BO, the NSGA2 evolutionary algorithm is used to approximate the Pareto front of the multi-dimensional acquisition function. It is a very popular multi-objective metaheuristic in the literature. The controls for the algorithm mostly match the defaults of the mlrMBO package, except:
I will cover the other NSGA2 hyperparameters in a subsequent presentation.
Additionally, there is a hyperparameter for the MSPOT AFs:
Several control parameters determine how RunMBO() will finalize a multi-objective solution. Post-processing is run because the observed Pareto front is a noisy approximation of the true one.
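One such control, doPostProcessing, appears at the top level of the hyperparameter list (it is toggled in the high-dimensional bi-objective example below); disabling it skips the GP-simulation post-processing on expensive runs:

## Skip the (slow) multi-objective post-processing step:
mbo.hyperparams$doPostProcessing = FALSE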
Finally, there are several adjustable settings related to the optimization progress reports that are generated. Most should be relatively intuitive.
We don't bother tuning hyperparameters for these examples. Mostly, the code chunks demonstrate that RunMBO() will automatically run a proper method for a given high-dimensional and/or multi-objective design (and that all cases are reproducible).
Low Dimensional Design, Single objective
set.seed(2020)
mbo.hyperparams = SetDefaultMBOHyperPars()
mbo.hyperparams$progress.upd.settings$save.filedir = "demo_results/"
mbo.hyperparams$progress.upd.settings$filename.tag = "low_dim_sin_obj"
mbo.hyperparams$progress.upd.settings$save.time = FALSE
mbo.hyperparams$iterations = 6
mbo.test = RunMBO(d.pars = d.pars2D,
                  bb.fn = noisyParabola2Dx1D,
                  hyper.pars = mbo.hyperparams)
rm(results.mbo, mbo.test)
load("output/opt_results/demos/low_dim_sin_obj_FINAL.RData")
mbo.test = results.mbo
mbo.test$solution$designs
set.seed(2020)
mbo.hyperparams = SetDefaultMBOHyperPars()
mbo.hyperparams$progress.upd.settings$save.filedir = "demo_results/"
mbo.hyperparams$progress.upd.settings$filename.tag = "reprod_low_dim_sin_obj"
mbo.hyperparams$progress.upd.settings$save.time = FALSE
mbo.hyperparams$iterations = 6
mbo.test_reprod = RunMBO(d.pars = d.pars2D,
                         bb.fn = noisyParabola2Dx1D,
                         hyper.pars = mbo.hyperparams)
rm(results.mbo, mbo.test_reprod)
load("output/opt_results/demos/reprod_low_dim_sin_obj_FINAL.RData")
mbo.test_reprod = results.mbo
mbo.test_reprod$solution$designs
Low Dimensional Design, Two objectives
set.seed(2020)
mbo.hyperparams = SetDefaultMBOHyperPars()
mbo.hyperparams$progress.upd.settings$save.filedir = "demo_results/"
mbo.hyperparams$progress.upd.settings$filename.tag = "low_dim_bi_obj"
mbo.hyperparams$progress.upd.settings$save.time = FALSE
mbo.hyperparams$iterations = 6

### Custom boundaries:
d.pars3D.new.bounds = d.pars3D
d.pars3D.new.bounds$x = c(-3, 4)
d.pars3D.new.bounds$y = c(-2, 3)
d.pars3D.new.bounds$z = c(-8, 8)

mbo.test = RunMBO(d.pars = d.pars3D.new.bounds,
                  bb.fn = noisyParabola3Dx2D,
                  hyper.pars = mbo.hyperparams)
rm(results.mbo, mbo.test)
load("output/opt_results/demos/low_dim_bi_obj_FINAL.RData")
mbo.test = results.mbo

### view pareto front objective evals:
pf.log = mbo.test$solution$obs.pf
pf.df = mbo.test$outcomes$obj.evals[pf.log, ]
head(pf.df)
set.seed(2020)
mbo.hyperparams = SetDefaultMBOHyperPars()
mbo.hyperparams$progress.upd.settings$save.filedir = "demo_results/"
mbo.hyperparams$progress.upd.settings$filename.tag = "reprod_low_dim_bi_obj"
mbo.hyperparams$progress.upd.settings$save.time = FALSE
mbo.hyperparams$iterations = 6
## use the same custom boundaries as the original run:
mbo.test_reprod = RunMBO(d.pars = d.pars3D.new.bounds,
                         bb.fn = noisyParabola3Dx2D,
                         hyper.pars = mbo.hyperparams)
rm(results.mbo, mbo.test_reprod)
load("output/opt_results/demos/reprod_low_dim_bi_obj_FINAL.RData")
mbo.test_reprod = results.mbo

### get pareto front:
pf.log.rep = mbo.test_reprod$solution$obs.pf
pf.df.rep = mbo.test_reprod$outcomes$obj.evals[pf.log.rep, ]
all.equal(pf.df.rep, pf.df)
And let's examine the plots produced from the bi-objective optimization:
knitr::include_graphics("output/opt_results/demos/plots/low_dim_bi_obj_FINAL_DesParsHist.png")
knitr::include_graphics("output/opt_results/demos/plots/low_dim_bi_obj_FINAL_PFPlot.png")
rm(results.mbo, mbo.test)
load("output/opt_results/demos/low_dim_bi_obj_FINAL.RData")
mbo.test = results.mbo

plot(mbo.test$solution$sims.array$CPF.res)
# There was a bug; it should be named this when you use RunMBO:
plot(mbo.test$solution$post.sim$CPF.res)

plotSymDevFun(mbo.test$solution$sims.array$CPF.res)

plot_uncertainty(list(mbo.test$gp.models$obj1, mbo.test$gp.models$obj2),
                 paretoFront = mbo.test$outcomes$obj.evals[mbo.test$solution$obs.pf, ],
                 lower = unlist(lapply(mbo.test$search.space, function(x) x[1])),
                 upper = unlist(lapply(mbo.test$search.space, function(x) x[2])))
These plots visualize approximate Pareto fronts as well as our observed data from the posterior-simulation post-processing. Thus far, we can only do this for two-objective problems. I am working to see whether 3+ objectives is possible, and will explain what these plots actually mean during our methods talk.
High Dimensional Design, Single Objective
set.seed(2020)
mbo.hyperparams = SetDefaultMBOHyperPars()
mbo.hyperparams$progress.upd.settings$save.filedir = "demo_results/"
mbo.hyperparams$progress.upd.settings$filename.tag = "high_dim_sin_obj"
mbo.hyperparams$progress.upd.settings$save.time = FALSE
mbo.hyperparams$iterations = 5
mbo.test = RunMBO(d.pars = d.pars15D,
                  bb.fn = noisyParabola15Dx1D,
                  hyper.pars = mbo.hyperparams)
rm(results.mbo, mbo.test)
load("output/opt_results/demos/high_dim_sin_obj_FINAL.RData")
mbo.test = results.mbo
mbo.test$solution$designs
set.seed(2020)
mbo.hyperparams = SetDefaultMBOHyperPars()
mbo.hyperparams$progress.upd.settings$save.filedir = "demo_results/"
mbo.hyperparams$progress.upd.settings$filename.tag = "reprod_high_dim_sin_obj"
mbo.hyperparams$progress.upd.settings$save.time = FALSE
mbo.hyperparams$iterations = 5
mbo.test_reprod = RunMBO(d.pars = d.pars15D,
                         bb.fn = noisyParabola15Dx1D,
                         hyper.pars = mbo.hyperparams)
rm(results.mbo, mbo.test_reprod)
load("output/opt_results/demos/reprod_high_dim_sin_obj_FINAL.RData")
mbo.test_reprod = results.mbo
mbo.test_reprod$solution$designs
High Dimensional Design, Two Objectives
set.seed(2020)
mbo.hyperparams = SetDefaultMBOHyperPars()
mbo.hyperparams$progress.upd.settings$save.filedir = "demo_results/"
mbo.hyperparams$progress.upd.settings$filename.tag = "high_dim_bi_obj"
mbo.hyperparams$progress.upd.settings$save.time = FALSE
mbo.hyperparams$iterations = 7
## post-processing will take a while for high-dimensional MO problems:
mbo.hyperparams$doPostProcessing = FALSE
mbo.test = RunMBO(d.pars = d.pars15D,
                  bb.fn = noisyParabola15Dx2D,
                  hyper.pars = mbo.hyperparams)
rm(results.mbo, mbo.test)
load("output/opt_results/demos/high_dim_bi_obj_FINAL.RData")
mbo.test = results.mbo

### view pareto front objective evals:
pf.log = mbo.test$solution$obs.pf
pf.df = mbo.test$outcomes$obj.evals[pf.log, ]
head(pf.df)
set.seed(2020)
mbo.hyperparams = SetDefaultMBOHyperPars()
mbo.hyperparams$progress.upd.settings$save.filedir = "demo_results/"
mbo.hyperparams$progress.upd.settings$filename.tag = "reprod_high_dim_bi_obj"
mbo.hyperparams$progress.upd.settings$save.time = FALSE
mbo.hyperparams$iterations = 7
## post-processing will take a while for high-dimensional MO problems:
mbo.hyperparams$doPostProcessing = FALSE
mbo.test_reprod = RunMBO(d.pars = d.pars15D,
                         bb.fn = noisyParabola15Dx2D,
                         hyper.pars = mbo.hyperparams)
rm(results.mbo, mbo.test_reprod)
load("output/opt_results/demos/reprod_high_dim_bi_obj_FINAL.RData")
mbo.test_reprod = results.mbo

### get pareto front:
pf.log.rep = mbo.test_reprod$solution$obs.pf
pf.df.rep = mbo.test_reprod$outcomes$obj.evals[pf.log.rep, ]
all.equal(pf.df.rep, pf.df)
There is a chance that a long Bayesian optimization run will either significantly slow down or crash. RunMBO() is set up to receive a results.mbo object and continue the optimization. Hence, if you saved intermediate progress results, you can warm-start a new optimization with these evaluations. Warm-starting can also be useful if you aren't happy with the number of iterations you ran but don't want to start the optimization over.
Make sure to manually re-establish a seed before warm-starting to ensure reproducibility over the full run. Then, simply include the old results.mbo object as an argument to RunMBO():
rm(results.mbo, mbo.test)

## load a previously completed run:
load("demo_results/low_dim_sin_obj_FINAL.RData")
mbo.old = results.mbo

## set hypers
set.seed(4040)
mbo.hyperparams = SetDefaultMBOHyperPars()
mbo.hyperparams$progress.upd.settings$save.filedir = "demo_results/"
mbo.hyperparams$progress.upd.settings$filename.tag = "warm_start"
mbo.hyperparams$progress.upd.settings$save.time = FALSE
mbo.hyperparams$iterations = 3

## continue the same problem that produced the loaded results:
mbo.test = RunMBO(d.pars = d.pars2D,
                  bb.fn = noisyParabola2Dx1D,
                  hyper.pars = mbo.hyperparams,
                  results.mbo = mbo.old)
TODO Should be a fairly straightforward process, but showing examples here will be helpful.
TODO I still need to code up some helpful test functions with inexpensive evaluations.
TODO This subsection will include a more detailed walk-through of features wrapped up in the BO pipeline.
# testing out GP sampler
library(DiceKriging)
library(ggplot2)
library(dplyr)
library(tidyr)

mbo.hyperparams = SetDefaultMBOHyperPars()
## NOTE: using the 1D problem here so that the 1D prediction grid below
## matches the fitted GP model.
mbo.test_gp.rf = RunMBO(d.pars1D, bb.fn = noisyParabola1Dx1D,
                        target.fns = NULL, hyper.pars = mbo.hyperparams)

### Plot predictions from both methods
t <- seq(-5, 5, length = 200)
test.data <- data.frame(x = t)

gp.pred = predict.km(results.mbo$gp.models$obj1, newdata = test.data[, 1], type = "SK")
test.data$mean  = gp.pred$mean
test.data$upper = gp.pred$upper95
test.data$lower = gp.pred$lower95

tst_fun = SampleGPwRandomFeatures(results.mbo$outcomes$designs,
                                  results.mbo$outcomes$obj.evals,
                                  results.mbo$gp.models$obj1,
                                  nFeatures = 1000)
test.data$approx = tst_fun(x = matrix(test.data[, 1]))

ggplot(test.data %>% gather("key", "value", mean:approx),
       aes(x = x, y = value, color = key)) + geom_line()

### Evaluate several runs of GP sampling
datlist = lapply(1:5, function(i)
  SampleGPwRandomFeatures(results.mbo$outcomes$designs,
                          results.mbo$outcomes$obj.evals,
                          results.mbo$gp.models$obj1,
                          nFeatures = 1000)(matrix(test.data[, 1])))
test.data.mc = cbind(test.data, data.frame(do.call(cbind, datlist)))

noiseless = data.frame(x = test.data.mc$x, noiseless = test.data.mc$x^2)
obs = data.frame(x = results.mbo$outcomes$designs, Obs = results.mbo$outcomes$obj.evals)

ggplot(test.data.mc %>% dplyr::select(x, mean, upper, lower, X1:X5) %>%
         gather("key", "value", mean:X5)) +
  geom_line(aes(x = x, y = value, color = key)) +
  geom_line(data = noiseless, aes(x = x, y = noiseless, lty = "Noiseless")) +
  geom_point(data = obs, aes(x = x, y = Obs)) +
  geom_point(x = 1.232, y = 20, color = "red")

ggplot(test.data.mc) + geom_line(aes(x = x, y = mean)) +
  geom_point(x = 1.232, y = 20, color = "red")

### how about 100 features
datlist = lapply(1:10, function(i)
  SampleGPwRandomFeatures(results.mbo$outcomes$designs,
                          results.mbo$outcomes$obj.evals,
                          results.mbo$gp.models$obj1,
                          nFeatures = 100)(matrix(test.data[, 1])))
test.data.mc = cbind(test.data, data.frame(do.call(cbind, datlist)))

ggplot(test.data.mc %>% dplyr::filter(x >= -2.5 & x <= 2.5) %>%
         dplyr::select(x, mean, upper, lower, X1:X10) %>%
         gather("key", "value", mean:X10),
       aes(x = x, y = value, color = key)) + geom_line()