
hyperbandr

This is an R6 implementation of the original hyperband algorithm (https://arxiv.org/abs/1603.06560).

R6 is an encapsulated object-oriented system akin to those in Java or C++: objects contain methods in addition to data, and those methods can modify the object directly. This is unlike S3 and S4, which are both functional object-oriented systems, where class methods are separate from objects and objects are not mutable.

Essentially, this means we obtain a very generic implementation that works with any other R package (as long as the algorithm meets the requirements of hyperband).
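As a minimal illustration of this encapsulated style (the Counter class below is purely illustrative and not part of hyperbandr):

library("R6")

Counter = R6Class("Counter",
  public = list(
    count = 0,
    # The method lives inside the object and modifies it in place.
    increment = function() {
      self$count = self$count + 1
      invisible(self)
    }
  )
)

cnt = Counter$new()
cnt$increment()
cnt$count # 1 -- the object itself was modified, no reassignment needed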

This tutorial contains a general introduction, four examples and a small guide to mlr:

  1. General introduction of the mechanics of the hyperbandr package
  2. Example 1: hyperband to optimize a neural network with mxnet and mlr (very detailed)
  3. Example 2: hyperband in combination with MBO to optimize a neural network with mxnet, mlr and mlrMBO
  4. Example 3: hyperband to optimize a gradient boosting model with xgboost and mlr
  5. Example 4: hyperband to optimize a function with smoof
  6. appendix: introduction to mlr

\newpage

1. General introduction

In order to call hyperband, we need to define five things:

  1. a hyperparameter search space
  2. a function to sample configurations
  3. a function to initialize models
  4. a function to train models
  5. a function to evaluate the performance of a model
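
Conceptually, hyperband ties these five ingredients together roughly as follows (a pseudocode sketch, not the actual hyperbandr source):

# for each bracket:
#   configs = sample.fun(par.set, n.configs = n)
#   models  = lapply(configs, function(cfg) init.fun(r = r, config = cfg, problem = problem))
#   repeat (successive halving) until a single configuration is left:
#     perf   = sapply(models, performance.fun, problem = problem)
#     keep only the best 1/prop.discard share of the models
#     models = lapply(models, function(mod) train.fun(mod, budget = increased.budget, problem = problem))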

1: the hyperparameter search space

We begin with the hyperparameter search space. That search space includes all hyperparameters we would like to consider, as well as a reasonable range of values for each of them.

\vspace{10pt}

mySearchSpace = ...

2: the sampling function

Next, we need a function to sample an arbitrary number of hyperparameter configurations from our search space.

The inputs of that function are:

\vspace{10pt}

sample.fun = function(par.set, n.configs, ...) {
  # par.set:   the hyperparameter search space
  # n.configs: the number of configurations to sample
  ...
}

\vspace{10pt}

The sampling function must return a list of named lists containing the sampled hyperparameter configurations. For instance, for the following example search space, the structure of the return value should look like this:

library("ParamHelpers")

mySearchSpace = makeParamSet(
  makeDiscreteParam(id = "optimizer", values = c("sgd", "adam")),
  makeNumericParam(id = "learning.rate", lower = 0.001, upper = 0.1),
  makeLogicalParam(id = "batch.normalization")
)

sample.fun = function(par.set, n.configs, ...) {
  lapply(sampleValues(par = par.set, n = n.configs), function(x) x[!is.na(x)])
}
set.seed(6)
str(sample.fun(par.set = mySearchSpace, n.configs = 2))

\vspace{10pt}

3: the initialization function

We also need a function to initialize our models.

The inputs of that function must include:

\vspace{10pt}

init.fun = function(r, config, problem) {
  # r:       the initial budget (e.g. number of epochs) allocated to the model
  # config:  a single hyperparameter configuration as returned by the sampling function
  # problem: the problem to optimize (e.g. a list holding the data and the resampling split)
  ...
}

4: the training function

The training function takes an initialized model and continues the training process. Hyperband applies successive halving and thus discards a large share of the models at each round. Instead of training the surviving configurations from scratch, we simply continue training the existing models, which saves a lot of time.

Our inputs are:

\vspace{10pt}

train.fun = function(mod, budget, problem) {
  # mod:     a previously initialized (or trained) model
  # budget:  the additional budget (e.g. epochs) to train for
  # problem: the problem to optimize
  ...
}

5: the performance function

Our final ingredient is the performance function. That function simply evaluates the performance of the model at its current state.

Inputs include:

\vspace{10pt}

performance.fun = function(model, problem) {
  # model:   the model in its current state
  # problem: the problem to optimize (e.g. holding the validation data)
  ...
}

\vspace{10pt}

Now that we have defined these functions, we can finally call hyperband.

The inputs of hyperband are:

\vspace{10pt}

hyperhyper = hyperband(
  problem = myProblem,                # the problem to optimize
  max.resources = 81,                 # maximum amount of resources for a single configuration
  prop.discard = 3,                   # proportion of configurations discarded in each round of successive halving
  max.perf = TRUE,                    # TRUE if the performance measure is maximized, FALSE if it is minimized
  id = "my id",                       # an identifier for the models
  par.set = mySearchSpace,            # the hyperparameter search space
  sample.fun = sample.fun,            # the sampling function
  init.fun = init.fun,                # the initialization function
  train.fun = train.fun,              # the training function
  performance.fun = performance.fun   # the performance function
)

\vspace{10pt}

According to the hyperband algorithm, we obtain $\lfloor \log_{\text{prop.discard}}(\text{max.resources}) \rfloor + 1$ brackets, which are all R6 objects. These objects contain a variety of methods, which will be discussed in the example section.
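
For instance, with max.resources = 81 and prop.discard = 3 we can compute the number of brackets and, following the formulas of the hyperband paper (a plain R sketch, not a hyperbandr function), the initial schedule of each bracket:

\vspace{10pt}

max.resources = 81
prop.discard = 3

# number of brackets; the small epsilon guards against floating point error in the logarithm
s.max = floor(log(max.resources, base = prop.discard) + 1e-9)
s.max + 1

# initial number of configurations n and initial resources r per bracket
B = (s.max + 1) * max.resources
for (s in s.max:0) {
  n = ceiling((B / max.resources) * prop.discard^s / (s + 1))
  r = max.resources / prop.discard^s
  cat(sprintf("bracket s = %g: n = %g configurations, r = %g initial resources\n", s, n, r))
}

\vspace{10pt}

The most exploratory bracket thus starts 81 configurations with a single unit of resources each, while the most conservative bracket starts 5 configurations with the full 81 units.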

\newpage

2. Example 1: hyperband to optimize a neural network with mxnet and mlr

If you are not familiar with mlr, please go to the appendix for a short introduction.

#######################################
############## packages ###############
#######################################
setwd("C:/Users/Niklas/hyperbandr")
library("R6")
library("devtools")
load_all()
library("mxnet")
library("mlr") # you might need to install mxnet branch of mlr: devtools::install_github("mlr-org/mlr", ref = "mxnet")
library("ggplot2")
library("ggrepel")
library("gridExtra")
library("dplyr")
library("data.table")

####################################
## define the problem to optimize ##
####################################

# read mini_mnist (1/10 of actual mnist for faster evaluation, evenly distributed classes)
train = fread("mnist/train.csv", header = TRUE)
test = fread("mnist/test.csv", header = TRUE)

# Some operations to normalize features
mnist = as.data.frame(rbind(train, test))
mnist = mnist[sample(nrow(mnist)), ]
mnist[, 2:785] = lapply(mnist[, 2:785], function(x) x/255)

train.x = train[, -1]
train.x = t(train.x/255)

We would like to use a small subset of the original MNIST data (LeCun & Cortes 2010) and tune a neural network with hyperbandr.

\vspace{10pt}

plots = 64
visualize_this = sample(dim(train.x)[2], plots)
par(mfrow = c(8, plots/8), cex = 0.05)

for(i in 1:plots){
  train_vis = train.x[1:784, visualize_this[i]]
  train_mat = matrix(train_vis, nrow = 28, ncol = 28, byrow = TRUE)
  train_mat = apply(train_mat, 2 , rev)
  image(t(train_mat), axes = FALSE, col = grey(seq(from = 0, to = 1, length = 255)))
}

rm(train)
rm(train.x)
rm(test)

\vspace{10pt}

Our data has 6000 observations, evenly distributed across 10 classes.

\vspace{10pt}

dim(mnist)

\vspace{10pt}

table(mnist$label)

\vspace{10pt}

Let us create a list, which we call the problem. That list contains the classification task (i.e. the data) as well as the train/validation/test split.

\vspace{10pt}

# We sample 2/3 of our data for training:
train.set = sample(nrow(mnist), size = (2/3)*nrow(mnist))

# Another 1/6 will be used for validation during training:
val.set = sample(setdiff(1:nrow(mnist), train.set), 1000)

# The remaining 1/6 will be stored for testing:
test.set = setdiff(1:nrow(mnist), c(train.set, val.set))

# Since we use mlr, we define a classification task to encapsulate the data:
task = makeClassifTask(data = mnist, target = "label")

# Finally, we define the problem list:
problem = list(data = task, train = train.set, val = val.set, test = test.set)
rm(mnist)

2.1: the configuration space

The ParamHelpers package provides an easy way to construct the configuration space:

\vspace{10pt}

library("ParamHelpers")
# We choose to search for optimal setting of the following hyperparameters:
configSpace = makeParamSet(
  makeDiscreteParam(id = "optimizer", values = c("sgd", "rmsprop", "adam", "adagrad")),
  makeNumericParam(id = "learning.rate", lower = 0.001, upper = 0.1),
  makeNumericParam(id = "wd", lower = 0, upper = 0.01),
  makeNumericParam(id = "dropout.input", lower = 0, upper = 0.6),
  makeNumericParam(id = "dropout.layer1", lower = 0, upper = 0.6),
  makeNumericParam(id = "dropout.layer2", lower = 0, upper = 0.6),
  makeNumericParam(id = "dropout.layer3", lower = 0, upper = 0.6),
  makeLogicalParam(id = "batch.normalization1"),
  makeLogicalParam(id = "batch.normalization2"),
  makeLogicalParam(id = "batch.normalization3")
)

\vspace{10pt}

2.2: the sampling function

Now we need a function to sample configurations from our search space.

\vspace{10pt}

sample.fun = function(par.set, n.configs, ...) {
  # Sample from the par.set and remove all NAs.
  lapply(sampleValues(par = par.set, n = n.configs), function(x) x[!is.na(x)])
}

\vspace{10pt}

2.3: the initialization function

This function initializes a convolutional neural network with two convolutional layers as well as two dense layers. Note that we set layers = 3; the second dense layer is our output layer and will be created automatically by mlr. We choose epochs as the resource. Thus, when initializing the model, we allocate r resources, i.e. we train for r epochs.

\vspace{10pt}

init.fun = function(r, config, problem) {
  # We begin and create a learner.
  lrn = makeLearner("classif.mxff",
    # You have to install the gpu version of mxnet in order to run this code.
    ctx = mx.gpu(),
    layers = 3, 
    conv.layer1 = TRUE, conv.layer2 = TRUE,
    conv.data.shape = c(28, 28),
    num.layer1 = 8, num.layer2 = 16, num.layer3 = 64,
    conv.kernel1 = c(3,3), conv.stride1 = c(1,1), 
    pool.kernel1 = c(2,2), pool.stride1 = c(2,2),
    conv.kernel2 = c(3,3), conv.stride2 = c(1,1), 
    pool.kernel2 = c(2,2), pool.stride2 = c(2,2),           
    array.batch.size = 128,
    begin.round = 1, num.round = r,
    # This line is very important: here we allocate the configuration to our model.
    par.vals = config
  )
  # This will start the actual training (initialization) of the model.
  mod = train(learner = lrn, task = problem$data, subset = problem$train)
  return(mod)
}

\vspace{10pt}

2.4: the training function

This function takes the initialized model and continues the training process. Most importantly, we have to extract the weights from our initialized model and assign them to a new learner.

\vspace{10pt}

train.fun = function(mod, budget, problem) {
  # We create a new learner and assign all hyperparameters from our initialized model.
  lrn = makeLearner("classif.mxff", ctx = mx.gpu(), par.vals = mod$learner$par.vals)
  lrn = setHyperPars(lrn,
    # In addition, we have to extract the weights and feed them into our new model.
    symbol = mod$learner.model$symbol,
    arg.params = mod$learner.model$arg.params,
    aux.params = mod$learner.model$aux.params,
    begin.round = mod$learner$par.vals$begin.round + mod$learner$par.vals$num.round,
    num.round = budget
  )
  mod = train(learner = lrn, task = problem$data, subset = problem$train)
  return(mod)
}

\vspace{10pt}

2.5: the performance function

The performance function will simply predict the validation data at each step of successive halving.

\vspace{10pt}

performance.fun = function(model, problem) {
  pred = predict(model, task = problem$data, subset = problem$val)
  # We choose accuracy as our performance measure.
  performance(pred, measures = acc)
}

\vspace{10pt}

2.6: call hyperband

Now we can call hyperband (this needs around 5 minutes on a GTX 1070).

\vspace{10pt}

hyperhyper = hyperband(
  problem = problem,
  max.resources = 81, 
  prop.discard = 3,
  max.perf = TRUE,
  id = "CNN", 
  par.set = configSpace, 
  sample.fun =  sample.fun,
  init.fun = init.fun,
  train.fun = train.fun, 
  performance.fun = performance.fun)

With max.resources = 81 and prop.discard = 3, we obtain a total of 5 brackets:

\vspace{10pt}

length(hyperhyper)

\vspace{10pt}

We can inspect the first bracket ..

\vspace{10pt}

hyperhyper[[1]]

\vspace{10pt}

.. and, for instance, check its performance by calling the getPerformances() method:

\vspace{10pt}

hyperhyper[[1]]$getPerformances()

\vspace{10pt}

We can also inspect the architecture of the best model of bracket 1:

\vspace{10pt}

hyperhyper[[1]]$models[[1]]$model

\vspace{10pt}

Now let's see which bracket yielded the best performance:

\vspace{10pt}

lapply(hyperhyper, function(x) x$getPerformances())

\vspace{10pt}

We can call the hyperVis function to visualize all brackets:

\vspace{10pt}

hyperVis(hyperhyper, perfLimits = c(0, 1))

\vspace{10pt}

Let us use the best model over all brackets and predict the test data:

\vspace{10pt}

# Extract the best model:
best.mod.index = which.max(unlist(lapply(hyperhyper, function(x) x$getPerformances())))
best.mod = hyperhyper[[best.mod.index]]$models[[1]]$model

# Predict the test data:
performance(predict(object = best.mod, task = problem$data, subset = problem$test), 
            measures = acc)

\vspace{10pt}

2.7: additional features

The hyperbandr package also allows us to construct single bracket objects. For demonstration purposes we shrink our hyperparameter search space. Constructing a single bracket requires two additional arguments: the index s of the bracket and its budget B. Here we build the most exploratory bracket with s = 4 and B = (4 + 1) * 81:

\vspace{10pt}

# Smaller config space for demonstration purposes.
configSpace = makeParamSet(
  makeDiscreteParam(id = "optimizer", values = c("sgd", "adam")),
  makeNumericParam(id = "learning.rate", lower = 0.001, upper = 0.1),
  makeLogicalParam(id = "batch.normalization"))
brack = bracket$new(
  problem = problem,
  max.perf = TRUE,
  max.resources = 81,
  prop.discard = 3,
  s = 4,
  B = (4 + 1)*81,
  id = "nnet_bracket",
  par.set = configSpace,
  sample.fun = sample.fun,
  init.fun = init.fun,
  train.fun = train.fun,
  performance.fun = performance.fun)

\vspace{10pt}

Each bracket object has a bracket storage object, which is basically just another R6 class. The bracket storage records the hyperparameters, the current budget and the performance of every configuration in a data matrix.

\vspace{10pt}

# Currently, our bracket storage data matrix has 81 configurations (rows)
dim(brack$bracket.storage$data.matrix)

\vspace{10pt}

# Print the first 10 configurations:
brack$bracket.storage$data.matrix[1:10,]

\vspace{10pt}

We can call the step() method to conduct one round of successive halving, or simply complete the whole bracket by calling the run() method. The latter conducts successive halving according to the rules described in the hyperband paper until only one configuration is left.

\vspace{10pt}

brack$run()

\vspace{10pt}

While run() executes, new rows are continuously written to our bracket storage object.

\vspace{10pt}

# Now our bracket storage data matrix has 121 rows:
# 81 + 27 + 9 + 3 + 1 configurations over the successive halving stages.
dim(brack$bracket.storage$data.matrix)

\vspace{10pt}

Bracket objects have a visPerformances() method to immediately visualize the bracket.

\vspace{10pt}

brack$visPerformances()

\vspace{10pt}

Besides the graphical investigation, we can also extract the best model's performance by simply calling the getPerformances() method.

\vspace{10pt}

brack$getPerformances()

\vspace{10pt}

Each bracket object contains multiple algorithm objects. The hyperbandr package also allows us to create these algorithm objects on their own and manipulate them. The input values are almost identical to those of the bracket object or the hyperband call.

\vspace{10pt}

set.seed(1337)
myConfig = sample.fun(par.set = configSpace, n.configs = 1)[[1]]

obj = algorithm$new(
  problem = problem,
  id = "nnet",
  configuration = myConfig,
  initial.budget = 1,
  init.fun = init.fun,
  train.fun = train.fun,
  performance.fun = performance.fun)

\vspace{10pt}

We can inspect the hyperparameter configuration of our algorithm object by calling configuration:

\vspace{10pt}

# You can also call obj$model for much more details, but that would not fit on the page.
obj$configuration

\vspace{10pt}

Similar to the bracket object, each algorithm object has an algorithm storage object, which is basically just another R6 class. The algorithm storage records the hyperparameters, the current budget and the performance in a data matrix.

\vspace{10pt}

obj$algorithm.result$data.matrix

\vspace{10pt}

The algorithm object also has a getPerformance() method.

\vspace{10pt}

obj$getPerformance()

\vspace{10pt}

By calling the continue() method, we can continue training our algorithm object with an arbitrary amount of additional budget (here: epochs).

\vspace{10pt}

obj$continue(1)

\vspace{10pt}

Like before, each step writes new rows to our algorithm storage. That enables us to track the behaviour of our algorithm object as we allocate more resources.

\vspace{10pt}

obj$algorithm.result$data.matrix

\vspace{10pt}

So let us call continue(1) another 18 times to obtain a total of 20 epochs.

\vspace{10pt}

invisible(capture.output(replicate(18, obj$continue(1))))

\vspace{10pt}

This will write 18 additional lines to our algorithm storage:

\vspace{10pt}

obj$algorithm.result$data.matrix

\vspace{10pt}

To visualize the training process and the development of our validation error, we simply call the visPerformance() method:

\vspace{10pt}

obj$visPerformance()

\newpage

3. Example 2: hyperband in combination with MBO to optimize a neural network with mxnet, mlr and mlrMBO

Recall the bracket storage object of example 1:

Each bracket has a bracket storage, containing all configurations in that bracket as well as their corresponding performance values, e.g.:

\vspace{10pt}

head(brack$bracket.storage$data.matrix)

\vspace{10pt}

At each step of successive halving, we write new lines to the bracket storage object. Consequently, configurations which survived at least one step of successive halving appear at least twice in the bracket storage.
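
For instance, a configuration that survived k halving steps appears k + 1 times. Assuming the storage columns are named after the hyperparameters (as exploited by the MBO sampling function below), we could count the occurrences like this:

\vspace{10pt}

storage = data.table(brack$bracket.storage$data.matrix)
storage[, .N, by = names(configSpace$pars)]

\vspace{10pt}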

When we call hyperband, another R6 class called the hyper storage is created automatically. The hyper storage takes the bracket storage objects and concatenates them. Thus, it contains the accumulated information of all configurations over all brackets computed so far.

For instance, if we begin with the third bracket, the hyper storage object contains all configurations and performance values of the first and the second bracket.

Instead of randomly sampling configurations in the third bracket, we could exploit the information in the hyper storage object to propose new configurations in a model-based fashion (MBO).

To do this, we simply have to adjust our sampling function.

One potential implementation could look like this:

\vspace{10pt}

library("mlrMBO")
library("ranger")

sample.fun.mbo = function(par.set, n.configs, hyper.storage) {
  # If the hyper storage is empty, sample from our search space 
  if (dim(hyper.storage)[[1]] == 0) {
    lapply(sampleValues(par = par.set, n = n.configs), function(x) x[!is.na(x)])
  # Else, propose configurations via MBO.
  # That means, we propose configurations for the second bracket based on the
  # results from bracket one. For bracket three, we propose configurations based
  # on the results from bracket one and two and so on..
  } else {
    catf("Proposing points")
    ctrl = makeMBOControl(propose.points = n.configs)
    # Set the infill criterion
    ctrl = setMBOControlInfill(ctrl, crit = crit.cb)
    designMBO = data.table(hyper.storage)
    # We have to keep in mind, that some configurations occur multiple times.
    # Here we choose to aggregate their performance according to a rule.
    # For each configuration that occurs more than once:
    # aggregate by electing the best performance.
    designMBO = data.frame(designMBO[, max(y), by = names(configSpace$pars)])
    colnames(designMBO) = colnames(hyper.storage)[-(length(configSpace$pars) + 1)]
    # initSMBO enables us to conduct human-in-the-loop MBO
    opt.state = initSMBO(
      par.set = configSpace, 
      design = designMBO,
      control = ctrl, 
      minimize = FALSE, 
      noisy = FALSE)
    # Based on the surrogate model, proposePoints yields us our configurations
    prop = proposePoints(opt.state)
    propPoints = prop$prop.points
    rownames(propPoints) = c()
    propPoints = convertRowsToList(propPoints, name.list = FALSE, name.vector = TRUE)
    return(propPoints)
  }
}

\vspace{10pt}

Now we simply run hyperband with the new sampling function:

\vspace{10pt}

hyperhyperMBO = hyperband(
  problem = problem, 
  max.resources = 81, 
  prop.discard = 3,  
  max.perf = TRUE,
  id = "CNN", 
  par.set = configSpace, 
  sample.fun =  sample.fun.mbo,
  init.fun = init.fun,
  train.fun = train.fun, 
  performance.fun = performance.fun)

\vspace{10pt}

Let us compare the results of our vanilla hyperband and the combination of hyperband with MBO:

\vspace{10pt}

hyperPlotOne = hyperVis(hyperhyper, perfLimits = c(0, 1))
hyperPlotTwo = hyperVis(hyperhyperMBO, perfLimits = c(0, 1))
grid.arrange(hyperPlotOne, hyperPlotTwo, ncol = 1, nrow = 2)

\vspace{10pt}

Besides the graphical comparison, we can collect the best performance of each bracket for both runs in a small table:

\vspace{10pt}

vanilla = matrix(unlist(lapply(hyperhyper, function(x) x$getPerformances())), 
                 ncol = 1, nrow = 5, byrow = TRUE)
mbo = matrix(unlist(lapply(hyperhyperMBO, function(x) x$getPerformances())), 
             ncol = 1, nrow = 5, byrow = TRUE)
results = data.frame(vanilla, mbo)
rownames(results) = c("bracket 4", "bracket 3", "bracket 2", "bracket 1", "bracket 0")
results

\vspace{10pt}

The table compares, for each bracket, the best validation accuracy obtained by vanilla hyperband and by hyperband with MBO proposals.

\newpage

4. Example 3: hyperband to optimize a gradient boosting model with xgboost and mlr

library("xgboost")

data(agaricus.train)
data(agaricus.test)
train.set = xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
test.set = xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)

problem = list(train = train.set, val = test.set)
rm(train.set)
rm(test.set) 

\vspace{10pt}

We again start by defining a hyperparameter search space, this time for xgboost:

\vspace{10pt}

configSpace = makeParamSet(
  makeIntegerParam("max_depth", lower = 3, upper = 15, default = 3),
  makeNumericParam("colsample_bytree", lower = 0.3, upper = 1, default = 0.6),
  makeNumericParam("subsample", lower = 0.3, upper = 1, default = 0.6)
)

\vspace{10pt}

The sampling function is the same as before:

\vspace{10pt}

sample.fun = function(par.set, n.configs, ...) {
  lapply(sampleValues(par = par.set, n = n.configs), function(x) x[!is.na(x)])
}

\vspace{10pt}

The initialization function trains an xgboost model for r boosting rounds:

\vspace{10pt}

init.fun = function(r, config, problem) {
  watchlist = list(eval = problem$val, train = problem$train)
  capture.output({mod = xgb.train(config, problem$train, nrounds = r, watchlist, verbose = 1)})
  return(mod)
}

\vspace{10pt}

The training function continues boosting the existing model for budget additional rounds:

\vspace{10pt}

train.fun = function(mod, budget, problem) {
  watchlist = list(eval = problem$val, train = problem$train)
  capture.output({mod = xgb.train(xgb_model = mod, 
    nrounds = budget, params = mod$params, problem$train, watchlist, verbose = 1)})
  return(mod)
}

\vspace{10pt}

The performance function extracts the latest validation RMSE from the evaluation log:

\vspace{10pt}

performance.fun = function(model, problem) {
  tail(model$evaluation_log$eval_rmse, n = 1)
}

\vspace{10pt}

Now we call hyperband. Since we want to minimize the RMSE, we set max.perf = FALSE:

\vspace{10pt}

hyperhyper = hyperband(
  problem = problem,
  max.resources = 81, 
  prop.discard = 3,  
  max.perf = FALSE,
  id = "xgboost", 
  par.set = configSpace, 
  sample.fun =  sample.fun,
  init.fun = init.fun,
  train.fun = train.fun, 
  performance.fun = performance.fun)

\vspace{10pt}

Again, we visualize all brackets with hyperVis:

\vspace{10pt}

hyperVis(hyperhyper)

\vspace{10pt}

We inspect the best performance of each bracket:

\vspace{10pt}

lapply(hyperhyper, function(x) x$getPerformances()) 
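
Analogous to Example 1, we could also extract the overall best model; since we minimize the RMSE here, we use which.min instead of which.max (a small sketch):

\vspace{10pt}

best.mod.index = which.min(unlist(lapply(hyperhyper, function(x) x$getPerformances())))
best.mod = hyperhyper[[best.mod.index]]$models[[1]]$model
# Inspect the hyperparameters of the best model:
best.mod$params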

\newpage

5. Example 4: hyperband to optimize a function with smoof

library("smoof")
braninProb = makeBraninFunction()
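
The Branin function is a common synthetic benchmark. In its standard form it reads

$$f(x_1, x_2) = \left(x_2 - \frac{5.1}{4\pi^2}\, x_1^2 + \frac{5}{\pi}\, x_1 - 6\right)^2 + 10\left(1 - \frac{1}{8\pi}\right)\cos(x_1) + 10,$$

and is usually evaluated on $x_1 \in [-5, 10]$ and $x_2 \in [0, 15]$.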

\vspace{10pt}

Let us visualize the function and print some basic information about it:

\vspace{10pt}

opt = data.table(x1 = getGlobalOptimum(braninProb)$param$x1, x2 = getGlobalOptimum(braninProb)$param$x2)
(vis = autoplot(braninProb) + geom_point(data = opt, aes(x = x1, y = x2), shape = 20, colour = "red", size = 5))
print(braninProb)
getParamSet(braninProb)

\vspace{10pt}

We only include x1 in the search space: each configuration fixes a value of x1, while x2 will be optimized with increasing budget by the training function below.

\vspace{10pt}

configSpace = makeParamSet(
    makeNumericParam(id = "x1", lower = -5, upper = 10.1))

\vspace{10pt}

The sampling function simply draws configurations from the search space:

\vspace{10pt}

sample.fun = function(par.set, n.configs, ...) {
  sampleValues(par = par.set, n = n.configs)
}

\vspace{10pt}

The initialization function combines the sampled x1 with a random starting value for x2; the resulting vector is our "model":

\vspace{10pt}

init.fun = function(r, config, problem) {
  x1 = unname(unlist(config))
  x2 = runif(1, 0, 15)
  mod = c(x1, x2)
  return(mod)
}

\vspace{10pt}

The training function performs a simple stochastic local search on x2: for budget iterations it perturbs x2 and keeps the new point only if it improves the function value:

\vspace{10pt}

train.fun = function(mod, budget, problem) {
  for(i in seq_len(budget)) {
    mod.new = c(mod[[1]], mod[[2]] + rnorm(1, sd = 3))
    if (performance.fun(mod.new, problem) < performance.fun(mod, problem))
      mod = mod.new
  }
  return(mod)
}

\vspace{10pt}

The performance function evaluates the Branin function at the current point:

\vspace{10pt}

performance.fun = function(model, problem) {
  braninProb(c(model[[1]], model[[2]]))
}

\vspace{10pt}

Since we minimize the Branin function, we again call hyperband with max.perf = FALSE:

\vspace{10pt}

hyperhyper = hyperband(
  problem = braninProb,
  max.resources = 81, 
  prop.discard = 3,  
  max.perf = FALSE,
  id = "branin", 
  par.set = configSpace, 
  sample.fun =  sample.fun,
  init.fun = init.fun,
  train.fun = train.fun, 
  performance.fun = performance.fun)

\vspace{10pt}

We visualize all brackets:

hyperVis(hyperhyper)

\vspace{10pt}

We also inspect the best performance of each bracket:

\vspace{10pt}

lapply(hyperhyper, function(x) x$getPerformances())

\vspace{10pt}

Finally, we extract the best model of each bracket and plot it on top of the Branin function:

\vspace{10pt}

results = lapply(hyperhyper, function(x) x$models[[1]]$model)
data = data.frame(matrix(unlist(results), ncol = 2, byrow = TRUE))
rownames(data) = c("bracket 1", "bracket 2", "bracket 3", "bracket 4", "bracket 5")
colnames(data) = c("x1", "x2")

(vis = vis + 
  geom_point(data = data, mapping = aes(x = x1, y = x2), shape = 3, size = 3) + 
  geom_text_repel(data = data,
                  mapping = aes(x = x1, y = x2, color = factor(x1)),
                  label = rownames(data),
                  max.iter = 10000,
                  force = 3,
                  size = 4,
                  box.padding = unit(5, "lines")) + 
  theme_bw() + 
  theme(legend.position = "none") + 
  scale_x_continuous(name = "configuration x1") +
  scale_y_continuous(name = "hyperparameter x2"))

\vspace{10pt}


\vspace{10pt}

\newpage

6. appendix: introduction to mlr
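
mlr encapsulates data sets in tasks and algorithms in learners; train(), predict() and performance() then fit a model, predict new data and evaluate the predictions. This is exactly the workflow used in the examples above. A minimal sketch (using the built-in iris data and the classif.rpart learner purely for illustration):

\vspace{10pt}

library("mlr")

# 1. a task wraps the data and the target variable
task = makeClassifTask(data = iris, target = "Species")

# 2. a learner wraps the algorithm (and its hyperparameters)
lrn = makeLearner("classif.rpart")

# 3. train the learner on a subset of the observations
mod = train(lrn, task, subset = seq(1, 150, by = 2))

# 4. predict the held-out observations
pred = predict(mod, task = task, subset = seq(2, 150, by = 2))

# 5. evaluate the predictions
performance(pred, measures = acc)

\vspace{10pt}

A much more detailed introduction can be found in the mlr tutorial at https://mlr.mlr-org.com.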



