knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

Surrogates for Multi-Fidelity Problems

This package contains surrogates that approximate the hyperparameter response surface of several interesting machine learning algorithms across a variety of tasks.

For a continuation of this project see YAHPO GYM.

Overview

library(mlr3misc)
library(mfsurrogates)
library(data.table)
dt = data.frame(
  instance = c("nb301", "lcbench", "rbv2_svm", "rbv2_rpart", "rbv2_aknn", "rbv2_glmnet", "rbv2_ranger", "rbv2_xgboost", "rbv2_super"),
  space = c("Cat+Dep", "Numeric", "Mix+Dep", "Mix", "Mix", "Mix", "Mix+Dep", "Mix+Dep", "Mix+Dep"),
  n_dims = c(34L, 7L, 6L, 5L, 6L, 3L, 8L, 14L, 38L),
  n_targets = c("2:perf(1)+rt", "6:perf(5)+rt", "6:perf(4)+rt+pt", "6:perf(4)+rt+pt", "6:perf(4)+rt+pt", "6:perf(4)+rt+pt", "6:perf(4)+rt+pt", "6:perf(4)+rt+pt", "6:perf(4)+rt+pt"),
  fidelity = c("epoch", "epoch", "trainsize+repl", "trainsize+repl", "trainsize+repl", "trainsize+repl", "trainsize+repl", "trainsize+repl", "trainsize+repl"),
  n_problems = c(1L, 35L, 96L, 101L, 99L, 98L, 114L, 109L, 89L),
  status = "ready"
)

# workdir must point to the directory containing the downloaded surrogate data
# (see the setup sections below).
workdir = "../multifidelity_data/"
metrics = data.table(do.call("rbind",
  map(dt$instance, function(x) {
    # read the surrogate test metrics of each instance and report the observed range
    x = fread(paste0(cfgs(x, workdir = workdir)$subdir, "surrogate_test_metrics.csv"))
    x = t(apply(keep(x, is.numeric), 2, range))[1:2, ]
    paste0(round(x[, 1], 2), "-", round(x[, 2], 2))
  })
))
colnames(metrics) = c("Rsq", "Rho")

dt = cbind(dt, metrics)[c(1, 2, 9, 3:8), ]

knitr::kable(dt)

|   |instance     |space   | n_dims|n_targets       |fidelity       | n_problems|status |Rsq       |Rho       |
|:--|:------------|:-------|------:|:---------------|:--------------|----------:|:------|:---------|:---------|
|1  |nb301        |Cat+Dep |     34|2:perf(1)+rt    |epoch          |          1|ready  |0.83-0.98 |0.91-0.99 |
|2  |lcbench      |Numeric |      7|6:perf(5)+rt    |epoch          |         35|ready  |0.98-1    |0.99-1    |
|9  |rbv2_super   |Mix+Dep |     38|6:perf(4)+rt+pt |trainsize+repl |         89|ready  |0.86-0.99 |0.93-0.99 |
|3  |rbv2_svm     |Mix+Dep |      6|6:perf(4)+rt+pt |trainsize+repl |         96|ready  |0.83-0.99 |0.91-1    |
|4  |rbv2_rpart   |Mix     |      5|6:perf(4)+rt+pt |trainsize+repl |        101|ready  |0.79-0.99 |0.89-1    |
|5  |rbv2_aknn    |Mix     |      6|6:perf(4)+rt+pt |trainsize+repl |         99|ready  |0.74-0.99 |0.86-1    |
|6  |rbv2_glmnet  |Mix     |      3|6:perf(4)+rt+pt |trainsize+repl |         98|ready  |0.91-1    |0.96-1    |
|7  |rbv2_ranger  |Mix+Dep |      8|6:perf(4)+rt+pt |trainsize+repl |        114|ready  |0.85-1    |0.92-1    |
|8  |rbv2_xgboost |Mix+Dep |     14|6:perf(4)+rt+pt |trainsize+repl |        109|ready  |0.87-0.98 |0.93-0.99 |

where for n_targets (#number of targets), perf(k) denotes k performance measures, rt the training time, and pt the prediction time.

For more information, either look directly into the code or use the objects' printers via cfgs("...").
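
For instance, printing a config gives an overview of the instance; a minimal sketch, assuming the corresponding files have already been downloaded to workdir (see the setup calls in the sections below):

library(mfsurrogates)
workdir = "../multifidelity_data/"       # directory containing the downloaded surrogate data
cfg = cfgs("lcbench", workdir = workdir) # construct the config object
cfg                                      # the printer summarizes the instance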

Toy test functions:

dt = data.frame(
  instance = c("Branin", "Shekel"),
  space = c("Num", "Num"),
  n_dims = c(2L, 4L),
  n_targets = c("1:y", "1:y"),
  fidelity = c("fidelity", "fidelity"),
  n_problems = c(1L, 1L),
  status = c("ready", "ready")
)
knitr::kable(dt)

NASBench-301

We first load the config:

library(checkmate)
library(paradox)
library(mfsurrogates)
workdir = "../multifidelity_data/"
cfg = cfgs("nb301", workdir = workdir)
cfg$setup()  # automatic download of files; necessary if you didn't download manually
cfg

This config contains our objective, which we can then optimize:

library(bbotk)
library(data.table)
ins = OptimInstanceMultiCrit$new(
  objective = cfg$get_objective(),
  terminator = trm("evals", n_evals = 10L)
)
opt("random_search")$optimize(ins)
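
The evaluated configurations and the final result can then be inspected through the instance (standard bbotk accessors):

ins$archive  # all evaluated configurations with their target values
ins$result   # non-dominated configurations found by the random search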

Since NASBench-301 is only trained on CIFAR-10, we do not need to set a specific task_id here.

LCBench

We first load the config:

cfg = cfgs("lcbench", workdir = workdir)
cfg$setup()

This config contains our objective, which we can then optimize:

ins = OptimInstanceMultiCrit$new(
  objective = cfg$get_objective(),
  terminator = trm("evals", n_evals = 10L)
)
opt("random_search")$optimize(ins)

Subsetting to a specific task

In practice, we need to subset to a task_id if we want to tune on a specific task.

ins = OptimInstanceMultiCrit$new(
  objective = cfg$get_objective(task = "126025"),
  terminator = trm("evals", n_evals = 10L)
)
opt("random_search")$optimize(ins)

A list of available task_ids can be obtained from the param_set:

cfg$param_set$params$OpenML_task_id$levels
# or using the active binding across all configs:
cfg$task_levels

If the configuration only has a single task, NULL is returned.
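
As a quick check (the nb301 config from above covers only a single task):

cfg_nb301 = cfgs("nb301", workdir = workdir)
cfg_nb301$task_levels  # NULL, since NASBench-301 only contains a single task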

Branin

We first load the config:

cfg = cfgs("branin")
#cfg$plot()  # or method = "rgl"

This config contains our objective, which we can then optimize:

ins = OptimInstanceSingleCrit$new(
  objective = cfg$get_objective(),
  terminator = trm("evals", n_evals = 10L)
)
opt("random_search")$optimize(ins)
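
The objective can also be evaluated directly, e.g., on a small random design over its domain; a minimal sketch using paradox's random design and bbotk's eval_dt() (parameter ids, including the fidelity parameter, can be checked via cfg$param_set$ids()):

objective = cfg$get_objective()
design = generate_design_random(objective$domain, 3)$data  # 3 random configurations
objective$eval_dt(design)                                  # evaluate them in one batch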

Shekel

We first load the config:

cfg = cfgs("shekel")

This config contains our objective, which we can then optimize:

ins = OptimInstanceSingleCrit$new(
  objective = cfg$get_objective(),
  terminator = trm("evals", n_evals = 10L)
)
opt("random_search")$optimize(ins)

RandomBotv2

RandomBotv2 - svm

We first load the config:

cfg = cfgs("rbv2_svm", workdir = workdir)
cfg$setup()

This config contains our objective, which we can then optimize:

ins = OptimInstanceMultiCrit$new(
  objective = cfg$get_objective(),
  terminator = trm("evals", n_evals = 10L)
)
opt("random_search")$optimize(ins)

Example to select target variables and a task:

objective = cfg$get_objective(task = "3", target_variables = c("mmce", "timetrain"))
objective$codomain
objective$constants
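
The subsetted objective can then be plugged into an optimization instance just like before:

ins = OptimInstanceMultiCrit$new(
  objective = objective,
  terminator = trm("evals", n_evals = 10L)
)
opt("random_search")$optimize(ins)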

RandomBotv2 - rpart

We first load the config:

cfg = cfgs("rbv2_rpart", workdir = workdir)
cfg$setup()

This config contains our objective, which we can then optimize:

ins = OptimInstanceMultiCrit$new(
  objective = cfg$get_objective(),
  terminator = trm("evals", n_evals = 10L)
)
opt("random_search")$optimize(ins)

RandomBotv2 - aknn

We first load the config:

cfg = cfgs("rbv2_aknn", workdir = workdir)
cfg$setup()

This config contains our objective, which we can then optimize:

ins = OptimInstanceMultiCrit$new(
  objective = cfg$get_objective(),
  terminator = trm("evals", n_evals = 10L)
)
opt("random_search")$optimize(ins)

RandomBotv2 - glmnet

We first load the config:

cfg = cfgs("rbv2_glmnet", workdir = workdir)
cfg$setup()

This config contains our objective, which we can then optimize:

ins = OptimInstanceMultiCrit$new(
  objective = cfg$get_objective(),
  terminator = trm("evals", n_evals = 10L)
)
opt("random_search")$optimize(ins)

RandomBotv2 - ranger

We first load the config:

cfg = cfgs("rbv2_ranger", workdir = workdir)
cfg$setup()

This config contains our objective, which we can then optimize:

ins = OptimInstanceMultiCrit$new(
  objective = cfg$get_objective(),
  terminator = trm("evals", n_evals = 10L)
)
opt("random_search")$optimize(ins)

RandomBotv2 - xgboost

We first load the config:

cfg = cfgs("rbv2_xgboost", workdir = workdir)
cfg$setup()

This config contains our objective, which we can then optimize:

ins = OptimInstanceMultiCrit$new(
  objective = cfg$get_objective(),
  terminator = trm("evals", n_evals = 10L)
)
opt("random_search")$optimize(ins)

RandomBotv2 - super

The super learner is parametrized as a choice over all available RandomBot base learners. It has a highly hierarchical, 38-dimensional parameter space that includes the configurations of all base learners.

We first load the config:

cfg = cfgs("rbv2_super", workdir = workdir)
cfg$setup()
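
To get an impression of the hierarchical structure, one can inspect the parameter set and its dependencies; a small sketch using the paradox ParamSet accessors (exact counts depend on additional task and fidelity parameters):

cfg$param_set$length      # number of parameters in the search space
head(cfg$param_set$deps)  # dependency table encoding the hierarchy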

This config contains our objective, which we can then optimize:

ins = OptimInstanceMultiCrit$new(
  objective = cfg$get_objective(),
  terminator = trm("evals", n_evals = 10L)
)
opt("random_search")$optimize(ins)

