attic/paramset.md

title: "ParamSet Tune Short Forms" author: "Martin Binder" date: "8/14/2020" output: pdf_document

The Problem

We currently have to write a lot to define a tuning paramset. E.g. our presentation for pipelines tuning tunes a stacked learner, and the slide looks like this:

ps = ParamSet$new(list(
  ParamFct$new("branch.selection", levels = c("pca", "nothing")),
  ParamDbl$new("anova.filter.frac", lower = 0.1, upper = 1),
  ParamFct$new("lrn_branch.selection", levels = c("svm", "xgb", "rf")),
  ParamInt$new("rf.mtry", lower = 1L, upper = 20L),
  ParamInt$new("xgb.nrounds", lower = 1, upper = 500),
  ParamDbl$new("svm.cost", lower = -12, upper = 4),
  ParamDbl$new("svm.gamma", lower = -12, upper = -1)))
ps$add_dep("rf.mtry", "lrn_branch.selection", CondEqual$new("rf"))
ps$add_dep("xgb.nrounds", "lrn_branch.selection", CondEqual$new("xgb"))
ps$add_dep("svm.cost", "lrn_branch.selection", CondEqual$new("svm"))
ps$add_dep("svm.gamma", "lrn_branch.selection", CondEqual$new("svm"))
ps$trafo = function(x, param_set) {
  if (x$lrn_branch.selection == "svm") {
    x$svm.cost = 2^x$svm.cost; x$svm.gamma = 2^x$svm.gamma
  }
  return(x)
}
inst = TuningInstanceSingleCrit$new(tsk("sonar"), glrn, rsmp("cv", folds=3),
 msr("classif.ce"), ps, trm("evals", n_evals = 10))

A lot of the information here is redundant, because the GraphLearner being tuned already knows a lot about the parameters, their ranges, and their relationships. Furthermore, even writing down a new ParamSet takes much more typing than it should. In particular, trafos and dependencies are written at different places than the code which it affects. We identify three problems:

  1. Generating ParamSets is verbose
  2. Trafos and dependencies are not defined "locally"
  3. Specifying tuning ParamSets is redundant.

Teaser:

How nice would it be if we could write the ParamSet like the following:

pars <- ps(
  branch.selection = p_fct(c("pca", "nothing")),
  anova.filter.frac = p_dbl(.1, 1),
  lrn_branch.selection = p_fct(c("svm", "xgb", "rf")),
  rf.mtry = p_int(1, 20, requires = lrn_branch.selection == "rf"),
  xgb.nrounds = p_int(1, 500, requires = lrn_branch.selection == "xgb"),
  svm.cost = p_dbl(-12, 4, requires = lrn_branch.selection == "svm",
    trafo = function(x) 2^x),
  svm.gamma = p_dbl(-12, -1, requires = lrn_branch.selection == "svm",
    trafo = function(x) 2^x)
)

or, making use of the information from the pipeline ParamSet:

glrn$param_set$values = list(
  branch.selection = to_tune(),
  anova.filter.frac = to_tune(.1, 1),
  lrn_branch.selection = to_tune(),
  rf.mtry = to_tune(1, 20),
  xgb.nrounds = to_tune(1, 500),
  xgb.verbose = 0,
  svm.cost = to_tune(p_dbl(-12, 4, trafo = function(x) 2^x)),
  svm.type = "C-classification",
  svm.kernel = "radial"
)

glrn$param_set$tune_ps()
#> <ParamSet>
#>                      id    class lower upper      levels        default value
#> 1:     branch.selection ParamFct    NA    NA pca,nothing <NoDefault[3]>      
#> 2:    anova.filter.frac ParamDbl   0.1     1             <NoDefault[3]>      
#> 3:             svm.cost ParamDbl -12.0     4             <NoDefault[3]>      
#> 4:          xgb.nrounds ParamInt   1.0   500             <NoDefault[3]>      
#> 5:              rf.mtry ParamInt   1.0    20             <NoDefault[3]>      
#> 6: lrn_branch.selection ParamFct    NA    NA  svm,xgb,rf <NoDefault[3]>      
#> Trafo is set.

The Solution

I propose that these problems have solutions that are closely linked. A particular, there is one element in common to them: Parameter ranges, or what I am going to call "Domain". (This is probably close, in some way, to what set6 is doing?)

Domain

A Domain is basically a Param without an ID, but with a trafo and with dependencies. It is an auxiliary object that has not much functionality and should only be used within shortform-functions, the user should not do any computation with them.

This is a nice object to have for the combined problems above, because

  1. If we have a non-verbose constructor for Domain, then building a ParamSet from it could also be non-verbose. Granted, we can also get this by just having shortforms for ParamDbl$new() etc., but we won't get the following two benefits.
  2. We can use Domain to specify trafos and dependencies locally. Instead of defining a ParamSet and then defining its dependencies and then its trafos, we can build a ParamSet from Domains, where each domain contains the trafo and dependency that regards only itself.
  3. We can use Domain to specify tuning ranges. This is why domains should be unnamed, so they can also be used for specifying tuning ranges. Here we have the benefit that we use a Domain object for both quickly defining ParamSets and also for quickly defining tuning ranges.

We get the Domain constructors p_int(), p_dbl(), p_fct(), p_lgl(), p_uty(). We call them just like we call ParamInt$new(), except without an id, and with an optional trafo and requirements argument.

single_digits <- p_int(0, 9)
logscale <- p_dbl(log(.01), log(10), trafo = exp)
fac <- p_fct(c("polynomial", "radial"))

# There is some implicit behaviour in that the `p_fct` Domain
# automatically generates a transformation for non-character elements.
# For example, the following:
funfac <- p_fct(c("identity", "log", "exp"),
  trafo = function(x) switch(x,
    identity = identity,
    log = log,
    exp = exp
))
# is much shorter like this:
funfac <- p_fct(list(identity = identity, log = log, exp = exp))

# we can specify requirements for a Domain. Here we say that, whatever
# parameter we define with it, will depend on some "kernel" parameter
# being equal to `"polynomial"`.
degree <- p_int(1, 4,
  requires = kernel == "polynomial" && kernel2 == "polynomial")

Internals

Because people are not supposed to use Domain outside of "sugary" usage, and in particular because they should not do any computation on these objects besides "sugar", we don't need to give these objects much inner life. They are just a list(), maybe with a printer (I will avoid calling things like this "S3 objects" in this tunedocument for political reasons), with elements constructor, constargs, trafo, and requirements.

Constructing a Param from this is just mlr3misc::invoke(constructor, id = <ID>, .args = constargs); the trafo and requirements will have to be handled in a way to be given to the resulting ParamSet.

ParamSet Short Form Construction

We get a function ps() that collects Domain objects to a complete ParamSet. Its arguments must be named. This is very natural and similar to how we would write a list with named arguments etc.

pars <- ps(
  a = p_int(1, 10),
  kernel = fac,
  kernel2 = fac,
  kernel3 = fac,
  c = funfac,
  degree = degree
)
pars
#> <ParamSet>
#>         id    class lower upper            levels        default        parents value
#> 1:       a ParamInt     1    10                   <NoDefault[3]>                     
#> 2:       c ParamFct    NA    NA  identity,log,exp <NoDefault[3]>                     
#> 3:  degree ParamInt     1     4                   <NoDefault[3]> kernel,kernel2      
#> 4:  kernel ParamFct    NA    NA polynomial,radial <NoDefault[3]>                     
#> 5: kernel2 ParamFct    NA    NA polynomial,radial <NoDefault[3]>                     
#> 6: kernel3 ParamFct    NA    NA polynomial,radial <NoDefault[3]>                     
#> Trafo is set.

This is mostly implemented, see how trafo is already working:

set.seed(1)
generate_design_random(pars, 1)$transpose()
#> [[1]]
#> [[1]]$a
#> [1] 3
#> 
#> [[1]]$kernel
#> [1] "polynomial"
#> 
#> [[1]]$kernel2
#> [1] "radial"
#> 
#> [[1]]$kernel3
#> [1] "radial"
#> 
#> [[1]]$c
#> function (x) 
#> x
#> <bytecode: 0x55faab526488>
#> <environment: namespace:base>

Dependencies also work

pars$deps
#>        id      on           cond
#> 1: degree  kernel <CondEqual[9]>
#> 2: degree kernel2 <CondEqual[9]>

If a trafo is needed that goes beyond modifying single parameters, it can be given to an .extra_trafo argument to ps(). It gets executed after the Domain-local trafos.

pars <- ps(
  a = p_dbl(0, 1, trafo = exp),
  b = p_dbl(0, 1, trafo = exp),
  .extra_trafo = function(x, ps) {
    x$c <- x$a + x$b
    x
  }
)

# See how the addition happens after exp()ing:
pars$trafo(list(a = 0, b = 0))
#> $a
#> [1] 1
#> 
#> $b
#> [1] 1
#> 
#> $c
#> [1] 2

Internals

Just call the Param constructors from the Domain objects as described above. The trafo functions of each Domain are put together into one big trafo for the resulting ParamSet, and the dependencies are parsed and added.

Tune ParamSet autogeneration

When defining tuning ParamSets, we want to make use of the information stored in the Learner's ParamSet. A nice way to define a tuning scenario is if we can define the fixed and variable parameters of an object at the same time. We solve this by using a TuneToken list-with-a-printer, constructed via to_tune(). It is given to the $values slot of a ParamSet and indicates that a parameter does not have a preset value, and instead should be tuned over.

ll <- lrn("classif.rpart")
ll$param_set$values = list(
  minsplit = 10,
  cp = to_tune()
)

The ParamSet has a $tune_ps() active binding that creates the ParamSet for tuning out of this. Tuner code, like e.g. AutoTuner etc., call this and create a tuning paramset automatically:

ll$param_set$tune_ps()
#> <ParamSet>
#>    id    class lower upper levels default value
#> 1: cp ParamDbl     0     1           0.01

The Learner-side of the ParamSet must use get_values(), which will throw an error if any TuneToken objects are present in the values, since it means the Learner is being called with a parameter that should actually be tuned.

The printer of the TuneToken can indicate that the respective value of a ParamSet is to be tuned:

print(ll$param_set$values)
#> $minsplit
#> [1] 10
#> 
#> $cp
#> Tuning over:
#> <entire parameter range>

Nomenclature: We call ll$param_set$params$cp the underlying parameter, and ll$param_set$tune_ps()$cp the tuning parameter. They could have different names or types if a $trafo is involved.

TuneToken

However, maybe we do not want to tune over the full range of cp, or maybe we want to tune over integer values of a ParamDbl, or we want to tune over a ParamUty with a transformation. The to_tune() constructor therefore admits five behaviours:

  1. to_tune(): Tune over the whole range of a (bounded) Param.
  2. to_tune(lower, upper): Tune over the (numeric or integer) Param with the given bounds.
  3. to_tune(value_vector_or_list): Tune over the values in the given vector or list. This is done by creating a ParamFct tuning-Param with a trafo that converts to the type required by the underlying parameter.
  4. to_tune(Domain): Tune over the domain, making use of the given dependencies and trafos if necessary. This is useful if the type over which we tune is different from the underlying parameter being tuned. See notes below.
  5. to_tune(ParamSet): Tune over the ParamSet, making use of its trafo etc. This is useful if we tune a single (usually ParamUty) underlying parameter with multiple tuning parameters.

Notes: Why is it nice to have to_tune(Domain) instead of to_tune(lower, upper, trafo) when we need a trafo? Because

  1. This functionality is overlapping a lot with Domain already, we get two functionalities for the price of one
  2. When we give a trafo, we can often expect that the tuning parameter and the underlying parameter have different types, e.g. tuning from log(100) to log(1000) with trafo round(exp(x)) for a ParamInt underlying parameter where the tuning parameter is a ParamDbl.

The following performs tuning over the vector-valued ParamUty regularization.factor; the vector value is constructed from the reg.sepal and reg.petal tuning parameters.

round_exp = function(x) round(exp(x))  # maybe we want this in paradox

lr <- lrn("classif.ranger")
lr$param_set$values = list(
  mtry = 2,
  num.trees = to_tune(p_dbl(log(10), log(1000), trafo = round_exp)),
  regularization.factor = to_tune(
    ps(
      reg.sepal = p_dbl(0, 1),
      reg.petal = p_dbl(0, 1),
      .extra_trafo = function(x, param_set) {
        list(regularization.factor =
          c(x$reg.sepal, x$reg.sepal, x$reg.petal, x$reg.petal))
      }
    )
  )
)
lr$param_set$tune_ps()
#> <ParamSet>
#>           id    class    lower    upper levels        default value
#> 1: num.trees ParamDbl 2.302585 6.907755        <NoDefault[3]>      
#> 2: reg.sepal ParamDbl 0.000000 1.000000        <NoDefault[3]>      
#> 3: reg.petal ParamDbl 0.000000 1.000000        <NoDefault[3]>      
#> Trafo is set.

generate_design_random(lr$param_set$tune_ps(), 1)$transpose()
#> [[1]]
#> [[1]]$num.trees
#> [1] 775
#> 
#> [[1]]$regularization.factor
#> [1] 0.6607978 0.6607978 0.6291140 0.6291140

mlr3pipelines has the affect_columns parameter, which is a ParamUty that takes any object (though often a Selector object). Suppose we want to do PCA on all columns except one of the iris columns:

ts = tsk("iris")
glrn = as_learner(po("pca") %>>% lrn("classif.rpart"))
glrn$param_set$values$pca.affect_columns = to_tune(
  p_fct(ts$feature_names, trafo = function(x) selector_invert(selector_name(x)))
)

generate_design_random(glrn$param_set$tune_ps(), 1)$transpose()
#> [[1]]
#> [[1]]$pca.affect_columns
#> selector_invert(selector_name("Petal.Length"))

Internals

The ParamSet$values slot stores the TuneToken and uses that to create a tuning ParamSet whenever $tune_ps() is queried. This is done by 1. Generating a ParamSet for each individual value that is set to a TuneToken, using the information retrieved from the TuneToken (range, factor levels, etc.) and information from the Param itself to create the tuning parameter. E.g. to_tune() just clones the Param, while to_tune(p_fct(...)) needs only the $id of the Param and does some validity checking. 2. Putting the individual ParamSets gotten like this into a common ParamSet using a ps_union() function. This function goes beyond just collecting the Params, and also collects $deps and $trafo so that the individual trafos of constituent ParamSets are called correctly. Dependencies of the outer paramset are copied to $tune_ps(). 3. If we are dealing with a GraphLearner, then we are are already dealing with a ParamSetCollection on the outside (i.e. glrn$param_set is a ParamSetCollection). It must provide a $tune_ps() active binding just as ParamSet. It just puts together the individual tuning paramsets using ps_union() as well; trafos etc. get handled transparently.

Documentation

This is how the new features could be documented.

Domain

A Domain object is a representation of a single dimension of a ParamSet. Domain objects are used to construct ParamSets, either through the ps() short form, or through the ParamSet$tune_ps() mechanism. Domain corresponds to a Param object, except it does not have an id, but it does have a trafo and it does have dependencies (requires). For each of the base Param classes (ParamInt, ParamDbl, ParamLgl, ParamFct, and ParamUty) there is a function constructing a Domain object (p_int, p_dbl, p_lgl, p_fct, p_uty). They each have the same arguments as the corresponding Param $new() function, except without the id argument, and with the following additional parameters:

The p_fct function admits a levels argument that goes beyond the levels accepted by Paramfct$new(). Instead of a character vector, any atomic vector or list, optionally named, may be given. (If the value is not named, the names are inferred using as.character() on the values.) The resulting Domain will correspond to a range of values given by the names of the levels argument with a trafo that maps the character names to the arbitrary values of the levels argument.

Domain objects are representations of parameter ranges that are intermediate objects to be used in short form constructions in to_tune() and ps(). Because of their nature, they should not be modified by the user.

ps()

The ps() short form constructor uses Domain objects to construct ParamSets in a succinct and readable way. The arguments are:

TuneToken / $tune_ps()

A TuneToken object can be given to a ParamSet$values slot as an alternative to a concrete value. This indicates that the value is not given directly but should be tuned using mlr3tuning. If the thus parameterized object is invoked directly, without being wrapped by or given to a tuner, it will give an error.

The tuning range ParamSet that is constructed from the TuneToken values in a ParamSet's $values slot can be accessed through the ParamSet$tune_ps() active bindng. This is done automatically by tuners if no tuning range is given, but it is also possible to access the $tune_ps() active binding, modify it further, and give the modified ParamSet to a tuning function (or do anything else with it, noone is judging you).

A TuneToken represents the range over which the parameter whose $values slot it occupies should be tuned over. It can be constructed via the to_tune() function in one of several ways:

Challenges



mlr-org/phng documentation built on May 8, 2024, 12:05 a.m.