    # Thanks to Steve Bronder for some corrections.
    set.seed(123)
    knitr::opts_chunk$set(cache = TRUE, collapse = FALSE)
    library(mlrHyperopt)
    configureMlr(show.learner.output = FALSE)
This vignette covers ParConfig objects: what they contain and how they are created.
ParConfig
First we will create a ParConfig
so we can have a look at what we need.
    ps = makeParamSet(
      makeIntegerParam("k", lower = 1, upper = 20)
    )
    pc = makeParConfig(
      learner = "classif.knn",
      par.set = ps
    )
    str(pc, 1)
This is a minimal way to create a ParConfig: from a ParamSet and the corresponding learner. Instead of the string "classif.knn" you can also pass the mlr learner object directly.
We see that par.vals and note are not used. par.vals stores fixed parameter settings that override the defaults of the learner. note is for leaving a comment that will eventually be visible online once you decide to upload your ParConfig.

ParConfig
    getParConfigParSet(pc)
    getParConfigParVals(pc)
    getParConfigLearnerClass(pc)
    getParConfigLearnerName(pc)
    getParConfigLearnerType(pc)
    getParConfigNote(pc)
ParConfig
    (pc.reg = setParConfigLearner(pc, "regr.kknn"))
    setParConfigLearnerType(pc.reg, "classif")
    setParConfigNote(pc.reg, "A note...")
    setParConfigParVals(pc.reg, list(scale = FALSE))
    setParConfigParSet(pc.reg, makeParamSet(makeIntegerParam("k", 3, 11)))
ParConfig
We already saw how to minimally create a ParConfig
in the first example.
Let's look at more examples:
    lrn = makeLearner("regr.kknn")
    makeParConfig(
      par.set = ps,
      learner = lrn,
      par.vals = list(kernel = "gaussian"),
      note = "This is just an example with the kernel set to 'Gaussian'."
    )
ParConfig without a specific learner

mlr differentiates learners strictly by their type (e.g. classification, regression, clustering), although sometimes they share the same R function in the underlying package. If we want the ParConfig to serve for classif.knn as well as regr.knn, we have to construct it less strictly, as follows:
    pc.less = makeParConfig(
      learner.name = "knn",
      par.set = ps
    )
    str(pc.less, 1)
Or if you are unsure about the learner name but have the mlr
learner object:
    lrn = makeLearner("classif.knn")
    pc.less = makeParConfig(
      learner.name = getLearnerName(lrn),
      par.set = ps
    )
    str(pc.less, 1)
Note: The function generateParConfig will return a ParConfig for a given learner with a default tuning ParamSet.
ParamSet
Most of the power of a ParConfig lies in its ParamSet, a class from the ParamHelpers package.
The most important features will be explained in the following.
ParamSet
If we want to create a ParamSet
for a specific mlr
learner it is always helpful to check which parameters are available.
    lrn = makeLearner("classif.ksvm")
    getParamSet(lrn)
Now we are facing two problems. First, these parameters don't have finite box constraints, and most tuning methods require them. Second, there are quite a lot of parameters, and tuning works best when it is restricted to the most important ones. We will build our own ParamSet accordingly. The function makeParamSet takes any number of parameter definitions and creates a ParamSet object, which in our example would then be used to tune the ksvm model.
The most important functions for defining parameters are:
makeNumericParam(id, lower, upper)
makeIntegerParam(id, lower, upper)
makeLogicalParam(id)
makeDiscreteParam(id, values)
    ps.svm = makeParamSet(
      makeNumericParam("C", lower = 0, upper = 100),
      makeDiscreteParam("kernel", values = c("polydot", "rbfdot"))
    )
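Whether a ParamSet meets the finite-box requirement can be checked before tuning. The following is a minimal sketch, assuming the ParamHelpers package is installed; ps.open and ps.boxed are names made up for this illustration and are not part of the running example.

```r
library(ParamHelpers)

# Hypothetical example sets, for illustration only.
# Without an explicit upper bound, C is unbounded above (upper = Inf).
ps.open = makeParamSet(
  makeNumericParam("C", lower = 0)
)
# Both bounds given, plus a discrete parameter with a finite value set.
ps.boxed = makeParamSet(
  makeNumericParam("C", lower = 0, upper = 100),
  makeDiscreteParam("kernel", values = c("polydot", "rbfdot"))
)

# TRUE only if every bounded parameter has finite lower and upper bounds.
open.ok = hasFiniteBoxConstraints(ps.open)
boxed.ok = hasFiniteBoxConstraints(ps.boxed)
print(c(open = open.ok, boxed = boxed.ok))
```

A set that fails this check would typically need explicit lower and upper values before it is handed to a tuner.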
Attention!
Here we see the first problem: the parameter C is more sensitive to changes for values around zero. We will use the trafo argument of makeNumericParam() so that the search space for C accounts for this sensitivity. Searching on a transformed (here log2) scale gives small values of C a larger share of the search space.
ParamSet with a transformation

    ps.svm.trafo = makeParamSet(
      makeNumericParam("C", lower = -5, upper = 7, trafo = function(x) 2^x),
      makeDiscreteParam("kernel", values = c("polydot", "rbfdot"))
    )
Let's compare randomly drawn values:
    s1 = sampleValues(ps.svm, n = 100)
    s2 = sampleValues(ps.svm.trafo, n = 100, trafo = TRUE)
    op = par(mfrow = c(1, 2))
    hist(BBmisc::extractSubList(s1, "C"))
    hist(BBmisc::extractSubList(s2, "C"))
    par(op)
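The effect of the transformation can also be seen without plotting. This base R sketch mimics the two search spaces with runif(); it does not call ParamHelpers, and the variable names are illustrative. It compares how many draws fall below C = 1.

```r
set.seed(1)
n = 10000

# Untransformed space: C drawn uniformly from [0, 100].
c.linear = runif(n, min = 0, max = 100)

# Transformed space: x drawn uniformly from [-5, 7], then C = 2^x.
c.trafo = 2^runif(n, min = -5, max = 7)

# Share of draws with C < 1 in each space.
share.linear = mean(c.linear < 1)
share.trafo = mean(c.trafo < 1)
print(c(linear = share.linear, trafo = share.trafo))
```

In expectation about 1% of the untransformed draws lie below 1, against 5/12 (roughly 42%) of the transformed ones, because x < 0 covers 5 of the 12 units of the interval [-5, 7].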
Because transformations can be arbitrary functions, they can serve other purposes as well, such as generating only odd numbers, which makes sense for knn classification to avoid ties:
    # An integer parameter is needed here: 2*x-1 is odd only for whole-number x.
    ps.knn = makeParamSet(
      makeIntegerParam("k", lower = 1, upper = 6, trafo = function(x) 2*x-1)
    )
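To check that such a transformation really yields only odd values, one can sample from the set with trafo = TRUE. This is a small sketch, assuming ParamHelpers is installed and the parameter is integer-valued; ps.odd is a name chosen for this illustration.

```r
library(ParamHelpers)

set.seed(1)
# Illustrative set: integers 1..6 are mapped to 2 * x - 1.
ps.odd = makeParamSet(
  makeIntegerParam("k", lower = 1, upper = 6, trafo = function(x) 2 * x - 1)
)

# Draw 50 settings and apply the transformation to each.
vals = sampleValues(ps.odd, n = 50, trafo = TRUE)
k = vapply(vals, function(v) v$k, numeric(1))
print(sort(unique(k)))
```

Every sampled k should be odd and lie between 1 and 11.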
ParamSet with dependent / hierarchical parameters

For our SVM example we would actually like to tune the parameter sigma for the rbfdot kernel and the parameter degree for the polydot kernel. So sigma should only be active when kernel is set to rbfdot, and degree should only be active for kernel == "polydot". To model such dependencies or hierarchical structures in the parameter space, all make*Param functions have the requires argument, which can be used as follows:
    ps.svm.req = makeParamSet(
      makeNumericParam("C", lower = -5, upper = 7, trafo = function(x) 2^x),
      makeDiscreteParam("kernel", values = c("polydot", "rbfdot")),
      makeNumericParam("sigma", lower = -5, upper = 5, trafo = function(x) 2^x,
        requires = quote(kernel == "rbfdot")),
      makeIntegerParam("degree", lower = 1, upper = 5,
        requires = quote(kernel == "polydot"))
    )
Let's generate an LHS (Latin hypercube sampling) design to see the effects of the requirements:
generateDesign(6, ps.svm.req)
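The requires argument holds an unevaluated R expression that is checked against a concrete parameter setting. The base R sketch below imitates that check; is.active is a hypothetical helper written for this illustration, and the internal handling in ParamHelpers may differ in detail.

```r
# The same quoted conditions as in the requires arguments.
req.sigma = quote(kernel == "rbfdot")
req.degree = quote(kernel == "polydot")

# Hypothetical helper: evaluate a requirement inside a parameter setting.
is.active = function(requirement, setting) {
  isTRUE(eval(requirement, envir = setting))
}

# One concrete setting with the polynomial kernel selected.
setting = list(C = 1, kernel = "polydot")
sigma.active = is.active(req.sigma, setting)
degree.active = is.active(req.degree, setting)
print(c(sigma = sigma.active, degree = degree.active))
```

For a setting with kernel = "polydot" only degree is active; in a generated design the inactive sigma appears as NA.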
ParamSet with data dependent parameter spaces

For some learners the tuning space depends on the data presented. A prominent example is the mtry parameter of randomForest, which determines how many randomly drawn variables are considered at every split. The default (for classification) is sqrt(p), with p being the number of variables in the data. Naturally we might want to set the boundaries for that value around that default. This is possible using expressions, as in the following example:
    ps.rf = makeParamSet(
      makeIntegerParam("mtry",
        lower = expression(floor(sqrt(p*0.25))),
        upper = expression(ceiling(sqrt(p*0.75))))
    )
Which variables can I use in the expressions?
getTaskDictionary(task = iris.task)
p: number of features / variables in x
n.task: number of observations in the task
type: type of the task, like classif, regr, cluster and surv
n: number of observations in the subset
k: number of classes in target
task: the complete task object

Attention: This feature is not implemented in mlr yet. As a consequence, the expressions have to be pre-evaluated before they can be used for tuning. This also means that n will always be the task's data set size instead of the number of observations after a cross-validation split. Feature selection will not affect p.
To convert the ParamSet
with expressions to a normal ParamSet
we call the following:
    evaluateParamExpressions(ps.rf, dict = getTaskDictionary(iris.task))
    evaluateParamExpressions(ps.rf, dict = list(p = 100, n = 1000))
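The bounds that result from pre-evaluation can be verified by hand. This base R sketch evaluates the two expressions from ps.rf against a plain list that plays the role of the dictionary; eval.bounds is a hypothetical helper, not a ParamHelpers function.

```r
# The bound expressions used for mtry in ps.rf.
lower.expr = expression(floor(sqrt(p * 0.25)))
upper.expr = expression(ceiling(sqrt(p * 0.75)))

# Hypothetical helper: evaluate both bounds with the values in dict.
eval.bounds = function(dict) {
  c(lower = eval(lower.expr[[1]], envir = dict),
    upper = eval(upper.expr[[1]], envir = dict))
}

# iris has 4 features; the second dictionary is the manual one from the text.
bounds.iris = eval.bounds(list(p = 4))
bounds.wide = eval.bounds(list(p = 100, n = 1000))
print(bounds.iris)
print(bounds.wide)
```

With the four features of iris the search range for mtry is 1 to 2; with p = 100 it is 5 to 9.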