autoxgboost: Fit and optimize an xgboost model.


View source: R/autoxgboost.R

Description

An xgboost model is optimized based on a measure (see [Measure]). The bounds of the parameter space in which the model is optimized are defined by autoxgbparset. The optimization itself uses Bayesian optimization with mlrMBO. Without any specification of the control object, the optimizer runs for 160 iterations or 1 hour, whichever happens first. Both the parameter set and the control object can be set by the user.

Usage

autoxgboost(task, measure = NULL, control = NULL, iterations = 160L,
  time.budget = 3600L, par.set = NULL, max.nrounds = 10^6,
  early.stopping.rounds = 10L, early.stopping.fraction = 4/5,
  build.final.model = TRUE, design.size = 15L,
  impact.encoding.boundary = 10L, mbo.learner = NULL, nthread = NULL,
  tune.threshold = TRUE)

Arguments

task

[Task]
The task.

measure

[Measure]
Performance measure. If NULL getDefaultMeasure is used.

control

[MBOControl]
Control object for the optimizer. If not specified, a default makeMBOControl object will be used, with iterations maximum iterations and a maximum runtime of time.budget seconds.

iterations

[integer(1)]
Number of MBO iterations to do. Will be ignored if custom control is used. Default is 160.

time.budget

[integer(1)]
Time that can be used for tuning (in seconds). Will be ignored if custom control is used. Default is 3600, i.e., one hour.

par.set

[ParamSet]
Parameter set to tune over. Default is autoxgbparset.

max.nrounds

[integer(1)]
Maximum number of allowed boosting iterations. Default is 10^6.

early.stopping.rounds

[integer(1)]
After how many boosting iterations without an improvement in the OOB error should training be stopped? Default is 10.

early.stopping.fraction

[numeric(1)]
What fraction of the data should be used for early stopping (i.e. as a validation set). Default is 4/5.

build.final.model

[logical(1)]
Should the model with the best found configuration be refitted on the complete dataset? Default is TRUE.

design.size

[integer(1)]
Size of the initial design. Default is 15L.

impact.encoding.boundary

[integer(1)]
Defines the threshold for how factor variables are handled. Factors with more levels than impact.encoding.boundary are impact encoded, while factors with that many or fewer levels are dummy encoded. For impact.encoding.boundary = 0L, all factor variables are impact encoded, while for impact.encoding.boundary = .Machine$integer.max, all of them are dummy encoded. Default is 10.
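The decision rule above can be sketched as a small standalone function. This helper is hypothetical and not part of the autoxgboost API; it only illustrates which encoding a factor with a given number of levels would receive:

```r
# Hypothetical illustration of the documented rule; not part of autoxgboost.
encoding_for <- function(n.levels, boundary = 10L) {
  if (n.levels > boundary) "impact" else "dummy"
}

encoding_for(25L)  # more levels than the boundary -> "impact"
encoding_for(10L)  # equal to the boundary -> "dummy"
```

With boundary = 0L every factor exceeds the threshold and is impact encoded, matching the edge case described above.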

mbo.learner

[Learner]
Regression learner from mlr, which is used as a surrogate to model our fitness function. If NULL (default), the default learner is determined as described here: mbo_default_learner.

nthread

[integer(1)]
Number of cores to use. If NULL (default), xgboost will determine internally how many cores to use.

tune.threshold

[logical(1)]
Should thresholds be tuned? This has only an effect for classification, see tuneThreshold. Default is TRUE.

Value

AutoxgbResult

Examples

iris.task = makeClassifTask(data = iris, target = "Species")
ctrl = makeMBOControl()
ctrl = setMBOControlTermination(ctrl, iters = 1L) # speed up tuning by doing only 1 iteration
res = autoxgboost(iris.task, control = ctrl, tune.threshold = FALSE)
res
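A more customized call might combine several of the arguments documented above. The following is a sketch only, assuming mlr and mlrMBO are attached; `acc` is mlr's accuracy measure, and runtime depends on hardware:

```r
library(mlr)
library(mlrMBO)

iris.task = makeClassifTask(data = iris, target = "Species")

# Terminate after 2 MBO iterations to keep the sketch fast.
ctrl = makeMBOControl()
ctrl = setMBOControlTermination(ctrl, iters = 2L)

# Optimize accuracy instead of the default measure and pin xgboost to 1 thread.
res = autoxgboost(iris.task, measure = acc, control = ctrl,
                  nthread = 1L, tune.threshold = FALSE)
res
```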

ja-thomas/autoxgboost documentation built on April 9, 2020, 11:10 p.m.