```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
  # fig.path = "Readme_files/"
)
library(compboost)
```
Compboost contains two mlr3 learners: `regr.compboost` for regression and `classif.compboost` for binary classification.
See https://mlr3.mlr-org.com/ for an introduction to mlr3.
Here, we show the two learners in small examples.
As task, we use the Boston housing task, which is accessible via `tsk("boston_housing")`:
```r
library(mlr3)
task = tsk("boston_housing")
task
```
The key `regr.compboost` constructs the regression learner:
```r
lcb = lrn("regr.compboost")
lcb$param_set
lcb$train(task)
lcb$model
```
The most important features of Compboost can be controlled via the parameters.
For example, using early stopping requires setting `oob_fraction` to a value greater than 0.
Only then can the learner be trained with early stopping:
```r
lcb = lrn("regr.compboost", early_stop = TRUE)
lcb$train(task)  # errors: early stopping requires oob_fraction > 0

lcb = lrn("regr.compboost", oob_fraction = 0.3, early_stop = TRUE)
lcb$train(task)
head(lcb$model$logs)
```
Binary classification works in the same way. We use the spam data set for the demo:
```r
task = tsk("spam")
task
```
Then, the usual methods and fields are accessible:
```r
lcb = lrn("classif.compboost", iterations = 500L)
lcb$train(task)
lcb$predict_type = "prob"
pred = lcb$predict(task)
pred$confusion
pred$score(msr("classif.auc"))
```
Parallel execution in compboost is controlled by the optimizer.
With mlr3, the optimizer can be defined in the construction of the learner.
Thus, to run compboost in parallel, define an optimizer in advance and pass it in the construction:
```r
lcb$timings["train"]

lcb_2c = lrn("classif.compboost", iterations = 500L,
  optimizer = OptimizerCoordinateDescent$new(2))
lcb_2c$train(task)
lcb_2c$timings["train"]
```
As with the optimizer, the loss can be set via the `loss` parameter in the construction:
```r
task = tsk("boston_housing")
lcb_quantiles = lrn("regr.compboost", loss = LossQuantile$new(0.1))
lcb_quantiles$train(task)
lcb_quantiles$predict(task)
```
Interactions can be added in the constructor by passing a `data.frame` with columns `feat1` and `feat2`.
For each row, one row-wise tensor product base learner is added to the model:
```r
task = tsk("german_credit")
ints = data.frame(
  feat1 = c("age", "amount"),
  feat2 = c("job", "duration"))
ints

set.seed(31415)
l = lrn("classif.compboost", interactions = ints)
l$train(task)
l$importance()
plotTensor(l$model, "amount_duration_tensor")
```
Early stopping is also controlled via the constructor. Setting `early_stop = TRUE` enables
early stopping with the default values `patience = 5` and `eps_for_break = 0` (see `?LoggerOobRisk`).
In compboost, early stopping requires a validation set and hence `oob_fraction > 0`:
```r
task = tsk("mtcars")
set.seed(314)
l = lrn("regr.compboost", early_stop = TRUE, oob_fraction = 0.3,
  iterations = 1000)
l$train(task)
plotRisk(l$model)
```
More aggressive early stopping is achieved by setting `patience = 1`:
```r
set.seed(314)
l = lrn("regr.compboost", early_stop = TRUE, oob_fraction = 0.3,
  iterations = 1000, patience = 1)
l$train(task)
plotRisk(l$model)
```
However, this is not recommended, as training can stop too early without reaching the best validation risk.
Note that `oob_fraction > 0` is required for early stopping; otherwise, training fails:
```r
l = lrn("regr.compboost", early_stop = TRUE)
l$train(task)
```