knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
  # fig.path = "Readme_files/"
)
library(compboost)

Compboost provides two mlr3 learners: regr.compboost for regression and classif.compboost for binary classification. See https://mlr3.mlr-org.com/ for an introduction to mlr3. Here, we demonstrate both learners in small examples.
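As a minimal sketch (assuming compboost registers its learners with mlr3 on load), both learners can be constructed via lrn():

```r
library(mlr3)
library(compboost)

# Construct both learners by their keys:
lcb_regr = lrn("regr.compboost")
lcb_classif = lrn("classif.compboost")

# Each learner reports its task type:
lcb_regr$task_type     # "regr"
lcb_classif$task_type  # "classif"
```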

Regression

As a task, we use the Boston Housing data, which is accessible via tsk("boston_housing"):

library(mlr3)

task = tsk("boston_housing")
task

The key regr.compboost constructs the regression learner:

lcb = lrn("regr.compboost")
lcb$param_set

lcb$train(task)
lcb$model

The most important features of compboost can be controlled via the learner's parameters. For example, early stopping requires setting oob_fraction to a value greater than 0. Only in this case can the learner be trained with early stopping:

lcb = lrn("regr.compboost", early_stop = TRUE)
lcb$train(task)

lcb = lrn("regr.compboost", oob_fraction = 0.3, early_stop = TRUE)
lcb$train(task)
head(lcb$model$logs)
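The logged out-of-bag risk can be used to locate the iteration where the validation risk is minimal. A hedged sketch — the column name oob_risk is an assumption about the log format, not confirmed above:

```r
library(mlr3)
library(compboost)

task = tsk("boston_housing")
lcb = lrn("regr.compboost", oob_fraction = 0.3, early_stop = TRUE)
lcb$train(task)

logs = lcb$model$logs
# Assuming a column named `oob_risk` holds the validation risk per iteration:
if ("oob_risk" %in% names(logs)) {
  best_iter = which.min(logs$oob_risk)
  cat("Validation risk minimized at iteration", best_iter, "\n")
}
```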

Binary classification

Binary classification works in the same way. We use the spam data set for the demo:

task = tsk("spam")
task

Then, the usual methods and fields are accessible:

lcb = lrn("classif.compboost", iterations = 500L)
lcb$train(task)

lcb$predict_type = "prob"
pred = lcb$predict(task)
pred$confusion
pred$score(msr("classif.auc"))
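Since classif.compboost is a regular mlr3 learner, it also plugs into mlr3's resampling machinery. A minimal sketch using 3-fold cross-validation (the resampling setup is standard mlr3, not specific to compboost):

```r
library(mlr3)
library(compboost)

task = tsk("spam")
lcb = lrn("classif.compboost", iterations = 100L, predict_type = "prob")

# 3-fold cross-validation with mlr3's resample():
set.seed(1)
rr = resample(task, lcb, rsmp("cv", folds = 3))
rr$aggregate(msr("classif.auc"))
```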

Using compboost in parallel

Parallel execution in compboost is controlled by the optimizer. With mlr3, an optimizer can be defined during the construction of the learner. Thus, to run compboost in parallel, define an optimizer with the desired number of threads in advance and pass it to the constructor:

lcb$timings["train"]

lcb_2c = lrn("classif.compboost", iterations = 500L, optimizer = OptimizerCoordinateDescent$new(2))
lcb_2c$train(task)
lcb_2c$timings["train"]

Using different losses

As with parallel execution, a custom loss can be passed via the loss parameter during construction:

task = tsk("boston_housing")
lcb_quantiles = lrn("regr.compboost", loss = LossQuantile$new(0.1))
lcb_quantiles$train(task)
lcb_quantiles$predict(task)
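Fitting several quantile learners yields pointwise prediction intervals (non-crossing of the quantiles is not guaranteed). A hedged sketch, assuming LossQuantile$new() accepts any quantile in (0, 1) as shown above:

```r
library(mlr3)
library(compboost)

task = tsk("boston_housing")

# Fit lower, median, and upper quantile models for an 80% interval:
quantiles = c(0.1, 0.5, 0.9)
preds = lapply(quantiles, function(q) {
  l = lrn("regr.compboost", loss = LossQuantile$new(q))
  l$train(task)
  l$predict(task)$response
})
names(preds) = paste0("q", quantiles)

# Empirical coverage of the [q0.1, q0.9] interval on the training data:
truth = task$truth()
mean(truth >= preds$q0.1 & truth <= preds$q0.9)
```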

Adding interactions

Interactions can be added in the constructor by specifying a data.frame with columns feat1 and feat2. For each row, one row-wise tensor product base learner is added to the model:

task = tsk("german_credit")

ints = data.frame(feat1 = c("age", "amount"), feat2 = c("job", "duration"))
ints

set.seed(31415)
l = lrn("classif.compboost", interactions = ints)
l$train(task)
l$importance()
plotTensor(l$model, "amount_duration_tensor")

Use early stopping

Early stopping is also controlled via the constructor. Set early_stop = TRUE to use early stopping with the default values patience = 5 and eps_for_break = 0 (see ?LoggerOobRisk). In compboost, early stopping requires a validation set, hence oob_fraction must be set to a value greater than 0:

task = tsk("mtcars")

set.seed(314)
l = lrn("regr.compboost", early_stop = TRUE, oob_fraction = 0.3, iterations = 1000)
l$train(task)
plotRisk(l$model)

More aggressive early stopping is achieved by setting patience = 1:

set.seed(314)
l = lrn("regr.compboost", early_stop = TRUE, oob_fraction = 0.3, iterations = 1000,
  patience = 1)
l$train(task)
plotRisk(l$model)

However, this is not recommended, as training can stop too early, before the best validation risk is reached. Note that oob_fraction > 0 is required for early stopping:

l = lrn("regr.compboost", early_stop = TRUE)
l$train(task)


schalkdaniel/compboost documentation built on April 15, 2023, 9:03 p.m.