For this engine, there are multiple modes: regression and classification.

Tuning Parameters

This model has 6 tuning parameters (their defaults are the same for all modes):

- mtry: # Randomly Selected Predictors (type: integer, default: see below)
- trees: # Trees (type: integer, default: 100L)
- tree_depth: Tree Depth (type: integer, default: -1)
- learn_rate: Learning Rate (type: double, default: 0.1)
- min_n: Minimal Node Size (type: integer, default: 20)
- loss_reduction: Minimum Loss Reduction (type: double, default: 0)

The mtry parameter gives the number of predictors that will be randomly sampled at each split. The default is to use all predictors.

Rather than as a number, [lightgbm::lgb.train()]'s feature_fraction_bynode argument encodes mtry as the proportion of predictors that will be randomly sampled at each split. parsnip translates mtry, supplied as the number of predictors, to a proportion under the hood. That is, the user should still supply the argument as mtry to boost_tree(), and do so in its sense as a number of predictors rather than a proportion; before passing mtry to [lightgbm::lgb.train()], parsnip will convert the value to a proportion.

Note that parsnip's translation can be overridden via the counts argument, supplied to set_engine(). By default, counts is set to TRUE, but supplying the argument counts = FALSE allows the user to supply mtry as a proportion rather than a number.
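
For example, a minimal sketch (with illustrative values, and assuming the parsnip, bonsai, and lightgbm packages are installed) of supplying mtry as a proportion:

library(parsnip)
library(bonsai)

# illustrative values: sample 75% of predictors at each split, supplied
# as a proportion because counts = FALSE
boost_tree(mtry = 0.75, trees = 500) %>%
  set_engine("lightgbm", counts = FALSE) %>%
  set_mode("regression")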

Translation from parsnip to the original package (regression)

The bonsai extension package is required to fit this model.

boost_tree(
  mtry = integer(), trees = integer(), tree_depth = integer(), 
  learn_rate = numeric(), min_n = integer(), loss_reduction = numeric()
) %>%
  set_engine("lightgbm") %>%
  set_mode("regression") %>%
  translate()

Translation from parsnip to the original package (classification)

The bonsai extension package is required to fit this model.

boost_tree(
  mtry = integer(), trees = integer(), tree_depth = integer(), 
  learn_rate = numeric(), min_n = integer(), loss_reduction = numeric()
) %>% 
  set_engine("lightgbm") %>% 
  set_mode("classification") %>% 
  translate()

[bonsai::train_lightgbm()] is a wrapper around [lightgbm::lgb.train()] (and other functions) that makes it easier to run this model.

Other details

Preprocessing


Non-numeric predictors (i.e., factors) are internally converted to numeric. In the classification context, non-numeric outcomes (i.e., factors) are also internally converted to numeric.
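
As a quick sketch (using iris only as an illustrative built-in dataset), factor predictors can therefore be passed directly, with no manual dummy coding:

library(parsnip)
library(bonsai)

spec <-
  boost_tree() %>%
  set_engine("lightgbm") %>%
  set_mode("regression")

# Species is a factor; the engine converts it to numeric internally
fit(spec, Sepal.Length ~ ., data = iris)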

Interpreting mtry


Saving fitted model objects


Bagging

The sample_size argument is translated to the bagging_fraction parameter in the params argument of [lightgbm::lgb.train()]. The argument is interpreted by lightgbm as a proportion rather than a count, so bonsai internally reparameterizes the sample_size argument with [dials::sample_prop()] during tuning.

To effectively enable bagging, the user would also need to set the bagging_freq argument to lightgbm. bagging_freq defaults to 0, which means bagging is disabled, and a bagging_freq argument of k means that the booster will perform bagging at every kth boosting iteration. Thus, by default, the sample_size argument would be ignored without setting this argument manually. Other boosting libraries, like xgboost, do not have an analogous argument to bagging_freq and use k = 1 when the analogue to bagging_fraction is in $(0, 1)$. bonsai will thus automatically set bagging_freq = 1 in set_engine("lightgbm", ...) if sample_size (i.e. bagging_fraction) is not equal to 1 and no bagging_freq value is supplied. This default can be overridden by setting the bagging_freq argument to set_engine() manually.
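
As a sketch with illustrative values, row sampling and an explicit bagging frequency could be requested like so:

library(parsnip)
library(bonsai)

# illustrative values: bag 50% of rows, re-sampling them every 5th iteration.
# if bagging_freq were omitted here, bonsai would set bagging_freq = 1
# automatically since sample_size is not equal to 1.
boost_tree(trees = 200, sample_size = 0.5) %>%
  set_engine("lightgbm", bagging_freq = 5) %>%
  set_mode("regression") %>%
  translate()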

Verbosity

bonsai quiets much of the logging output from [lightgbm::lgb.train()] by default. With default settings, logged warnings and errors will still be passed on to the user. To print out all logs during training, set quiet = FALSE.

Examples

The "Introduction to bonsai" article contains examples of boost_tree() with the "lightgbm" engine.
