`r descr_models("boost_tree", "lightgbm")`
```{r}
defaults <- 
  tibble::tibble(
    parsnip = c("mtry", "trees", "tree_depth", "learn_rate", "min_n", "loss_reduction"),
    default = c("see below", 100L, -1, 0.1, 20, 0)
  )

# For this model, this is the same for all modes
param <- 
  boost_tree() %>% 
  set_engine("lightgbm") %>% 
  set_mode("regression") %>% 
  make_parameter_list(defaults)
```
This model has `r nrow(param)` tuning parameters:

```{r}
param$item
```
The `mtry` parameter gives the number of predictors that will be randomly sampled at each split. The default is to use all predictors.

Rather than as a number, [lightgbm::lgb.train()]'s `feature_fraction` argument encodes `mtry` as the proportion of predictors that will be randomly sampled at each split. parsnip translates `mtry`, supplied as the number of predictors, to a proportion under the hood. That is, the user should still supply the argument as `mtry` to `boost_tree()`, and do so in its sense as a number rather than a proportion; before passing `mtry` to [lightgbm::lgb.train()], parsnip will convert the `mtry` value to a proportion.
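For example, a minimal sketch (assuming the bonsai and lightgbm packages are installed; the data and values are illustrative): `mtry = 3` below requests three of the ten predictors in `mtcars`, and parsnip converts that count to a proportion before calling [lightgbm::lgb.train()].

```r
library(bonsai)  # also attaches parsnip

# mtry is supplied as a count: sample 3 of the 10 predictors at each split;
# parsnip converts the count to a proportion (3/10) under the hood
boost_tree(mtry = 3, trees = 100) %>%
  set_engine("lightgbm") %>%
  set_mode("regression") %>%
  fit(mpg ~ ., data = mtcars)
```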
Note that parsnip's translation can be overridden via the `counts` argument, supplied to `set_engine()`. By default, `counts` is set to `TRUE`, but supplying `counts = FALSE` allows the user to pass `mtry` as a proportion rather than a number.
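A sketch of that alternative parameterization (again with illustrative values):

```r
library(bonsai)

# with counts = FALSE, mtry is interpreted as a proportion of predictors
boost_tree(mtry = 0.3) %>%
  set_engine("lightgbm", counts = FALSE) %>%
  set_mode("regression") %>%
  fit(mpg ~ ., data = mtcars)
```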
`r uses_extension("boost_tree", "lightgbm", "regression")`

```{r}
boost_tree(
  mtry = integer(), trees = integer(), tree_depth = integer(),
  learn_rate = numeric(), min_n = integer(), loss_reduction = numeric()
) %>%
  set_engine("lightgbm") %>%
  set_mode("regression") %>%
  translate()
```
`r uses_extension("boost_tree", "lightgbm", "classification")`

```{r}
boost_tree(
  mtry = integer(), trees = integer(), tree_depth = integer(),
  learn_rate = numeric(), min_n = integer(), loss_reduction = numeric()
) %>%
  set_engine("lightgbm") %>%
  set_mode("classification") %>%
  translate()
```
[bonsai::train_lightgbm()] is a wrapper around [lightgbm::lgb.train()] (and other functions) that makes it easier to run this model.
Non-numeric predictors (i.e., factors) are internally converted to numeric. In the classification context, non-numeric outcomes (i.e., factors) are also internally converted to numeric.
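As an illustration that no dummy-variable preprocessing is needed (a hypothetical toy data frame, purely for demonstration):

```r
library(bonsai)

set.seed(1)
toy <- data.frame(
  y   = rnorm(100),
  x   = rnorm(100),
  grp = factor(sample(letters[1:3], 100, replace = TRUE))
)

# the factor predictor `grp` is converted to numeric internally
boost_tree() %>%
  set_engine("lightgbm") %>%
  set_mode("regression") %>%
  fit(y ~ ., data = toy)
```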
The `sample_size` argument is translated to the `bagging_fraction` parameter in the `param` argument of `lgb.train()`. The argument is interpreted by lightgbm as a proportion rather than a count, so bonsai internally reparameterizes the `sample_size` argument with [dials::sample_prop()] during tuning.
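For instance, a sketch of marking `sample_size` for tuning on the proportion scale (assuming the tune and dials packages are available; the range is illustrative):

```r
library(bonsai)
library(tune)

spec <- boost_tree(sample_size = tune()) %>%
  set_engine("lightgbm") %>%
  set_mode("regression")

# sample_size is tuned as a proportion for this engine; the range can
# be set explicitly with dials::sample_prop()
params <- extract_parameter_set_dials(spec)
params <- update(params, sample_size = dials::sample_prop(c(0.4, 1)))
```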
To effectively enable bagging, the user would also need to set the `bagging_freq` argument to lightgbm. `bagging_freq` defaults to 0, which means bagging is disabled, and a `bagging_freq` argument of `k` means that the booster will perform bagging at every `k`th boosting iteration. Thus, by default, the `sample_size` argument would be ignored without setting this argument manually. Other boosting libraries, like xgboost, do not have an analogous argument to `bagging_freq` and use `k = 1` when the analogue to `bagging_fraction` is in $(0, 1)$. bonsai will thus automatically set `bagging_freq = 1` in `set_engine("lightgbm", ...)` if `sample_size` (i.e., `bagging_fraction`) is not equal to 1 and no `bagging_freq` value is supplied. This default can be overridden by supplying a `bagging_freq` argument to `set_engine()` manually.
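A short sketch of both behaviors (the values are illustrative):

```r
library(bonsai)

# sample 80% of rows at each iteration; since sample_size != 1 and no
# bagging_freq is given, bonsai supplies bagging_freq = 1 automatically
boost_tree(sample_size = 0.8) %>%
  set_engine("lightgbm") %>%
  set_mode("regression")

# override the default: bag only at every 5th boosting iteration
boost_tree(sample_size = 0.8) %>%
  set_engine("lightgbm", bagging_freq = 5) %>%
  set_mode("regression")
```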
bonsai quiets much of the logging output from [lightgbm::lgb.train()] by default. With default settings, logged warnings and errors will still be passed on to the user. To mute lightgbm's logging entirely, set `quiet = TRUE`.
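Engine arguments are passed through to [bonsai::train_lightgbm()], so, assuming that passthrough, `quiet` can be supplied when declaring the engine; a minimal sketch:

```r
library(bonsai)

# quiet = TRUE mutes lightgbm's logging during fitting
boost_tree() %>%
  set_engine("lightgbm", quiet = TRUE) %>%
  set_mode("regression") %>%
  fit(mpg ~ ., data = mtcars)
```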
The "Introduction to bonsai" article contains examples of boost_tree()
with the "lightgbm"
engine.