For this engine, there are multiple modes: regression and classification.

Tuning Parameters

This model has 6 tuning parameters (their defaults are the same for all modes):

- mtry: # Randomly Selected Predictors (type: integer, default: see below)
- trees: # Trees (type: integer, default: 100L)
- tree_depth: Tree Depth (type: integer, default: -1)
- learn_rate: Learning Rate (type: double, default: 0.1)
- min_n: Minimal Node Size (type: integer, default: 20)
- loss_reduction: Minimum Loss Reduction (type: double, default: 0)

The mtry parameter gives the number of predictors that will be randomly sampled at each split. The default is to use all predictors.

Rather than as a number, [lightgbm::lgb.train()]'s feature_fraction_bynode argument encodes mtry as the proportion of predictors that will be randomly sampled at each split. parsnip translates mtry, supplied as the number of predictors, to a proportion under the hood. That is, the user should still supply the argument as mtry to boost_tree(), and do so in its sense as a number of predictors rather than a proportion; before passing mtry to [lightgbm::lgb.train()], parsnip will convert the value to a proportion.

Note that parsnip's translation can be overridden via the counts argument, supplied to set_engine(). By default, counts is set to TRUE, but supplying the argument counts = FALSE allows the user to supply mtry as a proportion rather than a number.
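
For example, a minimal sketch (with illustrative values, and assuming the parsnip, bonsai, and lightgbm packages are installed) of supplying mtry as a proportion:

library(parsnip)
library(bonsai)

# illustrative values: sample 75% of predictors at each split, supplied
# as a proportion because counts = FALSE
boost_tree(mtry = 0.75, trees = 500) %>%
  set_engine("lightgbm", counts = FALSE) %>%
  set_mode("regression")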

Translation from parsnip to the original package (regression)

The bonsai extension package is required to fit this model.

boost_tree(
  mtry = integer(), trees = integer(), tree_depth = integer(), 
  learn_rate = numeric(), min_n = integer(), loss_reduction = numeric()
) %>%
  set_engine("lightgbm") %>%
  set_mode("regression") %>%
  translate()

Translation from parsnip to the original package (classification)

The bonsai extension package is required to fit this model.

boost_tree(
  mtry = integer(), trees = integer(), tree_depth = integer(), 
  learn_rate = numeric(), min_n = integer(), loss_reduction = numeric()
) %>% 
  set_engine("lightgbm") %>% 
  set_mode("classification") %>% 
  translate()

[bonsai::train_lightgbm()] is a wrapper around [lightgbm::lgb.train()] (and other functions) that makes it easier to run this model.

Other details

Preprocessing


Non-numeric predictors (i.e., factors) are internally converted to numeric. In the classification context, non-numeric outcomes (i.e., factors) are also internally converted to numeric.
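
As a quick sketch (using iris only as an illustrative built-in dataset), factor predictors can therefore be passed directly, with no manual dummy coding:

library(parsnip)
library(bonsai)

spec <-
  boost_tree() %>%
  set_engine("lightgbm") %>%
  set_mode("regression")

# Species is a factor; the engine converts it to numeric internally
fit(spec, Sepal.Length ~ ., data = iris)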

Interpreting mtry


Saving fitted model objects


Bagging

The sample_size argument is translated to the bagging_fraction parameter in the params argument of [lightgbm::lgb.train()]. The argument is interpreted by lightgbm as a proportion rather than a count, so bonsai internally reparameterizes the sample_size argument with [dials::sample_prop()] during tuning.

To effectively enable bagging, the user would also need to set the bagging_freq argument to lightgbm. bagging_freq defaults to 0, which means bagging is disabled, and a bagging_freq argument of k means that the booster will perform bagging at every kth boosting iteration. Thus, by default, the sample_size argument would be ignored without setting this argument manually. Other boosting libraries, like xgboost, do not have an analogous argument to bagging_freq and use k = 1 when the analogue to bagging_fraction is in $(0, 1)$. bonsai will thus automatically set bagging_freq = 1 in set_engine("lightgbm", ...) if sample_size (i.e. bagging_fraction) is not equal to 1 and no bagging_freq value is supplied. This default can be overridden by setting the bagging_freq argument to set_engine() manually.
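
As a sketch with illustrative values, row sampling and an explicit bagging frequency could be requested like so:

library(parsnip)
library(bonsai)

# illustrative values: bag 50% of rows, re-sampling them every 5th iteration.
# if bagging_freq were omitted here, bonsai would set bagging_freq = 1
# automatically since sample_size is not equal to 1.
boost_tree(trees = 200, sample_size = 0.5) %>%
  set_engine("lightgbm", bagging_freq = 5) %>%
  set_mode("regression") %>%
  translate()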

Verbosity

bonsai quiets much of the logging output from [lightgbm::lgb.train()] by default. With default settings, logged warnings and errors will still be passed on to the user. To print out all logs during training, set quiet = FALSE.

Examples

The "Introduction to bonsai" article contains examples of boost_tree() with the "lightgbm" engine.
