For this engine, there are multiple modes: regression and classification.
This model has 6 tuning parameters:
tree_depth: Tree Depth (type: integer, default: -1)
trees: # Trees (type: integer, default: 100)
learn_rate: Learning Rate (type: double, default: 0.1)
mtry: # Randomly Selected Predictors (type: integer, default: see below)
min_n: Minimal Node Size (type: integer, default: 20)
loss_reduction: Minimum Loss Reduction (type: double, default: 0)
The mtry parameter gives the number of predictors that will be randomly sampled at each split. The default is to use all predictors.
Rather than as a number, [lightgbm::lgb.train()]'s feature_fraction_bynode argument encodes mtry as the proportion of predictors that will be randomly sampled at each split. parsnip translates mtry, supplied as a number of predictors, to a proportion under the hood. That is, the user should still supply the argument as mtry to boost_tree(), and as a number of predictors rather than a proportion; before passing the value to [lightgbm::lgb.train()], parsnip converts it to a proportion.
Note that parsnip's translation can be overridden via the counts argument, supplied to set_engine(). By default, counts is set to TRUE, but supplying counts = FALSE allows the user to supply mtry as a proportion rather than a number.
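For example, a minimal sketch of the two parameterizations (the bonsai package must be loaded; the mtry values here are illustrative):

library(bonsai)

# mtry as a count of predictors (the default, counts = TRUE); parsnip
# converts this count to a proportion before passing it to lightgbm
boost_tree(mtry = 3, trees = 100) %>%
  set_engine("lightgbm") %>%
  set_mode("regression")

# mtry as a proportion of predictors, via counts = FALSE
boost_tree(mtry = 0.75, trees = 100) %>%
  set_engine("lightgbm", counts = FALSE) %>%
  set_mode("regression")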
The bonsai extension package is required to fit this model.
boost_tree(
mtry = integer(), trees = integer(), tree_depth = integer(),
learn_rate = numeric(), min_n = integer(), loss_reduction = numeric()
) %>%
set_engine("lightgbm") %>%
set_mode("regression") %>%
translate()
## Boosted Tree Model Specification (regression)
##
## Main Arguments:
## mtry = integer()
## trees = integer()
## min_n = integer()
## tree_depth = integer()
## learn_rate = numeric()
## loss_reduction = numeric()
##
## Computational engine: lightgbm
##
## Model fit template:
## bonsai::train_lightgbm(x = missing_arg(), y = missing_arg(),
## weights = missing_arg(), feature_fraction_bynode = integer(),
## num_iterations = integer(), min_data_in_leaf = integer(),
## max_depth = integer(), learning_rate = numeric(), min_gain_to_split = numeric(),
## verbose = -1, num_threads = 0, seed = sample.int(10^5, 1),
## deterministic = TRUE)
The bonsai extension package is required to fit this model.
boost_tree(
mtry = integer(), trees = integer(), tree_depth = integer(),
learn_rate = numeric(), min_n = integer(), loss_reduction = numeric()
) %>%
set_engine("lightgbm") %>%
set_mode("classification") %>%
translate()
## Boosted Tree Model Specification (classification)
##
## Main Arguments:
## mtry = integer()
## trees = integer()
## min_n = integer()
## tree_depth = integer()
## learn_rate = numeric()
## loss_reduction = numeric()
##
## Computational engine: lightgbm
##
## Model fit template:
## bonsai::train_lightgbm(x = missing_arg(), y = missing_arg(),
## weights = missing_arg(), feature_fraction_bynode = integer(),
## num_iterations = integer(), min_data_in_leaf = integer(),
## max_depth = integer(), learning_rate = numeric(), min_gain_to_split = numeric(),
## verbose = -1, num_threads = 0, seed = sample.int(10^5, 1),
## deterministic = TRUE)
[bonsai::train_lightgbm()] is a wrapper around [lightgbm::lgb.train()] (and other functions) that makes it easier to run this model.
This engine does not require any special encoding of the predictors. Categorical predictors can be partitioned into groups of factor levels (e.g., {a, c} vs. {b, d}) when splitting at a node. Dummy variables are not required for this model.
Non-numeric predictors (i.e., factors) are internally converted to numeric. In the classification context, non-numeric outcomes (i.e., factors) are also internally converted to numeric.
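For instance, a minimal sketch of fitting with a factor predictor left as-is, with no dummy variables created (assuming the bonsai and lightgbm packages are installed; the toy data are illustrative):

library(bonsai)

# a toy data set with a factor predictor; no dummy variables are needed
cars <- mtcars
cars$cyl <- factor(cars$cyl)

boost_tree(trees = 10) %>%
  set_engine("lightgbm") %>%
  set_mode("regression") %>%
  fit(mpg ~ ., data = cars)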
mtry
The mtry argument denotes the number of predictors that will be randomly sampled at each split when creating tree models.
Some engines, such as "xgboost", "xrf", and "lightgbm", interpret their analogue to the mtry argument as the proportion of predictors that will be randomly sampled at each split rather than the count. In some settings, such as when tuning over preprocessors that influence the number of predictors, this parameterization is quite helpful: interpreting mtry as a proportion means that [0, 1] is always a valid range for that parameter, regardless of the input data.
parsnip and its extensions accommodate this parameterization using the counts argument: a logical indicating whether mtry should be interpreted as the number of predictors that will be randomly sampled at each split. TRUE indicates that mtry will be interpreted as a count of predictors; FALSE indicates that it will be interpreted as a proportion.
mtry is a main model argument for boost_tree() and rand_forest(), and thus should not have an engine-specific interface. So, regardless of engine, counts defaults to TRUE. For engines that support the proportion interpretation (currently "xgboost" and "xrf", via the rules package, and "lightgbm" via the bonsai package), the user can pass the counts = FALSE argument to set_engine() to supply mtry values within [0, 1].
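As one possible sketch, with counts = FALSE the candidate mtry values can be chosen from (0, 1] no matter how many predictors the preprocessor produces (the grid values here are illustrative):

library(bonsai)
library(tune)

spec <- boost_tree(mtry = tune(), trees = 100) %>%
  set_engine("lightgbm", counts = FALSE) %>%
  set_mode("regression")

# candidate mtry values expressed as proportions of predictors
grid <- data.frame(mtry = c(0.25, 0.5, 0.75, 1.0))

A grid like this could then be passed to a tuning function such as tune::tune_grid().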
The sample_size argument is translated to the bagging_fraction parameter in the param argument of lgb.train(). The argument is interpreted by lightgbm as a proportion rather than a count, so bonsai internally reparameterizes the sample_size argument with [dials::sample_prop()] during tuning.
To effectively enable bagging, the user would also need to set the bagging_freq argument to lightgbm. bagging_freq defaults to 0, which means bagging is disabled; a bagging_freq of k means that the booster will perform bagging at every kth boosting iteration. Thus, by default, the sample_size argument would be ignored without setting this argument manually. Other boosting libraries, like xgboost, do not have an analogous argument to bagging_freq and use k = 1 when the analogue to bagging_fraction is in (0, 1). bonsai will thus automatically set bagging_freq = 1 in set_engine("lightgbm", ...) if sample_size (i.e., bagging_fraction) is not equal to 1 and no bagging_freq value is supplied. This default can be overridden by setting the bagging_freq argument to set_engine() manually.
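A minimal sketch of the two approaches (the sample_size and bagging_freq values are illustrative):

library(bonsai)

# rely on bonsai's default: since sample_size is not 1 and no bagging_freq
# is supplied, bonsai sets bagging_freq = 1
boost_tree(trees = 100, sample_size = 0.8) %>%
  set_engine("lightgbm") %>%
  set_mode("regression")

# override the default: bag at every 5th boosting iteration
boost_tree(trees = 100, sample_size = 0.8) %>%
  set_engine("lightgbm", bagging_freq = 5) %>%
  set_mode("regression")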
bonsai quiets much of the logging output from [lightgbm::lgb.train()] by default (note the verbose = -1 default in the fit template above). With default settings, logged warnings and errors will still be passed on to the user. To print out all logs during training, pass a larger verbose value (e.g., verbose = 1) to set_engine().
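For example (the verbose value here is illustrative):

library(bonsai)

# raise verbose above the default of -1 to see lightgbm's training logs
boost_tree(trees = 100) %>%
  set_engine("lightgbm", verbose = 1) %>%
  set_mode("regression")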
This model can utilize sparse data during model fitting and prediction. Both sparse matrices such as dgCMatrix from the Matrix package and sparse tibbles from the sparsevctrs package are supported. See [sparse_data] for more information.
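A minimal sketch of fitting on a sparse matrix (assuming the Matrix and lightgbm packages are installed; the simulated data are illustrative):

library(bonsai)
library(Matrix)

# a sparse predictor matrix (class dgCMatrix) and a numeric outcome
x <- rsparsematrix(nrow = 100, ncol = 10, density = 0.3)
colnames(x) <- paste0("x", 1:10)
y <- rnorm(100)

# fit_xy() accepts the sparse matrix directly
boost_tree(trees = 10) %>%
  set_engine("lightgbm") %>%
  set_mode("regression") %>%
  fit_xy(x = x, y = y)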
The "Introduction to bonsai" article contains examples of boost_tree()
with the "lightgbm"
engine.