knitr::opts_chunk$set(
  collapse = TRUE
  , comment = "#>"
  , warning = FALSE
  , message = FALSE
)


Welcome to the world of LightGBM, a highly efficient gradient boosting implementation (Ke et al. 2017).

library(lightgbm)

# limit number of threads used, to be respectful of CRAN's resources when it checks this vignette
data.table::setDTthreads(1L)
setLGBMthreads(2L)

This vignette will guide you through its basic usage. It shows how to build a simple binary classification model on a subset of the bank dataset (Moro, Cortez, and Rita 2014). You will use the two input features "age" and "balance" to predict whether a client has subscribed to a term deposit.

The dataset

The dataset looks as follows.

data(bank, package = "lightgbm")

bank[1L:5L, c("y", "age", "balance")]

# Distribution of the response
table(bank$y)

Training the model

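The R package of LightGBM offers two functions to train a model: lightgbm(), a simple interface that accepts the feature matrix and the response directly, and lgb.train(), a more flexible interface that requires the data to be prepared with lgb.Dataset() first.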

Using the lightgbm() function

As a first step, convert the response and the features to numeric form. You are then ready to fit the model with the lightgbm() function.
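To see what the conversion does before applying it to the real data, here is a minimal sketch on a made-up data frame. The object toy and its values are invented for illustration and are not part of the bank dataset.

```r
# Toy data frame standing in for the bank data (values are made up)
toy <- data.frame(
  y = c("no", "yes", "no")
  , age = c(30L, 45L, 52L)
  , balance = c(1787, -120, 350)
  , stringsAsFactors = FALSE
)

# Logical comparison, then coercion: "yes" becomes 1, anything else 0
y_toy <- as.numeric(toy$y == "yes")

# data.matrix() returns a numeric matrix and keeps the column names
X_toy <- data.matrix(toy[, c("age", "balance")])

str(y_toy)
str(X_toy)
```

Note that data.matrix() replaces factor columns by their internal integer codes, so it is worth checking the result whenever non-numeric columns are involved.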

# Numeric response and feature matrix
y <- as.numeric(bank$y == "yes")
X <- data.matrix(bank[, c("age", "balance")])

# Train
fit <- lightgbm(
  data = X
  , label = y
  , params = list(
    num_leaves = 4L
    , learning_rate = 1.0
    , objective = "binary"
  )
  , nrounds = 10L
  , verbose = -1L
)

# Result
summary(predict(fit, X))

It seems to have worked! And the predictions are indeed probabilities between 0 and 1.
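If you need class labels rather than probabilities, you can apply a cutoff yourself. The following is only a sketch: the cutoff of 0.5 and the names p, cutoff and pred_class are assumptions made for illustration, and the accuracy is computed on the training data, so it will be optimistic.

```r
library(lightgbm)

# Same data preparation and model as above
data(bank, package = "lightgbm")
y <- as.numeric(bank$y == "yes")
X <- data.matrix(bank[, c("age", "balance")])

fit <- lightgbm(
  data = X
  , label = y
  , params = list(
    num_leaves = 4L
    , learning_rate = 1.0
    , objective = "binary"
  )
  , nrounds = 10L
  , verbose = -1L
)

# Predicted probabilities of "yes"
p <- predict(fit, X)

# Assumed cutoff; choose one that suits your application
cutoff <- 0.5
pred_class <- as.integer(p > cutoff)

# Confusion table and in-sample accuracy
table(observed = y, predicted = pred_class)
mean(pred_class == y)
```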

Using the lgb.train() function

Alternatively, you can use the more flexible interface lgb.train(). It requires one additional step: y and X must first be wrapped with lgb.Dataset(), the data API of LightGBM. Parameters are passed to lgb.train() as a named list.

# Data interface
dtrain <- lgb.Dataset(X, label = y)

# Parameters
params <- list(
  objective = "binary"
  , num_leaves = 4L
  , learning_rate = 1.0
)

# Train
fit <- lgb.train(
  params = params
  , data = dtrain
  , nrounds = 10L
  , verbose = -1L
)
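
A model fitted with lgb.train() can be used for prediction in the same way as one fitted with lightgbm(). The sketch below repeats the steps end to end so that it runs on its own; the name p is introduced here for illustration only.

```r
library(lightgbm)

# Numeric response and feature matrix
data(bank, package = "lightgbm")
y <- as.numeric(bank$y == "yes")
X <- data.matrix(bank[, c("age", "balance")])

# Wrap the data for lgb.train()
dtrain <- lgb.Dataset(X, label = y)

fit <- lgb.train(
  params = list(
    objective = "binary"
    , num_leaves = 4L
    , learning_rate = 1.0
  )
  , data = dtrain
  , nrounds = 10L
  , verbose = -1L
)

# predict() takes the raw feature matrix, not the lgb.Dataset
p <- predict(fit, X)
summary(p)
```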

Try it out! If you get stuck, visit LightGBM's documentation for more details.

# Cleanup
if (file.exists("lightgbm.model")) {
  file.remove("lightgbm.model")
}


Ke, Guolin, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. "LightGBM: A Highly Efficient Gradient Boosting Decision Tree." In Advances in Neural Information Processing Systems 30 (NIPS 2017).

Moro, Sérgio, Paulo Cortez, and Paulo Rita. 2014. "A Data-Driven Approach to Predict the Success of Bank Telemarketing." Decision Support Systems 62: 22–31.

lightgbm documentation built on Sept. 11, 2024, 8:44 p.m.