knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
dir.create("./png")
library(lightgbm.py)
library(mlbench)
library(MLmetrics)
Make sure the reticulate package (version >= 1.14) is configured properly on your system and points to a Python environment. If it is not, you can, for example, install miniconda:
reticulate::install_miniconda(
  path = reticulate::miniconda_path(),
  update = TRUE,
  force = FALSE
)
reticulate::py_config()
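The version requirement mentioned above can also be checked programmatically. The following is a minimal sketch, not part of lightgbm.py; the variable name reticulate_ok is only illustrative, and the check assumes nothing beyond base R's utils::packageVersion.

```r
# Check whether reticulate is installed and meets the minimum version (1.14).
# `reticulate_ok` is a hypothetical name used only for this illustration.
if (requireNamespace("reticulate", quietly = TRUE)) {
  reticulate_ok <- utils::packageVersion("reticulate") >= "1.14"
} else {
  reticulate_ok <- FALSE
}
reticulate_ok
```

If the result is FALSE, updating or installing reticulate before continuing is probably the safest route.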
Use the function install_py_lightgbm to install the lightgbm Python module. This function first checks whether the reticulate package is configured properly and whether the Python module lightgbm is already present. If it is not, it is installed automatically.
lightgbm.py::install_py_lightgbm()
The data must be provided as a data.table object. To simplify the subsequent steps, the target column name and the ID column name are assigned to the variables target_col and id_col, respectively.
data("BostonHousing2")
dataset <- data.table::as.data.table(BostonHousing2)
target_col <- "medv"
id_col <- NULL
To evaluate the model performance, the dataset is split into a training set and a test set with sklearn_train_test_split. This function is a wrapper around Python scikit-learn's sklearn.model_selection.train_test_split function and can perform stratified sampling. Stratification only works with categorical variables, so it is switched off here (stratify = FALSE) for the numeric target.
split <- sklearn_train_test_split(
  dataset,
  target_col,
  split = 0.7,
  seed = 17,
  return_only_index = TRUE,
  stratify = FALSE
)
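To make the idea of a non-stratified index split more concrete, here is a minimal base R sketch. It assumes that split = 0.7 denotes the training share, and it uses R's own random number generator, so it will not reproduce the indices returned by sklearn_train_test_split for the same seed. The names n, train_index and test_index are chosen for illustration only.

```r
# Minimal sketch of a non-stratified 70/30 index split in base R.
# Assumption: illustrates the concept only; the indices differ from
# those produced by sklearn_train_test_split.
set.seed(17)
n <- 506L  # number of rows in BostonHousing2

# Draw 70% of the row indices at random for training
train_index <- sort(sample.int(n, size = floor(0.7 * n)))

# All remaining rows form the test set
test_index <- setdiff(seq_len(n), train_index)

length(train_index)
length(test_index)
```

Returning only indices, as with return_only_index = TRUE above, keeps the original data.table intact and lets the same split be reused for training and evaluation.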
First, the LightGBM class needs to be instantiated and the training data initialized:
lgb_learner <- LightGBM$new()
lgb_learner$init_data(
  dataset = dataset[split$train_index, ],
  target_col = target_col,
  id_col = id_col
)
Next, the learner parameters need to be set. At a minimum, the objective parameter must be provided. Almost all LightGBM parameters have been implemented here. You can inspect them using the following command:
lgb_learner$param_set
Please refer to the LightGBM manual for further details on these parameters.
lgb_learner$param_set$values <- list(
  "objective" = "regression",
  "learning_rate" = 0.1,
  "seed" = 17L,
  "metric" = "rmse"
)
The learner is now ready to be trained using its train function. The parameters num_boost_round and early_stopping_rounds can be set here. Please refer to the LightGBM manual for further details on these parameters.
lgb_learner$num_boost_round <- 100
lgb_learner$early_stopping_rounds <- 10
lgb_learner$train()
The learner's predict function returns a list object, which consists of the predicted outcomes:
predictions <- lgb_learner$predict(newdata = dataset[split$test_index, ])
head(predictions)
To calculate the model metrics, the test set's target variable has to be transformed in the same way as the learner's target variable:
# before transformation
head(dataset[split$test_index, get(target_col)])
# use the learner's transform_target method
target_test <- lgb_learner$trans_tar$transform_target(
  vector = dataset[split$test_index, get(target_col)],
  mapping = "dvalid"
)
# after transformation
head(target_test)
Now, several model metrics can be calculated:
MLmetrics::RMSE(
  y_true = target_test,
  y_pred = predictions
)
MLmetrics::RMSLE(
  y_true = target_test,
  y_pred = predictions
)
MLmetrics::MAE(
  y_true = target_test,
  y_pred = predictions
)
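For readers who want to see what these three metrics compute, the following base R sketch writes them out by hand. The helper names rmse, rmsle and mae and the small example vectors are hypothetical and used only for illustration; the formulas follow the standard definitions, with RMSLE taken on log(1 + y).

```r
# Hand-written versions of the three metrics (hypothetical helper names).
rmse <- function(y_true, y_pred) sqrt(mean((y_true - y_pred)^2))
rmsle <- function(y_true, y_pred) {
  sqrt(mean((log1p(y_true) - log1p(y_pred))^2))
}
mae <- function(y_true, y_pred) mean(abs(y_true - y_pred))

# Tiny made-up example: errors are 1, 0 and -2
y_true <- c(3, 5, 7)
y_pred <- c(2, 5, 9)

rmse(y_true, y_pred)   # sqrt((1 + 0 + 4) / 3) = sqrt(5 / 3)
mae(y_true, y_pred)    # (1 + 0 + 2) / 3 = 1
rmsle(y_true, y_pred)
```

Because RMSLE works on logarithms, it is only defined when both the true and the predicted values are greater than -1, which holds for house prices such as medv.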
The variable importance values and the corresponding plot can be obtained using the learner's importance function:
imp <- lgb_learner$importance()
imp$raw_values
filename <- "./png/lgb.py_imp_regression.png"
grDevices::png(
  filename = filename,
  res = 150,
  height = 1000,
  width = 1500
)
print(imp$plot)
grDevices::dev.off()
knitr::include_graphics(filename)