knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(rlang_trace_top_env = rlang::current_env())

library(autostats)
library(workflows)
library(dplyr)
library(tune)
library(rsample)
library(hardhat)
autostats provides convenient wrappers for modeling, visualizing, and predicting using a tidy workflow. The emphasis is on rapid iteration and quick results through an intuitive interface based on the tibble and tidy_formula.
Set up the iris data set for modeling. Create dummies and any new columns before making the formula. This way the same formula can be used throughout the modeling and prediction process.
set.seed(34)

iris %>%
  dplyr::as_tibble() %>%
  framecleaner::create_dummies(remove_first_dummy = TRUE) -> iris1

iris1 %>%
  tidy_formula(target = Petal.Length) -> petal_form

petal_form
Use the rsample package to split into train and validation sets.
iris1 %>%
  rsample::initial_split() -> iris_split

iris_split %>%
  rsample::analysis() -> iris_train

iris_split %>%
  rsample::assessment() -> iris_val

iris_split
Fit models to the training set using the formula to predict Petal.Length. Variable importance, measured by gain, can be visualized for each xgboost model.
auto_tune_xgboost returns a workflow object with tuned parameters, and requires some postprocessing to get a trained xgb.Booster object like the one returned by tidy_xgboost. Tuning iterations are kept low here just so the vignette builds quickly; the default is n_iter = 100.
iris_train %>%
  auto_tune_xgboost(formula = petal_form, n_iter = 7L, tune_method = "bayes") -> xgb_tuned_bayes

xgb_tuned_bayes %>%
  parsnip::fit(iris_train) %>%
  hardhat::extract_fit_engine() -> xgb_tuned_fit_bayes

xgb_tuned_fit_bayes %>%
  visualize_model()
xgboost can also be tuned using a grid that is created internally with dials::grid_max_entropy. The n_iter parameter is passed to grid_size. Parallelization is highly effective with this method, so the default argument parallel = TRUE is recommended.
iris_train %>%
  auto_tune_xgboost(formula = petal_form, n_iter = 5L, trees = 20L, loss_reduction = 2,
                    mtry = .5, tune_method = "grid", parallel = FALSE) -> xgb_tuned_grid

xgb_tuned_grid %>%
  parsnip::fit(iris_train) %>%
  parsnip::extract_fit_engine() -> xgb_tuned_fit_grid

xgb_tuned_fit_grid %>%
  visualize_model()
iris_train %>%
  tidy_xgboost(formula = petal_form) -> xgb_base

iris_train %>%
  tidy_xgboost(petal_form,
               trees = 250L,
               tree_depth = 3L,
               sample_size = .5,
               mtry = .5,
               min_n = 2) -> xgb_opt
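Both the base and the optimized fit can be passed to visualize_model to compare variable importance, as was done for the tuned models above (shown here as an optional check):

xgb_base %>%
  visualize_model()

xgb_opt %>%
  visualize_model()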
Predictions are iteratively added to the validation data frame. The name of each prediction column is created automatically from the model's name and the prediction target.
xgb_base %>%
  tidy_predict(newdata = iris_val, form = petal_form) -> iris_val2

xgb_opt %>%
  tidy_predict(newdata = iris_val2, petal_form) -> iris_val3

iris_val3 %>%
  names()
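For reference, a single prediction column could be checked by hand with yardstick. This is only an illustrative sketch: the setdiff() step and the assumption that the only new columns are those added by tidy_predict are not part of the package interface.

# columns added by tidy_predict (assumption: the prediction columns are the only new ones)
pred_cols <- setdiff(names(iris_val3), names(iris_val))

iris_val3 %>%
  yardstick::rmse(truth = Petal.Length, estimate = !!rlang::sym(pred_cols[1]))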
Instead of evaluating these predictions one by one, the step is automated with eval_preds. This function is specifically designed to evaluate prediction columns with the names given by tidy_predict.
iris_val3 %>% eval_preds()
tidy_shap has similar syntax to tidy_predict and can be used to get Shapley values from xgboost models on a validation set.
xgb_base %>% tidy_shap(newdata = iris_val, form = petal_form) -> shap_list
shap_list$shap_tbl        # Shapley values for each observation

shap_list$shap_summary    # per-feature summary of the Shapley values

shap_list$swarmplot       # swarm plot of Shapley values by feature

shap_list$scatterplots    # Shapley value vs. feature value scatterplots
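The Shapley table can also be summarised directly. A minimal sketch, assuming shap_list$shap_tbl holds one numeric column of Shapley values per predictor:

# rank features by mean absolute Shapley value
# (assumption: every column of shap_tbl is a numeric column of Shapley values)
shap_list$shap_tbl %>%
  dplyr::summarise(dplyr::across(dplyr::everything(), ~ mean(abs(.x)))) %>%
  tidyr::pivot_longer(dplyr::everything(), names_to = "feature", values_to = "mean_abs_shap") %>%
  dplyr::arrange(dplyr::desc(mean_abs_shap))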
Overfitting in the base configuration may be related to growing deep trees.
xgb_base %>%
  xgboost::xgb.plot.deepness()
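For comparison, the same plot can be drawn for the optimized fit (shown here as an optional check on the xgb_opt model fit above):

xgb_opt %>%
  xgboost::xgb.plot.deepness()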
Plot a single tree from the model. The small cover values in the terminal leaves suggest overfitting in the base model.
xgb_base %>% xgboost::xgb.plot.tree(model = ., trees = 1)