Runs XGBoost for regression.
Arguments:

gridNumber: Numeric. Size of the grid you want XGBoost to explore. Default is 10.
recipe: A recipe object.
folds: An rsample::vfold_cv object.
train: Data frame/tibble. The training data set.
test: Data frame/tibble. The testing data set.
response: Character. The variable that is the response for analysis.
treeNum: Numeric. The number of trees to evaluate your model with. Default is 100.
calcFeatImp: Logical. Whether to calculate feature importance for your model; set to FALSE to skip it.
evalMetric: Character. The regression metric you want to evaluate the model's accuracy on. Default is "rmse"; other regression metrics can also be supplied.
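A minimal sketch of a call that sets every argument above. The objects rec, folds, train_df, and test_df are assumed to have been created as in the Examples section below; the values shown for the defaults are the documented ones.

#Sketch only: assumes rec, folds, train_df, and test_df from the Examples section
xgReg <- xgRegress(
  gridNumber = 10,
  recipe = rec,
  folds = folds,
  train = train_df,
  test = test_df,
  response = "bill_length_mm",
  treeNum = 100,
  calcFeatImp = TRUE,
  evalMetric = "rmse"
)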
What the model tunes:
mtry: The number of predictors that will be randomly sampled at each split when creating the tree models.
min_n: The minimum number of data points in a node that are required for the node to be split further.
tree_depth: The maximum depth of the tree (i.e. number of splits).
learn_rate: The rate at which the boosting algorithm adapts from iteration-to-iteration.
loss_reduction: The reduction in the loss function required to split further.
sample_size: The amount of data exposed to the fitting routine.
What you set specifically:
trees: Default is 100. Sets the number of trees contained in the ensemble. A larger value increases runtime but (ideally) leads to more robust outcomes. A sketch of the corresponding tuning specification follows.
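The sketch below shows how these parameters are typically exposed for tuning in a tidymodels workflow. It is an illustration only, not the package's internal code; the boost_tree() specification and the xgboost engine are assumptions based on the description above.

library(parsnip)
library(tune)
library(dplyr)

#Boosted-tree specification with the six tuned hyperparameters marked
#as tune() placeholders; trees is fixed, mirroring the treeNum argument
xgb_spec <- boost_tree(
  mtry = tune(),
  min_n = tune(),
  tree_depth = tune(),
  learn_rate = tune(),
  loss_reduction = tune(),
  sample_size = tune(),
  trees = 100
) %>%
  set_engine("xgboost") %>%
  set_mode("regression")

#A grid of size gridNumber (default 10) would then be explored with
#tune::tune_grid() over the supplied cross-validation folds.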
The function returns a list with the following elements:
Training set predictions
Training set evaluation on RMSE and MAE
Testing set predictions
Testing set evaluation on RMSE and MAE
Feature importance plot
Feature importance table (with exact values)
Tuned model object
Examples:

library(easytidymodels)
library(dplyr)
library(recipes)
utils::data(penguins, package = "modeldata")
#Define your response variable and formula object here
resp <- "bill_length_mm"
formula <- stats::as.formula(paste(resp, ".", sep="~"))
#Split data into training and testing sets
split <- trainTestSplit(penguins, responseVar = resp)
#Create recipe for feature engineering; this varies based on the data you are working with
rec <- recipe(formula, split$train) %>% prep()
train_df <- bake(rec, split$train)
test_df <- bake(rec, split$test)
folds <- cvFolds(train_df)
#xgReg <- xgRegress(recipe = rec, response = resp, folds = folds,
#train = train_df, test = test_df, calcFeatImp = TRUE)
#Visualize training data and its predictions
#xgReg$trainPred %>% select(.pred, !!resp)
#View how model metrics for RMSE, R-Squared, and MAE look for training data
#xgReg$trainScore
#Visualize testing data and its predictions
#xgReg$testPred %>% select(.pred, !!resp)
#View how model metrics for RMSE, R-Squared, and MAE look for testing data
#xgReg$testScore
#See the final model chosen by XGBoost based on optimizing for your chosen evaluation metric
#xgReg$final
#See how model fit looks based on another evaluation metric
#xgReg$tune %>% tune::select_best("rmse")
#See feature importance of model
#xgReg$featImpPlot
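#Not part of the package examples: a sketch of computing test-set metrics
#directly with yardstick, assuming xgReg$testPred holds the observed
#response alongside a .pred column
#xgReg$testPred %>% yardstick::metrics(truth = bill_length_mm, estimate = .pred)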