View source: R/xgMultiClassif.R
Runs XGBoost for multiclass classification.
xgMultiClassif(
  gridNumber = 10,
  levelNumber = 3,
  recipe = rec,
  folds = folds,
  train = train_df,
  test = test_df,
  response = response,
  treeNum = 100,
  calcFeatImp = TRUE,
  evalMetric = "roc_auc"
)
gridNumber: Numeric. Size of the grid you want XGBoost to explore. Default is 10.
levelNumber: Numeric. How many levels are in your response? Default is 3.
recipe: A recipe object.
folds: An rsample::vfold_cv object.
train: Data frame/tibble. The training data set.
test: Data frame/tibble. The testing data set.
response: Character. The variable that is the response for analysis.
treeNum: Numeric. The number of trees to evaluate your model with. Default is 100.
calcFeatImp: Logical. Do you want to calculate feature importance for your model? If not, set calcFeatImp = FALSE.
evalMetric: Character. The classification metric you want to evaluate the model's accuracy on. Default is "bal_accuracy"; the examples also use "roc_auc".
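Note that folds expects an rsample::vfold_cv resampling object; the package's cvFolds() helper used in the examples below presumably wraps this. A minimal sketch of building the folds directly with rsample, assuming the baked training tibble train_df from the examples:

library(rsample)
# 10-fold cross-validation on the training data; adjust v as needed
folds <- vfold_cv(train_df, v = 10)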
What the model tunes:
mtry: The number of predictors that will be randomly sampled at each split when creating the tree models.
min_n: The minimum number of data points in a node that are required for the node to be split further.
tree_depth: The maximum depth of the tree (i.e. number of splits).
learn_rate: The rate at which the boosting algorithm adapts from iteration-to-iteration.
loss_reduction: The reduction in the loss function required to split further.
sample_size: The amount of data exposed to the fitting routine.
What you set specifically:
trees: Default is 100. Sets the number of trees contained in the ensemble. A larger value increases runtime but (ideally) leads to more robust outcomes.
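The tuning itself happens inside xgMultiClassif(), but for orientation, here is a minimal sketch of what an XGBoost specification with these tuned parameters might look like in parsnip (the exact spec used internally may differ):

library(parsnip)
library(tune)

# Sketch only: mark the hyperparameters listed above for tuning and fix the
# tree count (here at the treeNum default of 100)
xgb_spec <- boost_tree(
  trees = 100,
  mtry = tune(),
  min_n = tune(),
  tree_depth = tune(),
  learn_rate = tune(),
  loss_reduction = tune(),
  sample_size = tune()
) %>%
  set_engine("xgboost") %>%
  set_mode("classification")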
A list with the following outputs (accessed as list elements, e.g. xgClass$trainConfMat in the examples below):
trainConfMat: Training confusion matrix
trainScore: Training model metric score
testConfMat: Testing confusion matrix
testScore: Testing model metric score
final: Final model chosen by XGBoost
tune: Tuned model
featImpPlot: Feature importance plot
featImpVars: Feature importance variables
library(easytidymodels)
library(dplyr)
library(recipes)
utils::data(penguins, package = "modeldata")

#Define your response variable and formula object here
resp <- "species"
formula <- stats::as.formula(paste(resp, ".", sep = "~"))

#Split data into training and testing sets
split <- trainTestSplit(penguins, stratifyOnResponse = TRUE,
                        responseVar = resp)

#Create recipe for feature engineering; the steps vary based on the data you are working with
rec <- recipe(formula, data = split$train) %>%
  step_knnimpute(!!resp) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  step_medianimpute(all_predictors()) %>%
  step_normalize(all_predictors()) %>%
  step_dummy(all_nominal(), -all_outcomes()) %>%
  step_nzv(all_predictors()) %>%
  step_corr(all_numeric(), -all_outcomes(), threshold = .8) %>%
  prep()

train_df <- bake(rec, split$train)
test_df <- bake(rec, split$test)
folds <- cvFolds(train_df)

#xgClass <- xgMultiClassif(recipe = rec, response = resp, folds = folds,
#                          train = train_df, test = test_df, evalMetric = "roc_auc")

#Visualize training data and its predictions
#xgClass$trainConfMat

#View how model metrics look
#xgClass$trainScore

#Visualize testing data and its predictions
#xgClass$testConfMat

#View how model metrics look
#xgClass$testScore

#See the final model chosen by XGBoost based on optimizing for your chosen evaluation metric
#xgClass$final

#See how the model fit looks based on another evaluation metric
#xgClass$tune %>% tune::show_best("bal_accuracy")

#Feature importance plot
#xgClass$featImpPlot

#Feature importance variables
#xgClass$featImpVars