auto_tune_xgboost    R Documentation
Description:

Automatically tunes an xgboost model using grid or Bayesian optimization.
Usage:

auto_tune_xgboost(
  .data,
  formula,
  tune_method = c("grid", "bayes"),
  event_level = c("first", "second"),
  n_fold = 5L,
  n_iter = 100L,
  seed = 1,
  save_output = FALSE,
  parallel = TRUE,
  trees = tune::tune(),
  min_n = tune::tune(),
  mtry = tune::tune(),
  tree_depth = tune::tune(),
  learn_rate = tune::tune(),
  loss_reduction = tune::tune(),
  sample_size = tune::tune(),
  stop_iter = tune::tune(),
  counts = FALSE,
  tree_method = c("auto", "exact", "approx", "hist", "gpu_hist"),
  monotone_constraints = 0L,
  num_parallel_tree = tune::tune(),
  lambda = 1,
  alpha = 0,
  scale_pos_weight = 1,
  verbosity = 0L
)
Arguments:

.data: a data frame

formula: a formula

tune_method: method of tuning; defaults to "grid"

event_level: for binary classification, which factor level is the positive class; specify "second" for the second level

n_fold: integer; number of folds in the resamples

n_iter: number of iterations for Bayesian tuning, or the parameter grid size for grid tuning

seed: random seed for reproducibility

save_output: default FALSE; if set to TRUE, writes the output to an rds file

parallel: default TRUE; if set to TRUE, enables parallel processing on resamples for grid tuning
trees: # Trees (xgboost: nrounds) (type: integer, default: 15L)

min_n: Minimal Node Size (xgboost: min_child_weight) (type: integer, default: 1L); typical range: 2-10. Keep the value small for highly imbalanced class data, where leaf nodes can contain smaller groups; otherwise increase it to prevent overfitting to outliers.

mtry: # Randomly Selected Predictors (xgboost: colsample_bynode) (type: numeric, range 0-1; or type: integer if counts = TRUE)

tree_depth: Tree Depth (xgboost: max_depth) (type: integer, default: 6L); typical values: 3-10

learn_rate: Learning Rate (xgboost: eta) (type: double, default: 0.3); typical values: 0.01-0.3

loss_reduction: Minimum Loss Reduction (xgboost: gamma) (type: double, default: 0.0); range: 0 to Inf; typical values: 0-20, assuming low to mid tree depth

sample_size: Proportion Observations Sampled (xgboost: subsample) (type: double, default: 1.0); typical values: 0.5-1

stop_iter: # Iterations Before Stopping (xgboost: early_stop) (type: integer, default: 15L); only enabled if a validation set is provided
counts: if TRUE, mtry is interpreted as an integer number of predictor columns; if FALSE (the default), as a proportion of predictor columns between 0 and 1

tree_method: xgboost tree_method; default is "auto"

monotone_constraints: an integer vector with the length of the predictor columns, consisting of -1, 0, or 1 to enforce a decreasing, unconstrained, or increasing relationship for each predictor (see the sketch following this argument list)

num_parallel_tree: should be set to the size of the forest being trained; default 1L
lambda: [default = 1] L2 regularization term on weights. Increasing this value makes the model more conservative.

alpha: [default = 0] L1 regularization term on weights. Increasing this value makes the model more conservative.

scale_pos_weight: [default = 1] Controls the balance of positive and negative weights; useful for unbalanced classes. If set to TRUE, it is calculated as sum(negative instances) / sum(positive instances). If the first level is the majority class, use values < 1; otherwise values > 1 are normally used to balance the class distribution.
verbosity: [default = 0] Verbosity of printed messages. Valid values are 0 (silent), 1 (warning), 2 (info), and 3 (debug).
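The interaction between counts, mtry, and monotone_constraints can be easier to see in code. The following is a minimal sketch only; the dataset, the assumption that constraints follow the predictor column order, and the constraint values themselves are illustrative and not taken from the package documentation.

# Sketch: supplying column-level options while the remaining parameters are tuned.
# mtcars with mpg as the target leaves 10 predictor columns, so
# monotone_constraints needs a length-10 integer vector (values illustrative).
library(magrittr)

mtcars %>%
  tidy_formula(target = mpg) -> mpg_form

constraints <- c(1L, -1L, rep(0L, 8))  # 1 = increasing, -1 = decreasing, 0 = none

mtcars %>%
  auto_tune_xgboost(
    formula = mpg_form,
    tune_method = "grid",
    n_iter = 10,
    counts = TRUE,                       # interpret mtry as a column count
    mtry = 5L,                           # fixed integer mtry since counts = TRUE
    monotone_constraints = constraints
  ) -> xgb_constrained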
Details:

By default, all of the xgboost tuning parameters listed above are tuned. Individual parameter values can optionally be fixed to reduce tuning complexity, as sketched below.
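For example, a minimal sketch of fixing a couple of hyperparameters while the rest are tuned; the dataset and the fixed values below are illustrative, not recommendations.

# Sketch: fix trees and learn_rate, tune everything else (values illustrative).
library(magrittr)

mtcars %>%
  tidy_formula(target = mpg) -> mpg_form

mtcars %>%
  auto_tune_xgboost(
    formula = mpg_form,
    tune_method = "bayes",
    n_iter = 10,
    trees = 500L,        # fixed rather than tuned
    learn_rate = 0.05    # fixed rather than tuned
  ) -> xgb_partial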
Value:

a workflow object
Examples:

if (FALSE) {
iris %>%
  framecleaner::create_dummies() -> iris1

iris1 %>%
  tidy_formula(target = Petal.Length) -> petal_form

iris1 %>%
  rsample::initial_split() -> iris_split

iris_split %>%
  rsample::analysis() -> iris_train

iris_split %>%
  rsample::assessment() -> iris_val

iris_train %>%
  auto_tune_xgboost(formula = petal_form, n_iter = 10,
                    parallel = TRUE, tune_method = "bayes") -> xgb_tuned

xgb_tuned %>%
  fit(iris_train) %>%
  parsnip::extract_fit_engine() -> xgb_tuned_fit

xgb_tuned_fit %>%
  tidy_predict(newdata = iris_val, form = petal_form) -> iris_val1
}
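A further hedged sketch for binary classification with imbalanced classes, illustrating event_level and scale_pos_weight; the dataset, factor handling, and weight value are assumptions for illustration only.

# Sketch: binary classification where the second factor level is the positive
# class and the classes are imbalanced (weight value illustrative).
library(magrittr)

mtcars %>%
  dplyr::mutate(am = factor(am)) -> mtcars1

mtcars1 %>%
  tidy_formula(target = am) -> am_form

mtcars1 %>%
  auto_tune_xgboost(
    formula = am_form,
    tune_method = "grid",
    n_iter = 10,
    event_level = "second",   # treat the second factor level as the positive class
    scale_pos_weight = 1.5    # up-weight the positive class
  ) -> xgb_binary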