tidy_xgboost — R Documentation
Accepts a formula and fits an xgboost model. Automatically determines whether the formula specifies classification or regression. Returns the fitted xgboost model.
Usage

tidy_xgboost(
  .data,
  formula,
  ...,
  mtry = 1,
  trees = 15L,
  min_n = 1L,
  tree_depth = 6L,
  learn_rate = 0.3,
  loss_reduction = 0,
  sample_size = 1,
  stop_iter = 10L,
  counts = FALSE,
  tree_method = c("auto", "exact", "approx", "hist", "gpu_hist"),
  monotone_constraints = 0L,
  num_parallel_tree = 1L,
  lambda = 1,
  alpha = 0,
  scale_pos_weight = 1,
  verbosity = 0L,
  validate = TRUE
)
Arguments

.data
    dataframe

formula
    formula

...
    additional parameters to be passed to …

mtry
    # Randomly Selected Predictors (xgboost: colsample_bynode) (type: numeric, range 0 - 1) (or type: integer if counts = TRUE)

trees
    # Trees (xgboost: nrounds) (type: integer, default: 15L)

min_n
    Minimal Node Size (xgboost: min_child_weight) (type: integer, default: 1L); typical range: 2-10. Keep a small value for highly imbalanced class data, where leaf nodes can hold smaller groups; otherwise increase it to prevent overfitting to outliers.

tree_depth
    Tree Depth (xgboost: max_depth) (type: integer, default: 6L); typical values: 3-10

learn_rate
    Learning Rate (xgboost: eta) (type: double, default: 0.3); typical values: 0.01-0.3

loss_reduction
    Minimum Loss Reduction (xgboost: gamma) (type: double, default: 0.0); range: 0 to Inf; typical values: 0-20, assuming low to mid tree depth

sample_size
    Proportion of Observations Sampled (xgboost: subsample) (type: double, default: 1.0); typical values: 0.5-1

stop_iter
    # Iterations Before Stopping (xgboost: early_stop) (type: integer, default: 10L); only enabled if a validation set is provided

counts
    if TRUE, mtry is interpreted as an integer count of predictors; if FALSE (the default), as a proportion of predictors

tree_method
    xgboost tree_method: one of "auto", "exact", "approx", "hist", "gpu_hist". Default is "auto".

monotone_constraints
    an integer vector with length equal to the number of predictor columns, with values in -1, 0, or 1: 1 enforces an increasing relationship with the target, -1 a decreasing relationship, and 0 (the default) no constraint

num_parallel_tree
    should be set to the size of the forest being trained; default 1L

lambda
    [default = 1] L2 regularization term on weights. Increasing this value makes the model more conservative.

alpha
    [default = 0] L1 regularization term on weights. Increasing this value makes the model more conservative.

scale_pos_weight
    [default = 1] Controls the balance of positive and negative weights; useful for unbalanced classes. If set to TRUE, it is calculated as sum(negative instances) / sum(positive instances). If the first level is the majority class, use values < 1; otherwise values > 1 are typically used to balance the class distribution.

verbosity
    [default = 0L] Verbosity of printed messages. Valid values are 0 (silent), 1 (warning), 2 (info), 3 (debug).

validate
    default TRUE; report accuracy metrics on a validation set
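The scale_pos_weight heuristic described above can be computed by hand before fitting. A minimal sketch, assuming a two-level factor target y whose first level is the event of interest (the data here is hypothetical):

```r
# hypothetical two-level target; first level is the event of interest
y <- factor(c("yes", "no", "no", "no", "yes"), levels = c("yes", "no"))

# sum(negative instances) / sum(positive instances)
pos <- sum(y == levels(y)[1])
neg <- sum(y == levels(y)[2])
scale_pos_weight <- neg / pos  # 3 negatives / 2 positives = 1.5
```

The resulting value can then be passed directly as the scale_pos_weight argument.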
Details

In binary classification the target variable must be a factor whose first level is set to the event of interest. A higher predicted probability corresponds to the first level.
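Reordering the factor so the event of interest comes first can be done with relevel(). A minimal sketch (the data frame and column names are hypothetical):

```r
# hypothetical data: make "churned" the first level so it is
# treated as the event of interest by tidy_xgboost
df <- data.frame(target = c("retained", "churned", "retained"))
df$target <- relevel(factor(df$target), ref = "churned")
levels(df$target)  # "churned" is now the first level
```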
Reference for parameters: xgboost docs
Value

an xgb.Booster model
Examples

options(rlang_trace_top_env = rlang::current_env())

# regression on numeric variable

iris %>%
  framecleaner::create_dummies(Species) -> iris_dummy

iris_dummy %>%
  tidy_formula(target = Petal.Length) -> petal_form

iris_dummy %>%
  tidy_xgboost(
    petal_form,
    trees = 20,
    mtry = .5
  ) -> xg1

xg1 %>%
  tidy_predict(newdata = iris_dummy, form = petal_form) -> iris_preds

iris_preds %>%
  eval_preds()
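A binary classification counterpart to the regression example above might look like the following sketch. It assumes the same helpers (tidy_formula, tidy_predict, eval_preds) and is not from the original documentation; the filtering step simply reduces iris to a two-level factor target:

```r
# binary classification: drop one species so Species has two levels
iris %>%
  dplyr::filter(Species != "setosa") %>%
  dplyr::mutate(Species = droplevels(Species)) -> iris_bin

iris_bin %>%
  tidy_formula(target = Species) -> species_form

iris_bin %>%
  tidy_xgboost(species_form, trees = 20) -> xg2

xg2 %>%
  tidy_predict(newdata = iris_bin, form = species_form) -> iris_bin_preds

iris_bin_preds %>%
  eval_preds()
```

Note that the first level of the (releveled) Species factor is treated as the event of interest, per the details above.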