View source: R/Laurae.xgb.train.R
Description

Trains an xgboost model. Requires the Matrix and xgboost packages.
Usage

Laurae.xgb.train(train, watchlist = NULL, clean_mem = FALSE, seed = 1,
  verbose = 1, verbose_iterations = 1, objective = "reg:linear",
  metric = "rmse", maximize = NULL, boost_method = "gbtree",
  boost_tree = "hist", boost_grow = "depthwise", boost_bin = 255,
  boost_memory = "uint32", boost_weighting = 1, learn_threads = 1,
  learn_shrink = 0.05, iteration_max = 100, iteration_trees = 1,
  iteration_stop = 20, tree_depth = 6, tree_leaves = 0, sample_row = 1,
  sample_col = 1, reg_l1 = 0, reg_l2 = 0, reg_l2_bias = 0,
  reg_loss = 0, reg_hessian = 1, dart_rate_drop = 0, dart_skip_drop = 0,
  dart_sampling = "uniform", dart_norm = "tree", dart_min_1 = 0, ...)
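Only train is required; every other argument has a usable default. As a minimal sketch, assuming dtrain is an xgb.DMatrix you have already built (as in the Examples below):

# Sketch: train with all hyperparameters left at their defaults.
model <- Laurae.xgb.train(train = dtrain)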
Arguments

train
    Type: xgb.DMatrix. The training data.

watchlist
    Type: list of xgb.DMatrix. The data to monitor through the metrics. Defaults to NULL. See the early-stopping sketch after this list.

clean_mem
    Type: logical. Whether to force garbage collection before and after training in order to reclaim RAM. Defaults to FALSE.

seed
    Type: numeric. Seed for the random number generator, for reproducibility. Defaults to 1.

verbose
    Type: numeric. Whether to print messages. Defaults to 1.

verbose_iterations
    Type: numeric. How many iterations to cool down before printing on the console again. Defaults to 1.

objective
    Type: character or function. The objective to optimize. Defaults to "reg:linear".

metric
    Type: character or function. The metric to print against the watchlist. Defaults to "rmse".

maximize
    Type: logical. Whether to maximize the metric. Defaults to NULL.

boost_method
    Type: character. Boosting method. Defaults to "gbtree".

boost_tree
    Type: character. Tree method. Defaults to "hist".

boost_grow
    Type: character. Growing method. Defaults to "depthwise".

boost_bin
    Type: numeric. Maximum number of unique values per feature. Defaults to 255.

boost_memory
    Type: character. Memory used for binning. Defaults to "uint32".

boost_weighting
    Type: numeric. Weighting of positive labels. Defaults to 1.

learn_threads
    Type: numeric. Number of threads. Defaults to 1.

learn_shrink
    Type: numeric. Learning rate. Defaults to 0.05.

iteration_max
    Type: numeric. Number of boosting iterations. Defaults to 100.

iteration_trees
    Type: numeric. Averaged trees per iteration. Defaults to 1.

iteration_stop
    Type: numeric. Number of iterations without improvement before stopping. Defaults to 20.

tree_depth
    Type: numeric. Maximum tree depth. Defaults to 6.

tree_leaves
    Type: numeric. Maximum tree leaves. Defaults to 0.

sample_row
    Type: numeric. Row sampling. Defaults to 1.

sample_col
    Type: numeric. Column sampling per tree. Defaults to 1.

reg_l1
    Type: numeric. L1 regularization. Defaults to 0.

reg_l2
    Type: numeric. L2 regularization. Defaults to 0.

reg_l2_bias
    Type: numeric. L2 bias regularization (not for GBDT models). Defaults to 0.

reg_loss
    Type: numeric. Minimum loss per split. Defaults to 0.

reg_hessian
    Type: numeric. Minimum Hessian per split. Defaults to 1.

dart_rate_drop
    Type: numeric. DART booster tree drop rate. Defaults to 0.

dart_skip_drop
    Type: numeric. DART booster tree skip rate. Defaults to 0.

dart_sampling
    Type: character. DART booster sampling distribution. Defaults to "uniform".

dart_norm
    Type: character. DART booster weight normalization. Defaults to "tree".

dart_min_1
    Type: numeric. Whether the DART booster should drop at least one tree. Defaults to 0.

...
    Other parameters to pass to xgboost's xgb.train.
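watchlist and iteration_stop combine into early stopping. A short sketch, assuming dtrain and dvalid are existing xgb.DMatrix objects and the usual xgboost convention that the last watchlist entry drives the stopping decision:

# Sketch: stop after 25 iterations without AUC improvement on the validation set.
model <- Laurae.xgb.train(train = dtrain,
                          watchlist = list(train = dtrain, valid = dvalid),
                          metric = "auc",
                          maximize = TRUE,
                          iteration_max = 1000,
                          iteration_stop = 25)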
Details

The following parameters were removed, for the following reasons:

debug_verbose was a parameter added to debug Laurae's code for several xgboost GitHub issues.

colsample_bylevel is significantly weaker than colsample_bytree.

sparse_threshold is a mysterious "hist" parameter.

max_conflict_rate is a "hist"-specific feature bundling parameter.

max_search_group is a "hist"-specific feature bundling parameter.

base_margin is an unusual hyperparameter which should be used for guaranteeing faster convergence.

num_class is a parameter which must be added by yourself for multiclass problems (see the sketch after this list).

enable_feature_grouping is not available in every xgboost version.

sketch_eps because the "approx" method is obsolete since "hist" exists.

max_delta_step should be defined by yourself only when you need it (especially for Poisson regression, which has exploding gradients).

tweedie_variance_power should be defined by yourself when you are optimizing Tweedie distribution objectives.

updater because we don't expect you to modify the sequence of tree updates, as xgboost automatically defines it.

refresh_leaf because we are not only updating node statistics.

process_type because we let xgboost do its job.

??? because I might have missed some other important parameters.

You may still pass any of them through the ... argument without issues, unlike other parameters.
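For instance, since num_class is not exposed as a named argument, a multiclass model needs it passed through the ... argument. A sketch, assuming extra parameters are forwarded verbatim to xgboost and that dtrain_multi is a hypothetical xgb.DMatrix with integer labels 0 to 2:

# Sketch: multiclass training through the `...` pass-through.
# `dtrain_multi` is a placeholder for your own multiclass xgb.DMatrix.
model <- Laurae.xgb.train(train = dtrain_multi,
                          objective = "multi:softprob",
                          metric = "mlogloss",
                          num_class = 3)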
Value

The xgboost model.
Examples

library(Matrix)
library(xgboost)

data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
watchlist <- list(train = dtrain, eval = dtest)

# Custom objective: logistic regression, returning gradient and Hessian.
logregobj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  preds <- 1 / (1 + exp(-preds))
  grad <- preds - labels
  hess <- preds * (1 - preds)
  return(list(grad = grad, hess = hess))
}

# Custom metric: binary classification error rate.
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- as.numeric(sum(labels != (preds > 0))) / length(labels)
  return(list(metric = "error", value = err))
}

# Built-in objective, built-in metric.
model <- Laurae.xgb.train(train = dtrain,
                          watchlist = watchlist,
                          verbose = 1,
                          objective = "binary:logistic",
                          metric = "auc",
                          tree_depth = 2,
                          learn_shrink = 1,
                          learn_threads = 1,
                          iteration_max = 5)

# Custom objective, built-in metric.
model <- Laurae.xgb.train(train = dtrain,
                          watchlist = watchlist,
                          verbose = 1,
                          objective = logregobj,
                          metric = "auc",
                          tree_depth = 2,
                          learn_shrink = 1,
                          learn_threads = 1,
                          iteration_max = 5)

# Built-in objective, custom metric.
model <- Laurae.xgb.train(train = dtrain,
                          watchlist = watchlist,
                          verbose = 1,
                          objective = "binary:logistic",
                          metric = evalerror,
                          tree_depth = 2,
                          learn_shrink = 1,
                          learn_threads = 1,
                          iteration_max = 5,
                          maximize = FALSE)

# CAN'T DO THIS: only the first metric is used, any other metric is ignored.
model <- Laurae.xgb.train(train = dtrain,
                          watchlist = watchlist,
                          verbose = 1,
                          objective = logregobj,
                          metric = c("rmse", "auc"),
                          tree_depth = 2,
                          learn_shrink = 1,
                          learn_threads = 1,
                          iteration_max = 5)

# Custom objective, custom metric.
model <- Laurae.xgb.train(train = dtrain,
                          watchlist = watchlist,
                          verbose = 1,
                          objective = logregobj,
                          metric = evalerror,
                          tree_depth = 2,
                          learn_shrink = 1,
                          learn_threads = 1,
                          iteration_max = 5,
                          maximize = FALSE)

# CAN'T DO THIS: a custom metric cannot be combined with other metrics.
# model <- Laurae.xgb.train(train = dtrain,
#                           watchlist = watchlist,
#                           verbose = 1,
#                           objective = logregobj,
#                           metric = c(evalerror, "auc"),
#                           tree_depth = 2,
#                           learn_shrink = 1,
#                           learn_threads = 1,
#                           iteration_max = 5,
#                           maximize = FALSE)
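The returned booster can be scored like any other xgboost model; as a sketch, assuming the return value is a standard xgb.Booster:

# Sketch: predict on the held-out agaricus test set from the Examples above.
preds <- predict(model, dtest)
head(preds)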