View source: R/Laurae.xgb.train.R
Description

Trains an xgboost model. Requires the Matrix and xgboost packages.
Usage

Laurae.xgb.train(train, watchlist = NULL, clean_mem = FALSE, seed = 1,
  verbose = 1, verbose_iterations = 1, objective = "reg:linear",
  metric = "rmse", maximize = NULL, boost_method = "gbtree",
  boost_tree = "hist", boost_grow = "depthwise", boost_bin = 255,
  boost_memory = "uint32", boost_weighting = 1, learn_threads = 1,
  learn_shrink = 0.05, iteration_max = 100, iteration_trees = 1,
  iteration_stop = 20, tree_depth = 6, tree_leaves = 0, sample_row = 1,
  sample_col = 1, reg_l1 = 0, reg_l2 = 0, reg_l2_bias = 0,
  reg_loss = 0, reg_hessian = 1, dart_rate_drop = 0, dart_skip_drop = 0,
  dart_sampling = "uniform", dart_norm = "tree", dart_min_1 = 0, ...)
Arguments

train: Type: xgb.DMatrix. The training data.

watchlist: Type: list of xgb.DMatrix. The data to monitor through the metrics. Defaults to NULL.

clean_mem: Type: logical. Whether to force garbage collection before and after training in order to reclaim RAM. Defaults to FALSE.

seed: Type: numeric. Seed for the random number generator, for reproducibility. Defaults to 1.

verbose: Type: numeric. Whether to print messages. Defaults to 1.

verbose_iterations: Type: numeric. How many iterations to wait before printing to the console again. Defaults to 1.

objective: Type: character or function. The objective to optimize. Defaults to "reg:linear".

metric: Type: character or function. The metric to evaluate against the watchlist. Defaults to "rmse".

maximize: Type: logical. Whether to maximize the metric. Defaults to NULL.

boost_method: Type: character. Boosting method. Defaults to "gbtree".

boost_tree: Type: character. Tree method. Defaults to "hist".

boost_grow: Type: character. Growing method. Defaults to "depthwise".

boost_bin: Type: numeric. Maximum number of unique values per feature. Defaults to 255.

boost_memory: Type: character. Memory used for binning. Defaults to "uint32".

boost_weighting: Type: numeric. Weighting of positive labels. Defaults to 1.

learn_threads: Type: numeric. Number of threads. Defaults to 1.

learn_shrink: Type: numeric. Learning rate. Defaults to 0.05.

iteration_max: Type: numeric. Number of boosting iterations. Defaults to 100.

iteration_trees: Type: numeric. Averaged trees per iteration. Defaults to 1.

iteration_stop: Type: numeric. Number of iterations without improvement before stopping. Defaults to 20.

tree_depth: Type: numeric. Maximum tree depth. Defaults to 6.

tree_leaves: Type: numeric. Maximum tree leaves. Defaults to 0.

sample_row: Type: numeric. Row sampling. Defaults to 1.

sample_col: Type: numeric. Column sampling per tree. Defaults to 1.

reg_l1: Type: numeric. L1 regularization. Defaults to 0.

reg_l2: Type: numeric. L2 regularization. Defaults to 0.

reg_l2_bias: Type: numeric. L2 bias regularization (not for GBDT models). Defaults to 0.

reg_loss: Type: numeric. Minimum loss per split. Defaults to 0.

reg_hessian: Type: numeric. Minimum Hessian per split. Defaults to 1.

dart_rate_drop: Type: numeric. DART booster tree drop rate. Defaults to 0.

dart_skip_drop: Type: numeric. DART booster tree skip rate. Defaults to 0.

dart_sampling: Type: character. DART booster sampling distribution. Defaults to "uniform".

dart_norm: Type: character. DART booster weight normalization. Defaults to "tree".

dart_min_1: Type: numeric. Whether the DART booster must drop at least one tree. Defaults to 0.

...: Other parameters to pass directly to xgboost.
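As an illustration of how the iteration_* arguments interact with the watchlist, here is a minimal early-stopping sketch. It assumes the dtrain and dtest xgb.DMatrix objects built in the Examples section below; which watchlist entry drives the stopping decision is an assumption, not documented here.

# Hedged sketch: early stopping driven by the watchlist metric.
# Assumes dtrain and dtest exist as in the Examples below.
model <- Laurae.xgb.train(train = dtrain,
                          watchlist = list(eval = dtest),
                          objective = "binary:logistic",
                          metric = "auc",
                          maximize = TRUE,      # AUC must be maximized
                          iteration_max = 1000, # upper bound on boosting rounds
                          iteration_stop = 10)  # stop after 10 rounds without improvement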
Details

The following parameters were removed for the following reasons:

debug_verbose: was a parameter added to debug Laurae's code for several xgboost GitHub issues.

colsample_bylevel: is significantly weaker than colsample_bytree.

sparse_threshold: is a mysterious "hist" parameter.

max_conflict_rate: is a "hist"-specific feature bundling parameter.

max_search_group: is a "hist"-specific feature bundling parameter.

base_margin: is an unusual hyperparameter which should be used for guaranteeing faster convergence.

num_class: is a parameter which must be added by yourself for multiclass problems (see the sketch after this list).

enable_feature_grouping: is not available in every xgboost version.

sketch_eps: because the "approx" method is obsolete now that "hist" exists.

max_delta_step: should be defined by yourself only when you need it (especially for Poisson regression, which has exploding gradients).

tweedie_variance_power: should be defined by yourself when you are optimizing Tweedie distribution objectives.

updater: because we don't expect you to modify the sequence of tree updates; xgboost defines it automatically.

refresh_leaf: because we are not only updating node statistics.

process_type: because we let xgboost do its job.

???: because I might have missed some other important parameters.

You may add any of them back through ... without any issues, unlike the other parameters.
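For instance, a hedged sketch of forwarding num_class through ... for a three-class problem. Here dtrain_multi is a hypothetical xgb.DMatrix with labels in {0, 1, 2}; "multi:softprob" and "mlogloss" are standard xgboost names, not defaults of this wrapper.

# Hypothetical sketch: forwarding num_class via `...` for multiclass training.
model <- Laurae.xgb.train(train = dtrain_multi, # hypothetical multiclass xgb.DMatrix
                          objective = "multi:softprob",
                          metric = "mlogloss",
                          num_class = 3)        # forwarded to xgboost through `...`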
Value

The xgboost model.
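Assuming the returned object is a standard xgb.Booster (as the Examples below suggest), it can be scored directly with xgboost's predict method:

# Sketch: scoring held-out data with the returned booster.
preds <- predict(model, dtest)  # dtest is an xgb.DMatrix, as in the Examples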
Examples

library(Matrix)
library(xgboost)

data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")

dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
watchlist <- list(train = dtrain, eval = dtest)

# Custom objective: logistic regression (gradient and hessian of the log loss)
logregobj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  preds <- 1 / (1 + exp(-preds))
  grad <- preds - labels
  hess <- preds * (1 - preds)
  return(list(grad = grad, hess = hess))
}

# Custom metric: binary classification error rate
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- as.numeric(sum(labels != (preds > 0))) / length(labels)
  return(list(metric = "error", value = err))
}

# Built-in objective, built-in metric
model <- Laurae.xgb.train(train = dtrain,
                          watchlist = watchlist,
                          verbose = 1,
                          objective = "binary:logistic",
                          metric = "auc",
                          tree_depth = 2,
                          learn_shrink = 1,
                          learn_threads = 1,
                          iteration_max = 5)

# Custom objective, built-in metric
model <- Laurae.xgb.train(train = dtrain,
                          watchlist = watchlist,
                          verbose = 1,
                          objective = logregobj,
                          metric = "auc",
                          tree_depth = 2,
                          learn_shrink = 1,
                          learn_threads = 1,
                          iteration_max = 5)

# Built-in objective, custom metric
model <- Laurae.xgb.train(train = dtrain,
                          watchlist = watchlist,
                          verbose = 1,
                          objective = "binary:logistic",
                          metric = evalerror,
                          tree_depth = 2,
                          learn_shrink = 1,
                          learn_threads = 1,
                          iteration_max = 5,
                          maximize = FALSE)

# Multiple character metrics: every metric after the first one is ignored
model <- Laurae.xgb.train(train = dtrain,
                          watchlist = watchlist,
                          verbose = 1,
                          objective = logregobj,
                          metric = c("rmse", "auc"),
                          tree_depth = 2,
                          learn_shrink = 1,
                          learn_threads = 1,
                          iteration_max = 5)

# Custom objective, custom metric
model <- Laurae.xgb.train(train = dtrain,
                          watchlist = watchlist,
                          verbose = 1,
                          objective = logregobj,
                          metric = evalerror,
                          tree_depth = 2,
                          learn_shrink = 1,
                          learn_threads = 1,
                          iteration_max = 5,
                          maximize = FALSE)

# NOT POSSIBLE: mixing a custom metric function with character metrics
# model <- Laurae.xgb.train(train = dtrain,
#                           watchlist = watchlist,
#                           verbose = 1,
#                           objective = logregobj,
#                           metric = c(evalerror, "auc"),
#                           tree_depth = 2,
#                           learn_shrink = 1,
#                           learn_threads = 1,
#                           iteration_max = 5,
#                           maximize = FALSE)