View source: R/Laurae.xgb.train.R

Description

Trains an xgboost model. Requires the Matrix and xgboost packages.

Usage

Laurae.xgb.train(train, watchlist = NULL, clean_mem = FALSE, seed = 1,
  verbose = 1, verbose_iterations = 1, objective = "reg:linear",
  metric = "rmse", maximize = NULL, boost_method = "gbtree",
  boost_tree = "hist", boost_grow = "depthwise", boost_bin = 255,
  boost_memory = "uint32", boost_weighting = 1, learn_threads = 1,
  learn_shrink = 0.05, iteration_max = 100, iteration_trees = 1,
  iteration_stop = 20, tree_depth = 6, tree_leaves = 0, sample_row = 1,
  sample_col = 1, reg_l1 = 0, reg_l2 = 0, reg_l2_bias = 0,
  reg_loss = 0, reg_hessian = 1, dart_rate_drop = 0, dart_skip_drop = 0,
  dart_sampling = "uniform", dart_norm = "tree", dart_min_1 = 0, ...)

Arguments

train
    Type: xgb.DMatrix. The training data.

watchlist
    Type: list of xgb.DMatrix. The data to monitor through the metrics. Defaults to NULL.

clean_mem
    Type: logical. Whether to force garbage collection before and after training in order to reclaim RAM. Defaults to FALSE.

seed
    Type: numeric. Seed for the random number generator, for reproducibility. Defaults to 1.

verbose
    Type: numeric. Whether to print messages. Defaults to 1.

verbose_iterations
    Type: numeric. How many iterations to cool down before printing on the console again. Defaults to 1.

objective
    Type: character or function. The objective to optimize. Defaults to "reg:linear".

metric
    Type: character or function. The metric to print against the watchlist. Defaults to "rmse".

maximize
    Type: logical. Whether to maximize the metric. Defaults to NULL.

boost_method
    Type: character. Boosting method. Defaults to "gbtree".

boost_tree
    Type: character. Tree method. Defaults to "hist".

boost_grow
    Type: character. Growing method. Defaults to "depthwise".

boost_bin
    Type: numeric. Maximum number of unique values per feature. Defaults to 255.

boost_memory
    Type: character. Memory used for binning. Defaults to "uint32".

boost_weighting
    Type: numeric. Weighting of positive labels. Defaults to 1.

learn_threads
    Type: numeric. Number of threads. Defaults to 1.

learn_shrink
    Type: numeric. Learning rate. Defaults to 0.05.

iteration_max
    Type: numeric. Number of boosting iterations. Defaults to 100.

iteration_trees
    Type: numeric. Averaged trees per iteration. Defaults to 1.

iteration_stop
    Type: numeric. Number of iterations without improvement before stopping. Defaults to 20.

tree_depth
    Type: numeric. Maximum tree depth. Defaults to 6.

tree_leaves
    Type: numeric. Maximum tree leaves. Defaults to 0.

sample_row
    Type: numeric. Row sampling. Defaults to 1.

sample_col
    Type: numeric. Column sampling per tree. Defaults to 1.

reg_l1
    Type: numeric. L1 regularization. Defaults to 0.

reg_l2
    Type: numeric. L2 regularization. Defaults to 0.

reg_l2_bias
    Type: numeric. L2 bias regularization (not for GBDT models). Defaults to 0.

reg_loss
    Type: numeric. Minimum loss per split. Defaults to 0.

reg_hessian
    Type: numeric. Minimum Hessian per split. Defaults to 1.

dart_rate_drop
    Type: numeric. DART booster tree drop rate. Defaults to 0.

dart_skip_drop
    Type: numeric. DART booster tree skip rate. Defaults to 0.

dart_sampling
    Type: character. DART booster sampling distribution. Defaults to "uniform".

dart_norm
    Type: character. DART booster weight normalization. Defaults to "tree".

dart_min_1
    Type: numeric. Whether the DART booster drops at least one tree. Defaults to 0.

...
    Other parameters to pass through to xgboost.
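
Note that iteration_stop presumably only has an effect when a watchlist is supplied, since improvement is measured on the monitored data. A minimal early-stopping sketch, reusing the dtrain and dtest objects built in the Examples below (mapping iteration_stop to xgboost's early stopping is an assumption based on the argument description above):

# Monitor a holdout set and stop once "auc" has not improved
# for `iteration_stop` consecutive rounds.
model <- Laurae.xgb.train(train = dtrain,
                          watchlist = list(train = dtrain, eval = dtest),
                          objective = "binary:logistic",
                          metric = "auc",
                          maximize = TRUE,      # auc improves upward
                          learn_shrink = 0.05,
                          iteration_max = 500,
                          iteration_stop = 20)  # stop after 20 stale rounds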

Details

The following parameters were removed for the following reasons:

debug_verbose was a parameter added to debug Laurae's code for several xgboost GitHub issues.
colsample_bylevel is significantly weaker than colsample_bytree.
sparse_threshold is a mysterious "hist" parameter.
max_conflict_rate is a "hist"-specific feature bundling parameter.
max_search_group is a "hist"-specific feature bundling parameter.
base_margin is an unusual hyperparameter which should be used for guaranteeing faster convergence.
num_class is a parameter which must be added by yourself for multiclass problems.
enable_feature_grouping is not available in every xgboost version.
sketch_eps, because the "approx" method is obsolete now that "hist" exists.
max_delta_step should be defined by yourself only when you need it (especially for Poisson regression, which has exploding gradients).
tweedie_variance_power should be defined by yourself when you are optimizing Tweedie distribution objectives.
updater, because we don't expect you to modify the sequence of tree updates, as xgboost defines it automatically.
refresh_leaf, because we are not only updating node statistics.
process_type, because we let xgboost do its job.
???, because I might have missed some other important parameters.

Unlike other parameters, you may add any of these back through ... without any issues, as shown in the sketch below.
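
For example, num_class can be supplied through ... for multiclass problems. A minimal sketch, assuming ... forwards extra parameters to xgboost as documented above (the iris-based setup is purely illustrative):

library(Matrix)
library(xgboost)

# Illustrative multiclass data: xgboost expects 0-based integer labels.
x <- as.matrix(iris[, 1:4])
y <- as.numeric(iris$Species) - 1
dtrain_multi <- xgb.DMatrix(x, label = y)

# num_class is not a formal argument of Laurae.xgb.train,
# so it travels through `...`.
model_multi <- Laurae.xgb.train(train = dtrain_multi,
                                objective = "multi:softprob",
                                metric = "mlogloss",
                                tree_depth = 3,
                                iteration_max = 10,
                                num_class = 3)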

Value

The xgboost model.

Examples

library(Matrix)
library(xgboost)
data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
dtest <- xgb.DMatrix(agaricus.test$data, label = agaricus.test$label)
watchlist <- list(train = dtrain, eval = dtest)
# Custom objective: logistic regression, returning the gradient and
# hessian of the log loss with respect to the raw predictions.
logregobj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  preds <- 1 / (1 + exp(-preds))  # sigmoid of the raw margin
  grad <- preds - labels
  hess <- preds * (1 - preds)
  return(list(grad = grad, hess = hess))
}
# Custom evaluation metric: binary classification error on raw margins
# (a prediction > 0 is treated as the positive class).
evalerror <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  err <- as.numeric(sum(labels != (preds > 0))) / length(labels)
  return(list(metric = "error", value = err))
}
model <- Laurae.xgb.train(train = dtrain,
                          watchlist = watchlist,
                          verbose = 1,
                          objective = "binary:logistic",
                          metric = "auc",
                          tree_depth = 2,
                          learn_shrink = 1,
                          learn_threads = 1,
                          iteration_max = 5)
model <- Laurae.xgb.train(train = dtrain,
                          watchlist = watchlist,
                          verbose = 1,
                          objective = logregobj,
                          metric = "auc",
                          tree_depth = 2,
                          learn_shrink = 1,
                          learn_threads = 1,
                          iteration_max = 5)
model <- Laurae.xgb.train(train = dtrain,
                          watchlist = watchlist,
                          verbose = 1,
                          objective = "binary:logistic",
                          metric = evalerror,
                          tree_depth = 2,
                          learn_shrink = 1,
                          learn_threads = 1,
                          iteration_max = 5,
                          maximize = FALSE)
# CAN'T use multiple metrics: any metric after the first one is ignored
model <- Laurae.xgb.train(train = dtrain,
                          watchlist = watchlist,
                          verbose = 1,
                          objective = logregobj,
                          metric = c("rmse", "auc"),
                          tree_depth = 2,
                          learn_shrink = 1,
                          learn_threads = 1,
                          iteration_max = 5)
model <- Laurae.xgb.train(train = dtrain,
                          watchlist = watchlist,
                          verbose = 1,
                          objective = logregobj,
                          metric = evalerror,
                          tree_depth = 2,
                          learn_shrink = 1,
                          learn_threads = 1,
                          iteration_max = 5,
                          maximize = FALSE)
# CAN'T DO THIS: a custom metric function cannot be mixed with a character metric
# model <- Laurae.xgb.train(train = dtrain,
#                           watchlist = watchlist,
#                           verbose = 1,
#                           objective = logregobj,
#                           metric = c(evalerror, "auc"),
#                           tree_depth = 2,
#                           learn_shrink = 1,
#                           learn_threads = 1,
#                           iteration_max = 5,
#                           maximize = FALSE)
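
Because the return value is a plain xgboost model, the usual xgboost helpers should apply to it directly. A short follow-up sketch reusing model and dtest from above (that these helpers apply is an assumption based on the Value section, not something the examples demonstrate):

# Score the test set; with the custom objective above, predictions
# are raw margins rather than probabilities.
preds <- predict(model, dtest)
head(preds)

# Inspect feature importance of the fitted booster.
xgb.importance(model = model)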