Description Usage Arguments Value Examples
This function allows you to optimize the depth of xgboost in the gbtree/dart boosters while holding the other parameters constant.
Output is intentionally pushed to the global environment, specifically in Laurae.xgb.opt.depth.df, Laurae.xgb.opt.depth.iter, and Laurae.xgb.opt.depth.best, to allow manual interruption without losing data.
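For example, if the search is interrupted midway, the partial state can still be snapshotted to disk. A minimal sketch (the three object names are those exported by the function; the output file name is arbitrary):

```r
# After an interruption, the partial search state still lives in the
# global environment and can be saved for later inspection:
saveRDS(list(df   = Laurae.xgb.opt.depth.df,
             iter = Laurae.xgb.opt.depth.iter,
             best = Laurae.xgb.opt.depth.best),
        file = "depth_search_snapshot.rds")
```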
Verbosity is automatic and cannot be removed. If you need this function without verbosity, recompile the package after removing the verbose messages.
In addition, a sink is forced. Make sure to run sink() if you (or xgboost) interrupt the execution of the function prematurely; otherwise, no further messages will be printed to your R console.
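To recover the console after an interrupted run, any open diversions can be closed with base R's sink tools; a minimal sketch:

```r
# sink.number() reports how many output diversions are still active;
# close them all so messages are printed to the console again.
while (sink.number() > 0) {
  sink()
}
```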
xgb.opt.depth(initial = 8, min_depth = 1, max_depth = 25, patience = 2,
              sd_effect = 0.001, worst_score = 0, learner = NA, better = max_better)
initial
The initial starting search depth. This is the starting point of the search, along with its neighboring depths (used for the initialization). Defaults to 8.
min_depth
The minimum accepted depth. If it is reached, the computation stops. Defaults to 1.
max_depth
The maximum accepted depth. If it is reached, the computation stops. Defaults to 25.
patience
How many iterations are allowed without improvement, excluding the initialization (the first three computations). Larger means more patience before stopping due to no improvement of the scored metric. Defaults to 2.
sd_effect
How much the standard deviation accounts for in the score used to determine the best depth parameter. Defaults to 0.001.
worst_score
The worst possible score of the metric used, as a numeric (non-NA, finite) value. Defaults to 0.
learner
The learner function. It fetches everything needed from the global environment. Defaults to NA.
better
Should we optimize for the minimum or the maximum value of the performance metric? Defaults to max_better.
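As an illustration of the better argument, a minimizing counterpart to the max_better function shown in the Examples might look like this (the name min_better is hypothetical, not part of the package):

```r
# Hypothetical 'better' function for metrics where lower is better
# (e.g. logloss or RMSE); NA scores of unexplored depths are ignored.
min_better <- function(cp) {
  return(min(cp, na.rm = TRUE))
}
```

With a minimizing metric, worst_score should presumably also be set to a large value rather than 0.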
Three elements forced in the global environment: "Laurae.xgb.opt.depth.df" for the data.frame with the depth log, "Laurae.xgb.opt.depth.iter" for the iteration log (a list), and "Laurae.xgb.opt.depth.best" for a length-1 numeric vector with the best depth found.
#Please check xgb.opt.utils.R file in GitHub.
## Not run:
max_better <- function(cp) {
  return(max(cp, na.rm = TRUE))
}

my_learner <- function(depth) {
  sink(file = "Laurae/log.txt", append = TRUE, split = FALSE)
  cat("\n\n\nDepth ", depth, "\n\n", sep = "")
  global_depth <<- depth
  gc()
  set.seed(11111)
  temp_model <- xgb.cv(data = dtrain,
                       nthread = 12,
                       folds = folded,
                       nrounds = 100000,
                       max_depth = depth,
                       eta = 0.05,
                       #gamma = 0.1,
                       subsample = 1.0,
                       colsample_bytree = 1.0,
                       booster = "gbtree",
                       #eval_metric = "auc",
                       eval_metric = mcc_eval_nofail_cv,
                       maximize = TRUE,
                       early_stopping_rounds = 25,
                       objective = "binary:logistic",
                       verbose = TRUE
                       #base_score = 0.005811208
                       )
  sink()
  i <<- 0
  return(c(temp_model$evaluation_log[[4]][temp_model$best_iteration],
           temp_model$evaluation_log[[5]][temp_model$best_iteration],
           temp_model$best_iteration))
}

xgb.opt.depth.callback <- function(i, learner, better, sd_effect) {
  cat("\nExploring depth ", sprintf("%02d", Laurae.xgb.opt.depth.iter[i, "Depth"]), ": ")
  Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"],
                          c("mean", "sd", "nrounds")] <<- learner(Laurae.xgb.opt.depth.iter[i, "Depth"])
  Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "score"] <<-
    Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "mean"] +
    (Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "sd"] * sd_effect)
  Laurae.xgb.opt.depth.iter[i, "Score"] <<-
    Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "score"]
  Laurae.xgb.opt.depth.iter[i, "Best"] <<- better(Laurae.xgb.opt.depth.df[, "score"])
  Laurae.xgb.opt.depth.best <<- which(
    Laurae.xgb.opt.depth.df[, "score"] == Laurae.xgb.opt.depth.iter[i, "Best"])[1]
  cat("[",
      sprintf("%05d", Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "nrounds"]),
      "] ",
      sprintf("%.08f", Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "mean"]),
      ifelse(is.na(Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "mean"]),
             "",
             paste("+",
                   sprintf("%.08f", Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "sd"]),
                   sep = "")),
      " (Score: ",
      sprintf("%.08f", Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "score"]),
      ifelse(Laurae.xgb.opt.depth.iter[i, "Best"] == Laurae.xgb.opt.depth.iter[i, "Score"],
             " <<<)",
             " )"),
      " - best is: ",
      Laurae.xgb.opt.depth.best,
      " - ",
      format(Sys.time(), "%a %b %d %Y %X"),
      sep = "")
}

xgb.opt.depth(initial = 10, min_depth = 1, max_depth = 20, patience = 2, sd_effect = 0,
              worst_score = 0, learner = my_learner, better = max_better)
## End(Not run)