xgb.opt.depth: xgboost depth automated optimizer


Description

This function optimizes the depth of an xgboost model using the gbtree or dart booster, holding the other parameters constant. Output is intentionally pushed to the global environment, specifically to Laurae.xgb.opt.depth.df, Laurae.xgb.opt.depth.iter, and Laurae.xgb.opt.depth.best, so that a manual interruption does not lose data. Verbosity is automatic and cannot be disabled; if you need this function without verbosity, recompile the package after removing the verbose messages. In addition, a sink is forced. Make sure to run sink() if you (or xgboost) interrupt the execution of the function prematurely; otherwise, no further messages will be printed to your R console.
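If the function was interrupted mid-run, the forced sink may still be active. A minimal recovery sketch using only base R (`sink.number()` reports how many output diversions are currently open):

```r
# Close every output diversion left behind by an interrupted run.
# sink.number() returns the number of active sinks; 0 means output
# is going to the console again.
while (sink.number() > 0) {
  sink()
}
```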

Usage

xgb.opt.depth(initial = 8, min_depth = 1, max_depth = 25, patience = 2,
  sd_effect = 0.001, worst_score = 0, learner = NA, better = max_better)

Arguments

initial

The initial starting search depth. This is the starting point, along with initial - 2 and initial + 2 depths. Defaults to 8.

min_depth

The minimum accepted depth. If it is reached, the computation stops. Defaults to 1.

max_depth

The maximum accepted depth. If it is reached, the computation stops. Defaults to 25.

patience

How many iterations are allowed without improvement, excluding the initialization (the first three computations). Larger values mean more patience before stopping due to no improvement of the scored metric. Defaults to 2.

sd_effect

How much the standard deviation contributes to the score used to determine the best depth parameter. Defaults to 0.001.
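In the example callback (see Examples), the score for a depth is computed as the cross-validated mean plus the standard deviation weighted by sd_effect. A short sketch with made-up values:

```r
# Sketch of the scoring rule used in the example callback:
#   score = cross-validated mean + (standard deviation * sd_effect)
# The values below are invented purely for illustration.
cv_mean   <- 0.71234
cv_sd     <- 0.01200
sd_effect <- 0.001

score <- cv_mean + cv_sd * sd_effect
score  # 0.712352
```

With the default sd_effect of 0.001 the standard deviation acts only as a tiebreaker; setting it to 0 (as in the call at the bottom of the Examples) ignores it entirely.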

worst_score

The worst possible score of the metric used, as a numeric (non-NA, finite) value. Defaults to 0.

learner

The learner function. It fetches everything it needs from the global environment. Defaults to my_learner, an example learner shown in the Examples section.

better

Should we optimize for the minimum or the maximum value of the performance? Defaults to max_better for maximization of the scored metric. Use min_better for the minimization of the scored metric.
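The Examples section defines max_better for maximization; a matching min_better for metrics where lower is better (e.g. logloss or RMSE) could look like the sketch below. It mirrors the max_better example and is not part of the package API:

```r
# Comparator for metrics where lower is better (e.g. logloss, RMSE).
# Mirrors the max_better function from the Examples section.
min_better <- function(cp) {
  return(min(cp, na.rm = TRUE))
}

min_better(c(0.52, NA, 0.48, 0.61))  # 0.48
```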

Value

Three objects forced into the global environment: "Laurae.xgb.opt.depth.df" for the depth log (data.frame), "Laurae.xgb.opt.depth.iter" for the iteration log (list), and "Laurae.xgb.opt.depth.best" for a length-1 vector holding the best depth found (numeric).
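After a run (or an interruption), the results can be read straight from the global environment. A sketch, with mock values standing in for what xgb.opt.depth would have produced:

```r
# Sketch: reading the optimizer's output from the global environment.
# The assignments below mock what xgb.opt.depth would have written;
# row index = depth, so depth 1 here was never evaluated (all NA).
Laurae.xgb.opt.depth.df <- data.frame(mean    = c(NA, 0.70, 0.72),
                                      sd      = c(NA, 0.01, 0.02),
                                      nrounds = c(NA, 180, 150),
                                      score   = c(NA, 0.70001, 0.72002))
Laurae.xgb.opt.depth.best <- 3

best_depth <- Laurae.xgb.opt.depth.best
best_row   <- Laurae.xgb.opt.depth.df[best_depth, ]
cat("Best depth:", best_depth,
    "with score", best_row$score,
    "at", best_row$nrounds, "rounds\n")
```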

Examples

#Please check xgb.opt.utils.R file in GitHub.
## Not run: 


max_better <- function(cp) {
  return(max(cp, na.rm = TRUE))
}

my_learner <- function(depth) {
  sink(file = "Laurae/log.txt", append = TRUE, split = FALSE)
  cat("\n\n\nDepth ", depth, "\n\n", sep = "")
  global_depth <<- depth
  gc()
  set.seed(11111)
  temp_model <- xgb.cv(data = dtrain,
                       nthread = 12,
                       folds = folded,
                       nrounds = 100000,
                       max_depth = depth,
                       eta = 0.05,
                       #gamma = 0.1,
                       subsample = 1.0,
                       colsample_bytree = 1.0,
                       booster = "gbtree",
                       #eval_metric = "auc",
                       eval_metric = mcc_eval_nofail_cv,
                       maximize = TRUE,
                       early_stopping_rounds = 25,
                       objective = "binary:logistic",
                       verbose = TRUE
                       #base_score = 0.005811208
  )
  sink()
  i <<- 0
  return(c(temp_model$evaluation_log[[4]][temp_model$best_iteration],
  temp_model$evaluation_log[[5]][temp_model$best_iteration], temp_model$best_iteration))
}

xgb.opt.depth.callback <- function(i, learner, better, sd_effect) {
  cat("\nExploring depth ", sprintf("%02d", Laurae.xgb.opt.depth.iter[i, "Depth"]), ": ")
  Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"],
  c("mean", "sd", "nrounds")] <<- learner(Laurae.xgb.opt.depth.iter[i, "Depth"])
  Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"],
  "score"] <<- Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "mean"] +
  (Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "sd"] * sd_effect)
  Laurae.xgb.opt.depth.iter[i,
  "Score"] <<- Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "score"]
  Laurae.xgb.opt.depth.iter[i,
  "Best"] <<- better(Laurae.xgb.opt.depth.df[, "score"])
  Laurae.xgb.opt.depth.best <<- which(
  Laurae.xgb.opt.depth.df[, "score"] == Laurae.xgb.opt.depth.iter[i, "Best"])[1]
  cat("[",
      sprintf("%05d", Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "nrounds"]),
      "] ",
      sprintf("%.08f", Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "mean"]),
      ifelse(is.na(Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "mean"]) == TRUE,
      "",
      paste("+",
      sprintf("%.08f", Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "sd"]),
      sep = "")),
      " (Score: ",
      sprintf("%.08f", Laurae.xgb.opt.depth.df[Laurae.xgb.opt.depth.iter[i, "Depth"], "score"]),
      ifelse(Laurae.xgb.opt.depth.iter[i, "Best"] == Laurae.xgb.opt.depth.iter[i, "Score"],
      " <<<)",
      "    )"),
      " - best is: ",
      Laurae.xgb.opt.depth.best,
      " - ",
      format(Sys.time(), "%a %b %d %Y %X"),
      sep = "")
}

xgb.opt.depth(initial = 10, min_depth = 1, max_depth = 20, patience = 2, sd_effect = 0,
worst_score = 0, learner = my_learner, better = max_better)


## End(Not run)

Laurae2/Laurae documentation built on May 8, 2019, 7:59 p.m.