report.xgb: Extreme Gradient Boosting HTML report
In Laurae2/Laurae: Advanced High Performance Data Science Toolbox for R

This function creates an xgboost report as an HTML file. Cross-validation is mandatory. It does NOT handle multiclass scenarios or tasks other than regression and classification. It does NOT handle gblinear. The process_type, updater, and refresh_leaf parameters cannot be used. Add quiet = TRUE to the list of arguments to suppress the massive verbose text.

report.xgb(data, label, folds, params, normalize = TRUE,
  classification = TRUE, threshold = 0.5, importance = TRUE,
  unbiased = TRUE, stats = TRUE, plots = TRUE, plot_type = "S",
  output_file = "report.xgb.html", output_dir = getwd(), open_file = TRUE,
  quiet = FALSE, ...)

`data`	Type: data.table. The data to fit an xgboost model on.
`label`	Type: vector. The label to fit the model to.
`folds`	Type: list of numeric vectors. The folds used.
`params`	Type: list. The parameters to pass to `report.xgb.helper`.
`normalize`	Type: boolean. Whether features should be normalized before being fed to the xgboost model. Defaults to `TRUE`.
`classification`	Type: boolean. Whether the task is a classification or not. Defaults to `TRUE`.
`threshold`	Type: numeric. The binary threshold to use for statistics when using `classification == TRUE`. Defaults to `0.5`.
`importance`	Type: boolean. Whether to perform feature importance computation or not. Defaults to `TRUE`.
`unbiased`	Type: boolean. Whether to perform unbiased feature importance computation or not. This doubles (sometimes triples) the effective training time, so use it with caution. The benefit is very accurate and unbiased feature importance from the final cross-validated models. Defaults to `TRUE`.
`stats`	Type: boolean. Whether machine learning statistics should be output for model performance diagnosis. When `TRUE`, also returns the metrics and the out of fold predictions. Defaults to `TRUE`.
`plots`	Type: boolean. Whether plotting of fitted values vs predicted values should be done. Defaults to `TRUE`.
`plot_type`	Type: character. The type of plot to use for classification threshold calibration plots. `"p"` for points, `"l"` for lines, `"b"` for points+line, `"c"` for line without points, `"o"` for overplotted (points+line overlapping), `"h"` for high-density vertical lines (histogram-like), `"s"` for optimistic stair steps, `"S"` for pessimistic stair steps, `"n"` to plot nothing. Defaults to `"S"` for pessimistic stair step.
`output_file`	Type: character. The output report file name. Defaults to `"report.xgb.html"`.
`output_dir`	Type: character. The output report directory name. Defaults to `getwd()`.
`open_file`	Type: boolean. Whether to open the output report once it has finished computing. Defaults to `TRUE`.
`quiet`	Type: boolean. Whether to "shut up" while rendering the HTML file or not. Defaults to `FALSE`.
`...`	Other arguments to pass to `rmarkdown::render`.
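
The `folds` argument expects a list of numeric vectors. A minimal base R sketch of building such a list follows. The helper `make_folds` is hypothetical (it is not part of Laurae), and the sketch assumes each vector holds the held-out row indices of one fold, which is the convention of the `folds` argument in `xgboost::xgb.cv`. Check the convention against your own fold-generation code before relying on it.

```r
# Hypothetical helper (not part of Laurae): split the row indices 1..n
# into k folds of roughly equal size.
make_folds <- function(n, k, seed = 1) {
  set.seed(seed)
  shuffled <- sample.int(n)            # random permutation of the row indices
  fold_id <- rep_len(seq_len(k), n)    # fold numbers 1..k, recycled to length n
  unname(split(shuffled, fold_id))     # list of k integer vectors
}

# Assumption: each vector contains the held-out rows of one fold.
folds <- make_folds(n = 100, k = 5)
str(folds)
```

Each row index appears in exactly one fold, so every observation is held out once.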

Returns a list with the machine learning metrics ("Metrics"), the machine learning probabilities ("Probs"), the folds ("Folds"), the fitted values per fold ("Fitted"), the predicted values per fold ("Predicted"), the biased feature importance ("BiasedImp"), and the unbiased feature importance ("UnbiasedImp") if they were computed. Otherwise, returns TRUE.
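
To illustrate what a binary `threshold` does to predicted probabilities, here is a small base R sketch. The probabilities and labels are made up for illustration. Treating values strictly above the threshold as positive is an assumption, since how `report.xgb` handles values exactly equal to the threshold is not documented here.

```r
# Made-up probabilities and 0/1 labels, for illustration only
probs <- c(0.10, 0.40, 0.50, 0.65, 0.90)
labels <- c(0, 1, 0, 1, 1)
threshold <- 0.5

# Assumption: probabilities strictly above the threshold count as positive
predicted <- as.numeric(probs > threshold)

# Confusion table with both classes always present as levels
confusion <- table(
  Actual = factor(labels, levels = c(0, 1)),
  Predicted = factor(predicted, levels = c(0, 1))
)
accuracy <- mean(predicted == labels)

print(confusion)
print(accuracy)
```

Raising the threshold moves borderline observations to the negative class, which is the trade-off the classification threshold calibration plots are meant to show.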

# No example.
## Not run: 
  library(Laurae)
  library(data.table)
  library(rmarkdown)
  library(xgboost)
  library(DT)
  library(formattable)
  library(matrixStats)
  library(lattice)
  library(R.utils)

## End(Not run)