s_XGBoost: XGBoost Classification and Regression (C, R)

View source: R/s_XGBoost.R

XGBoost Classification and Regression (C, R)

Description

Tune hyperparameters using grid search and resampling, train a final model, and validate it.

Usage

s_XGBoost(
  x,
  y = NULL,
  x.test = NULL,
  y.test = NULL,
  x.name = NULL,
  y.name = NULL,
  booster = c("gbtree", "gblinear", "dart"),
  missing = NA,
  nrounds = 1000L,
  force.nrounds = NULL,
  weights = NULL,
  ifw = TRUE,
  ifw.type = 2,
  upsample = FALSE,
  downsample = FALSE,
  resample.seed = NULL,
  obj = NULL,
  feval = NULL,
  xgb.verbose = NULL,
  print_every_n = 100L,
  early_stopping_rounds = 50L,
  eta = 0.01,
  gamma = 0,
  max_depth = 2,
  min_child_weight = 5,
  max_delta_step = 0,
  subsample = 0.75,
  colsample_bytree = 1,
  colsample_bylevel = 1,
  lambda = 0,
  alpha = 0,
  tree_method = "auto",
  sketch_eps = 0.03,
  num_parallel_tree = 1,
  base_score = NULL,
  objective = NULL,
  sample_type = "uniform",
  normalize_type = "forest",
  rate_drop = 0,
  one_drop = 0,
  skip_drop = 0,
  grid.resample.params = setup.resample("kfold", 5),
  gridsearch.type = "exhaustive",
  metric = NULL,
  maximize = NULL,
  importance = NULL,
  print.plot = FALSE,
  plot.fitted = NULL,
  plot.predicted = NULL,
  plot.theme = rtTheme,
  question = NULL,
  verbose = TRUE,
  grid.verbose = FALSE,
  trace = 0,
  save.gridrun = FALSE,
  n.cores = 1,
  nthread = rtCores,
  outdir = NULL,
  save.mod = ifelse(!is.null(outdir), TRUE, FALSE),
  .gs = FALSE,
  ...
)
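
A minimal usage sketch (not part of the package documentation; the data split and values are illustrative, assuming rtemis is loaded):

library(rtemis)

# Classification on iris with an illustrative train/test split
set.seed(2024)
idx <- sample(nrow(iris), 100)
mod <- s_XGBoost(
  x = iris[idx, 1:4], y = iris$Species[idx],
  x.test = iris[-idx, 1:4], y.test = iris$Species[-idx]
)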

Arguments

x

Numeric vector or matrix / data frame of features i.e. independent variables

y

Numeric vector of outcome, i.e. dependent variable

x.test

Numeric vector or matrix / data frame of testing set features. Columns must correspond to columns in x

y.test

Numeric vector of testing set outcome

x.name

Character: Name for feature set

y.name

Character: Name for outcome

booster

Character: "gbtree", "gblinear": Booster to use.

missing

Character or Numeric: Which values to consider as missing.

nrounds

Integer: Maximum number of rounds to run. Can be set to a high number as early stopping will limit nrounds by monitoring inner CV error

force.nrounds

Integer: Number of rounds to run if not estimating optimal number by CV

weights

Numeric vector: Weights for cases. For Classification, weights take precedence over ifw: if weights are provided, ifw is not used. Leave NULL if setting ifw = TRUE. See the sketch at the end of this list.

ifw

Logical: If TRUE, apply inverse frequency weighting (for Classification only). Note: If weights are provided, ifw is not used.

ifw.type

Integer: 0, 1, or 2.
1: class.weights as in 0, divided by min(class.weights)
2: class.weights as in 0, divided by max(class.weights)

upsample

Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the length of the majority class is more than double the length of the class you are upsampling, thereby introducing randomness.

downsample

Logical: If TRUE, downsample majority class to match size of minority class

resample.seed

Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed)

obj

Function: Custom objective function. See ?xgboost::xgboost

feval

Function: Custom evaluation function. See ?xgboost::xgboost

xgb.verbose

Integer: Verbose level for XGB learners used for tuning.

print_every_n

Integer: Print evaluation metrics every this many iterations

early_stopping_rounds

Integer: Training on resamples of x.train (tuning) will stop if performance does not improve for this many rounds

eta

[gS] Numeric (0, 1): Learning rate.

gamma

[gS] Numeric: Minimum loss reduction required to make further partition

max_depth

[gS] Integer: Maximum tree depth.

min_child_weight

[gS] Numeric: Minimum sum of instance weight needed in a child.

max_delta_step

[gS] Numeric: Maximum delta step we allow each leaf output to be. 0 means no constraint. Values of 1-10 may help control the update, especially with imbalanced outcomes.

subsample

[gS] Numeric: subsample ratio of the training instance

colsample_bytree

[gS] Numeric: subsample ratio of columns when constructing each tree

colsample_bylevel

[gS] Numeric: subsample ratio of columns for each level

lambda

[gS] Numeric: L2 regularization term on weights

alpha

[gS] Numeric: L1 regularization term on weights

tree_method

[gS] Character: XGBoost tree construction algorithm, e.g. "auto", "exact", "approx", "hist".

sketch_eps

[gS] Numeric (0, 1): Only used by tree_method "approx". Controls the accuracy of the approximate sketch, roughly translating to O(1 / sketch_eps) bins.

num_parallel_tree

Integer: Number of trees to grow in parallel, resulting in a Random Forest-like algorithm. (Default = 1, i.e. regular boosting)

base_score

Numeric: Initial prediction score for all instances (global bias). If NULL, set to the mean outcome response.

objective

Character: XGBoost objective. If NULL (default), it is set automatically based on the type of outcome.

sample_type

Character: Type of sampling algorithm for the dart booster. "uniform": dropped trees are selected uniformly; "weighted": dropped trees are selected in proportion to weight.

normalize_type

Character: "tree" or "forest": Type of weight normalization for the dart booster. "tree": a new tree has the same weight as each dropped tree; "forest": a new tree has the same weight as the sum of the dropped trees.

rate_drop

[gS] Numeric: Dropout rate for dart booster.

one_drop

[gS] Integer 0, 1: When this flag is enabled, at least one tree is always dropped during the dropout.

skip_drop

[gS] Numeric [0, 1]: Probability of skipping the dropout procedure during a boosting iteration. If a dropout is skipped, new trees are added in the same manner as gbtree. Non-zero skip_drop has higher priority than rate_drop or one_drop.

grid.resample.params

List: Output of setup.resample defining grid search parameters.

gridsearch.type

Character: Type of grid search to perform: "exhaustive" or "randomized".

metric

Character: Metric to minimize, or maximize if maximize = TRUE during grid search. Default = NULL, which results in "Balanced Accuracy" for Classification, "MSE" for Regression, and "Concordance" for Survival Analysis.

maximize

Logical: If TRUE, metric will be maximized if grid search is run.

importance

Logical: If TRUE, calculate variable importance.

print.plot

Logical: If TRUE, produce plot using mplot3. Takes precedence over plot.fitted and plot.predicted.

plot.fitted

Logical: If TRUE, plot True (y) vs Fitted

plot.predicted

Logical: If TRUE, plot True (y.test) vs Predicted. Requires x.test and y.test

plot.theme

Character: "zero", "dark", "box", "darkbox"

question

Character: the question you are attempting to answer with this model, in plain language.

verbose

Logical: If TRUE, print summary to screen.

grid.verbose

Logical: Passed to gridSearchLearn

trace

Integer: If > 0, print parameter values to console.

save.gridrun

Logical: If TRUE, save grid search models.

n.cores

Integer: Number of cores to use.

nthread

Integer: Number of threads for xgboost using OpenMP. Parallelize either the resamples, using n.cores, or the xgboost execution, using this setting, but not both. At the time of writing, parallelization via this parameter causes the linear booster to fail most of the time; therefore, the default is rtCores for "gbtree" and 1 for "gblinear".

outdir

Character: Path to output directory. If defined, the Predicted vs. True plot, if available, will be saved, along with the full model output if save.mod is TRUE

save.mod

Logical: If TRUE, save all output to an RDS file in outdir. save.mod is TRUE by default if outdir is defined. If save.mod is TRUE and no outdir is defined, outdir defaults to paste0("./s.", mod.name).

.gs

Internal use only

...

Additional arguments passed to xgboost::xgb.train
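
As referenced above under weights, a short sketch of the class-imbalance and early-stopping settings (assumes a feature data frame x and a binary factor outcome y are already defined; values are illustrative):

# weights take precedence over ifw: leave weights = NULL so that
# inverse frequency class weights are computed (Classification only)
mod <- s_XGBoost(
  x, y,
  weights = NULL, ifw = TRUE, ifw.type = 2,
  nrounds = 1000L,               # upper bound; early stopping decides
  early_stopping_rounds = 50L    # stop if tuning error stalls for 50 rounds
)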

Details

[gS] indicates that the parameter will be tuned by grid search if multiple values are passed. See the example below. Learn more about XGBoost's parameters here: http://xgboost.readthedocs.io/en/latest/parameter.html
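
For example, a sketch of grid search over [gS] parameters (values are illustrative, not recommendations; assumes x and y as above):

# Vectors passed to [gS] parameters define the tuning grid;
# resampling is controlled by grid.resample.params
mod <- s_XGBoost(
  x, y,
  eta = c(0.01, 0.1),
  max_depth = c(2, 4, 6),
  grid.resample.params = setup.resample("kfold", 5),
  gridsearch.type = "exhaustive"
)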

Value

rtMod object

Author(s)

E.D. Gennatas

See Also

train_cv for external cross-validation

Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GAM.default(), s_GAM.formula(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XRF()

Other Tree-based methods: s_AdaBoost(), s_AddTree(), s_BART(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GBM(), s_GLMTree(), s_H2OGBM(), s_H2ORF(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MLRF(), s_RF(), s_RFSRC(), s_Ranger(), s_XRF()

