s_XGBoost | R Documentation |
Tune hyperparameters using grid search and resampling, train a final model, and validate it
s_XGBoost(
x,
y = NULL,
x.test = NULL,
y.test = NULL,
x.name = NULL,
y.name = NULL,
booster = c("gbtree", "gblinear", "dart"),
missing = NA,
nrounds = 1000L,
force.nrounds = NULL,
weights = NULL,
ifw = TRUE,
ifw.type = 2,
upsample = FALSE,
downsample = FALSE,
resample.seed = NULL,
obj = NULL,
feval = NULL,
xgb.verbose = NULL,
print_every_n = 100L,
early_stopping_rounds = 50L,
eta = 0.01,
gamma = 0,
max_depth = 2,
min_child_weight = 5,
max_delta_step = 0,
subsample = 0.75,
colsample_bytree = 1,
colsample_bylevel = 1,
lambda = 0,
alpha = 0,
tree_method = "auto",
sketch_eps = 0.03,
num_parallel_tree = 1,
base_score = NULL,
objective = NULL,
sample_type = "uniform",
normalize_type = "forest",
rate_drop = 0,
one_drop = 0,
skip_drop = 0,
grid.resample.params = setup.resample("kfold", 5),
gridsearch.type = "exhaustive",
metric = NULL,
maximize = NULL,
importance = NULL,
print.plot = FALSE,
plot.fitted = NULL,
plot.predicted = NULL,
plot.theme = rtTheme,
question = NULL,
verbose = TRUE,
grid.verbose = FALSE,
trace = 0,
save.gridrun = FALSE,
n.cores = 1,
nthread = rtCores,
outdir = NULL,
save.mod = ifelse(!is.null(outdir), TRUE, FALSE),
.gs = FALSE,
...
)
x |
Numeric vector or matrix / data frame of features i.e. independent variables |
y |
Numeric vector of outcome, i.e. dependent variable |
x.test |
Numeric vector or matrix / data frame of testing set features
Columns must correspond to columns in |
y.test |
Numeric vector of testing set outcome |
x.name |
Character: Name for feature set |
y.name |
Character: Name for outcome |
booster |
Character: "gbtree", "gblinear", "dart": Booster to use. |
missing |
String or Numeric: Which values to consider as missing. |
nrounds |
Integer: Maximum number of rounds to run. Can be set to a high number, since early stopping will limit nrounds by monitoring inner CV error |
force.nrounds |
Integer: Number of rounds to run if not estimating optimal number by CV |
weights |
Numeric vector: Weights for cases. For classification, |
ifw |
Logical: If TRUE, apply inverse frequency weighting
(for Classification only).
Note: If |
ifw.type |
Integer 0, 1, 2. 1: class.weights as in 0, divided by min(class.weights). 2: class.weights as in 0, divided by max(class.weights) |
upsample |
Logical: If TRUE, upsample cases to balance outcome classes (for Classification only). Note: upsample will randomly sample with replacement if the majority class is more than double the size of the class being upsampled, thereby introducing randomness |
downsample |
Logical: If TRUE, downsample majority class to match size of minority class |
resample.seed |
Integer: If provided, will be used to set the seed during upsampling. Default = NULL (random seed) |
obj |
Function: Custom objective function. See |
feval |
Function: Custom evaluation function. See |
xgb.verbose |
Integer: Verbose level for XGB learners used for tuning. |
print_every_n |
Integer: Print evaluation metrics every this many iterations |
early_stopping_rounds |
Integer: Training on resamples of |
eta |
[gS] Numeric (0, 1): Learning rate. |
gamma |
[gS] Numeric: Minimum loss reduction required to make a further partition |
max_depth |
[gS] Integer: Maximum tree depth. |
min_child_weight |
[gS] Numeric: Minimum sum of instance weight needed in a child. |
max_delta_step |
[gS] Numeric: Maximum delta step allowed for each leaf output. 0 means no constraint. Values of 1-10 may help control the update, especially with imbalanced outcomes. |
subsample |
[gS] Numeric: subsample ratio of the training instance |
colsample_bytree |
[gS] Numeric: subsample ratio of columns when constructing each tree |
colsample_bylevel |
[gS] Numeric |
lambda |
[gS] L2 regularization on weights |
alpha |
[gS] L1 regularization on weights |
tree_method |
[gS] XGBoost tree construction algorithm |
sketch_eps |
[gS] Numeric (0, 1): |
num_parallel_tree |
Integer: Number of trees to grow in parallel; results in a Random Forest-like algorithm. (Default = 1, i.e. regular boosting) |
base_score |
Numeric: The mean outcome response. |
objective |
(Default = NULL) |
sample_type |
Character: Type of sampling algorithm for |
normalize_type |
Character. |
rate_drop |
[gS] Numeric: Dropout rate for |
one_drop |
[gS] Integer 0, 1: When this flag is enabled, at least one tree is always dropped during the dropout. |
skip_drop |
[gS] Numeric [0, 1]: Probability of skipping the dropout
procedure during a boosting iteration. If a dropout is skipped, new trees are added
in the same manner as gbtree. Non-zero |
grid.resample.params |
List: Output of setup.resample defining grid search parameters. |
gridsearch.type |
Character: Type of grid search to perform: "exhaustive" or "randomized". |
metric |
Character: Metric to minimize, or maximize if
|
maximize |
Logical: If TRUE, |
importance |
Logical: If TRUE, calculate variable importance. |
print.plot |
Logical: if TRUE, produce plot using |
plot.fitted |
Logical: if TRUE, plot True (y) vs Fitted |
plot.predicted |
Logical: if TRUE, plot True (y.test) vs Predicted.
Requires |
plot.theme |
Character: "zero", "dark", "box", "darkbox" |
question |
Character: the question you are attempting to answer with this model, in plain language. |
verbose |
Logical: If TRUE, print summary to screen. |
grid.verbose |
Logical: Passed to |
trace |
Integer: If > 0, print parameter values to console. |
save.gridrun |
Logical: If TRUE, save grid search models. |
n.cores |
Integer: Number of cores to use. |
nthread |
Integer: Number of threads for xgboost using OpenMP. Only parallelize resamples
using |
outdir |
Path to output directory.
If defined, will save Predicted vs. True plot, if available,
as well as full model output, if |
save.mod |
Logical: If TRUE, save all output to an RDS file in |
.gs |
Internal use only |
... |
Additional arguments passed to |
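The class-balancing arguments above (`ifw`, `ifw.type`, `upsample`, `resample.seed`) can be illustrated with a minimal base R sketch. This is not rtemis code: `inverse_freq_weights()` and `upsample_index()` are hypothetical helpers written for this example. Two assumptions are made: type 0 weights are taken to be the reciprocal of each class's relative frequency (this page does not define type 0), and the upsampling shown always draws extra cases with replacement, which simplifies the rule described for `upsample`.

```r
# Hypothetical helpers for illustration only; not part of rtemis.

# Inverse frequency weights.
# ASSUMPTION: type 0 = 1 / relative class frequency (not defined on this page).
inverse_freq_weights <- function(y, type = 2) {
  y <- as.factor(y)
  freq <- table(y) / length(y)
  class_weights <- setNames(as.numeric(1 / freq), names(freq))
  if (type == 1) class_weights <- class_weights / min(class_weights)
  if (type == 2) class_weights <- class_weights / max(class_weights)
  list(
    class_weights = class_weights,
    # One weight per case, looked up by class label
    case_weights = unname(class_weights[as.character(y)])
  )
}

# Upsampling: return case indices in which every class is as large as the
# largest class. SIMPLIFICATION: extra cases are always drawn with replacement.
upsample_index <- function(y, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  y <- as.factor(y)
  n_max <- max(table(y))
  unlist(lapply(levels(y), function(lv) {
    cases <- which(y == lv)
    extra <- n_max - length(cases)
    if (extra > 0) {
      cases <- c(cases, cases[sample.int(length(cases), extra, replace = TRUE)])
    }
    cases
  }))
}

# Imbalanced outcome: 8 cases of class "a", 2 of class "b"
y <- factor(c(rep("a", 8), rep("b", 2)))
w1 <- inverse_freq_weights(y, type = 1)
w2 <- inverse_freq_weights(y, type = 2)
idx <- upsample_index(y, seed = 2024)

print(w1$class_weights)
print(w2$class_weights)
print(table(y[idx]))
```

Under these assumptions, type 1 scales the weights so the smallest is 1, and type 2 scales them so the largest is 1. Passing a seed makes the upsampled indices reproducible, which mirrors the stated purpose of `resample.seed`.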
[gS]: indicates parameter will be autotuned by grid search if multiple values are passed. Learn more about XGBoost's parameters here: http://xgboost.readthedocs.io/en/latest/parameter.html
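To make the [gS] convention concrete, the following base R sketch shows how passing multiple values for some parameters could expand into a grid of candidate combinations. It is an illustration only: the `params` list and the use of `expand.grid()` are assumptions made for this example and do not show how rtemis builds its grid internally.

```r
# Hypothetical illustration; rtemis performs its own grid search internally.
params <- list(
  eta = c(0.01, 0.1),      # two candidate values -> tuned
  max_depth = c(2, 4, 6),  # three candidate values -> tuned
  subsample = 0.75         # single value -> held fixed
)

# Parameters with more than one value define the search grid.
tuned <- names(params)[lengths(params) > 1]
grid <- expand.grid(params[tuned])

print(tuned)
print(nrow(grid))  # 2 * 3 = 6 combinations
print(grid)
```

The corresponding request to `s_XGBoost()` would pass `eta = c(0.01, 0.1)` and `max_depth = c(2, 4, 6)`. With `gridsearch.type = "exhaustive"`, each combination would presumably be evaluated on the resamples defined by `grid.resample.params`.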
rtMod object
E.D. Gennatas
train_cv for external cross-validation
Other Supervised Learning: s_AdaBoost(), s_AddTree(), s_BART(), s_BRUTO(), s_BayesGLM(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GAM(), s_GBM(), s_GLM(), s_GLMNET(), s_GLMTree(), s_GLS(), s_H2ODL(), s_H2OGBM(), s_H2ORF(), s_HAL(), s_KNN(), s_LDA(), s_LM(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MARS(), s_MLRF(), s_NBayes(), s_NLA(), s_NLS(), s_NW(), s_PPR(), s_PolyMARS(), s_QDA(), s_QRNN(), s_RF(), s_RFSRC(), s_Ranger(), s_SDA(), s_SGD(), s_SPLS(), s_SVM(), s_TFN(), s_XRF()
Other Tree-based methods: s_AdaBoost(), s_AddTree(), s_BART(), s_C50(), s_CART(), s_CTree(), s_EVTree(), s_GBM(), s_GLMTree(), s_H2OGBM(), s_H2ORF(), s_LMTree(), s_LightCART(), s_LightGBM(), s_MLRF(), s_RF(), s_RFSRC(), s_Ranger(), s_XRF()