callbacks |
List of callback functions that are applied at each iteration.
|
data |
a gpb.Dataset object, used for training. Some functions, such as gpb.cv,
may also allow you to pass other types of data, such as a matrix, and then
separately supply the label as a keyword argument.
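For illustration, a minimal sketch of creating a training dataset (assuming X is a
numeric feature matrix and y a response vector; the constructor arguments mirror the
LightGBM-style API):

    dtrain <- gpb.Dataset(data = X, label = y)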
|
folds |
a list of pre-defined CV folds
(each element must be a vector of the test fold's indices). When folds are supplied,
the nfold and stratified parameters are ignored. See the sketch below.
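For illustration, a minimal sketch of pre-defined folds (the choice of 4 folds over
100 rows is arbitrary):

    # Each list element is an integer vector of test indices for one fold
    n <- 100L
    folds <- split(seq_len(n), rep(1:4, each = n / 4))
    # cvbst <- gpb.cv(data = dtrain, folds = folds, ...)  # dtrain: a gpb.Dataset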
|
nfold |
the number of folds for CV: the original dataset is randomly partitioned into nfold equal-size subsamples.
|
cv_seed |
Seed for generating folds when doing nfold CV
|
early_stopping_rounds |
int. Activates early stopping. Requires at least one validation set
and one metric. If this parameter is non-NULL,
training will stop if the evaluation of any metric on any validation set
fails to improve for early_stopping_rounds consecutive boosting rounds.
If training stops early, the returned model will have the attribute best_iter
set to the iteration number of the best iteration. See the sketch below.
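For illustration, a hedged usage sketch with gpb.train (assuming dtrain and dvalid
are gpb.Dataset objects created elsewhere):

    bst <- gpb.train(params = list(learning_rate = 0.1),
                     data = dtrain,
                     nrounds = 1000L,
                     valids = list(valid = dvalid),
                     early_stopping_rounds = 10L)
    bst$best_iter  # iteration number of the best iteration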
|
metric |
Evaluation metric to be monitored when doing CV and parameter tuning.
Can be a character string or vector of character strings.
If not NULL, the metric in params will be overridden.
Non-exhaustive list of supported metrics: "test_neg_log_likelihood", "mse", "rmse", "mae",
"auc", "average_precision", "binary_logloss", "binary_error".
See the "metric" section of the parameter documentation for a complete list of valid metrics.
|
verbose_eval |
integer. Controls whether information on the progress of the tuning parameter choice is displayed.
If NULL or 0, verbose output is off.
If = 1, summary progress information is displayed for every parameter combination.
If >= 2, detailed progress is displayed at every boosting stage for every parameter combination.
|
eval |
Evaluation metric to be monitored when doing CV and parameter tuning.
This can be a string, function, or list with a mixture of strings and functions.
a. character vector:
Non-exhaustive list of supported metrics: "test_neg_log_likelihood", "mse", "rmse", "mae",
"auc", "average_precision", "binary_logloss", "binary_error"
See the "metric" section of the parameter documentation for a complete list of valid metrics.
b. function:
You can provide a custom evaluation function (see the sketch after this list). It
should accept the arguments preds and dtrain and should return a named
list with three elements:
- name : A string with the name of the metric, used for printing and storing results.
- value : A single number indicating the value of the metric for the given predictions and true values.
- higher_better : A boolean indicating whether higher values indicate a better fit. For example, this would be FALSE for metrics like MAE or RMSE.
c. list:
If a list is given, it should only contain character vectors and functions.
These should follow the requirements from the descriptions above.
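For illustration, a minimal sketch of a custom MAE metric following the contract in (b);
the get_field label accessor is an assumption based on the LightGBM-style R API, so
adjust it to however you retrieve labels from a gpb.Dataset:

    # Hypothetical custom metric: mean absolute error
    my_mae <- function(preds, dtrain) {
      labels <- get_field(dtrain, "label")  # label accessor is an assumption
      list(name = "my_mae",
           value = mean(abs(preds - labels)),
           higher_better = FALSE)
    }
    # e.g. eval = my_mae, or mixed: eval = list("rmse", my_mae)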
|
eval_freq |
evaluation output frequency; only has an effect when verbose > 0
|
valids |
a list of gpb.Dataset objects, used for validation
|
record |
Boolean. If TRUE, iteration messages will be recorded to booster$record_evals
|
colnames |
feature names. If not NULL, these will be used to overwrite the feature names of the dataset
|
categorical_feature |
categorical features. This can either be a character vector of feature
names or an integer vector with the indices of the features (e.g.
c(1L, 10L) to say "the first and tenth columns").
|
init_model |
path to a model file or a gpb.Booster object; training will continue from this model
|
nrounds |
number of boosting iterations (= number of trees). This is the most important tuning parameter for boosting
|
obj |
(character) The distribution of the response variable (=label) conditional on fixed and random effects.
This only needs to be set when doing independent boosting without random effects / Gaussian processes.
|
params |
list of "tuning" parameters.
See the parameter documentation for more information.
A few key parameters:
learning_rate : The learning rate, also called shrinkage or damping parameter
(default = 0.1). An important tuning parameter for boosting. Lower values usually
lead to higher predictive accuracy but more boosting iterations are needed
num_leaves : Number of leaves in a tree. Tuning parameter for
tree-boosting (default = 31)
max_depth : Maximal depth of a tree. Tuning parameter for tree-boosting (default = no limit)
min_data_in_leaf : Minimal number of samples per leaf. Tuning parameter for
tree-boosting (default = 20)
lambda_l2 : L2 regularization (default = 0)
lambda_l1 : L1 regularization (default = 0)
max_bin : Maximal number of bins that feature values will be bucketed in (default = 255)
line_search_step_length (default = FALSE): If TRUE, a line search is done to find the optimal
step length for every boosting update (see, e.g., Friedman 2001). This is then multiplied by the learning rate
train_gp_model_cov_pars (default = TRUE): If TRUE, the covariance parameters of the Gaussian process
are estimated in every boosting iteration, otherwise the gp_model parameters are not estimated.
In the latter case, you need to either estimate them beforehand or provide values via
the 'init_cov_pars' parameter when creating the gp_model
use_gp_model_for_validation (default = TRUE): If TRUE, the Gaussian process is also used
(in addition to the tree model) for calculating predictions on the validation data
leaves_newton_update (default = FALSE): Set this to TRUE to do a Newton update step for the tree leaves
after the gradient step. Applies only to Gaussian process boosting (GPBoost algorithm)
num_threads : Number of threads. For the best speed, set this to
the number of real CPU cores (parallel::detectCores(logical = FALSE)),
not the number of threads (most CPUs use hyper-threading to generate 2 threads
per CPU core).
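For illustration, a params list using some of the keys above (the values are arbitrary
starting points, not recommendations):

    params <- list(learning_rate = 0.05,
                   num_leaves = 31L,
                   min_data_in_leaf = 20L,
                   lambda_l2 = 1,
                   num_threads = parallel::detectCores(logical = FALSE))
    # bst <- gpb.train(params = params, data = dtrain, nrounds = 100L)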
|
verbose |
verbosity of output. If <= 0, printing of evaluation results during training is also disabled
|
gp_model |
A GPModel object that contains the random effects (Gaussian process and / or grouped random effects) model
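For illustration, a minimal sketch combining a GPModel with tree boosting (assuming a
grouping variable group and a gpb.Dataset dtrain exist; the Gaussian likelihood is an
arbitrary choice):

    gp_model <- GPModel(group_data = group, likelihood = "gaussian")
    bst <- gpb.train(data = dtrain, gp_model = gp_model,
                     nrounds = 100L, params = list(learning_rate = 0.05))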
|
line_search_step_length |
Boolean. If TRUE, a line search is done to find the optimal step length for every boosting update
(see, e.g., Friedman 2001). This is then multiplied by the learning_rate .
Applies only to the GPBoost algorithm
|
use_gp_model_for_validation |
Boolean. If TRUE, the gp_model
(Gaussian process and/or random effects) is also used (in addition to the tree model) for calculating
predictions on the validation data. If FALSE, the gp_model (random effects part) is ignored
when making predictions, and only the tree ensemble is used to calculate the validation / test error.
|
train_gp_model_cov_pars |
Boolean. If TRUE, the covariance parameters
of the gp_model (Gaussian process and/or random effects) are estimated in every
boosting iteration, otherwise the gp_model parameters are not estimated.
In the latter case, you need to either estimate them beforehand or provide the values via
the init_cov_pars parameter when creating the gp_model
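For illustration, a hedged sketch of keeping the covariance parameters fixed during
boosting; the values passed to init_cov_pars are placeholders (check the GPModel
documentation for their expected order and length):

    # Provide covariance parameter values up front and skip their estimation
    gp_model <- GPModel(group_data = group, likelihood = "gaussian",
                        init_cov_pars = c(1, 0.5))  # placeholder values
    bst <- gpb.train(data = dtrain, gp_model = gp_model, nrounds = 100L,
                     train_gp_model_cov_pars = FALSE)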
|