mlr_learners_classif.catboost: Gradient Boosted Decision Trees Classification Learner

mlr_learners_classif.catboost    R Documentation

Gradient Boosted Decision Trees Classification Learner

Description

Gradient boosting algorithm that also supports categorical data. Calls catboost::catboost.train() from package 'catboost'.

Dictionary

This Learner can be instantiated via lrn():

lrn("classif.catboost")
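
A minimal sketch of the dictionary route, assuming mlr3 and mlr3extralearners are installed and attached. Hyperparameters from the Parameters table can be passed to lrn() at construction time; the values 200 and 0.1 below are arbitrary illustrations, not recommendations.

```r
library(mlr3)
library(mlr3extralearners)  # makes "classif.catboost" available to lrn()

# Construct the learner with its initial settings
learner = lrn("classif.catboost")

# Construct it again with hyperparameters set at creation time
# (iterations and learning_rate come from the Parameters table;
# the values are illustrative only)
learner = lrn("classif.catboost", iterations = 200, learning_rate = 0.1)

# Inspect the hyperparameter values that are currently set
print(learner$param_set$values)
```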

Meta Information

  • Task type: “classif”

  • Predict Types: “response”, “prob”

  • Feature Types: “numeric”, “factor”, “ordered”

  • Required Packages: mlr3, mlr3extralearners, catboost
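
As a rough illustration of the two predict types listed above, the following sketch requests class probabilities instead of the default class labels. It assumes the catboost package is installed; the task ("sonar"), the iteration count, and the choice of classif.auc as measure are illustrative assumptions.

```r
library(mlr3)
library(mlr3extralearners)

task = tsk("sonar")       # binary classification task shipped with mlr3
split = partition(task)   # random train/test split

# Request probabilities; the default predict type is "response"
learner = lrn("classif.catboost", iterations = 50, predict_type = "prob")
learner$train(task, row_ids = split$train)

pred = learner$predict(task, row_ids = split$test)

# One column of probabilities per class
print(head(pred$prob))

# Probability-based measures such as AUC need predict_type = "prob"
print(pred$score(msr("classif.auc")))
```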

Parameters

Id                              Type       Default          Levels  Range
loss_function_twoclass          character  Logloss          Logloss, CrossEntropy  -
loss_function_multiclass        character  MultiClass       MultiClass, MultiClassOneVsAll  -
learning_rate                   numeric    0.03             -  [0.001, 1]
random_seed                     integer    0                -  [0, ∞)
l2_leaf_reg                     numeric    3                -  [0, ∞)
bootstrap_type                  character  -                Bayesian, Bernoulli, MVS, Poisson, No  -
bagging_temperature             numeric    1                -  [0, ∞)
subsample                       numeric    -                -  [0, 1]
sampling_frequency              character  PerTreeLevel     PerTree, PerTreeLevel  -
sampling_unit                   character  Object           Object, Group  -
mvs_reg                         numeric    -                -  [0, ∞)
random_strength                 numeric    1                -  [0, ∞)
depth                           integer    6                -  [1, 16]
grow_policy                     character  SymmetricTree    SymmetricTree, Depthwise, Lossguide  -
min_data_in_leaf                integer    1                -  [1, ∞)
max_leaves                      integer    31               -  [1, ∞)
ignored_features                untyped    NULL             -  -
one_hot_max_size                untyped    FALSE            -  -
has_time                        logical    FALSE            TRUE, FALSE  -
rsm                             numeric    1                -  [0.001, 1]
nan_mode                        character  Min              Min, Max  -
fold_permutation_block          integer    -                -  [1, 256]
leaf_estimation_method          character  -                Newton, Gradient, Exact  -
leaf_estimation_iterations      integer    -                -  [1, ∞)
leaf_estimation_backtracking    character  AnyImprovement   No, AnyImprovement, Armijo  -
fold_len_multiplier             numeric    2                -  [1.001, ∞)
approx_on_full_history          logical    TRUE             TRUE, FALSE  -
class_weights                   untyped    -                -  -
auto_class_weights              character  None             None, Balanced, SqrtBalanced  -
boosting_type                   character  -                Ordered, Plain  -
boost_from_average              logical    -                TRUE, FALSE  -
langevin                        logical    FALSE            TRUE, FALSE  -
diffusion_temperature           numeric    10000            -  [0, ∞)
score_function                  character  Cosine           Cosine, L2, NewtonCosine, NewtonL2  -
monotone_constraints            untyped    -                -  -
feature_weights                 untyped    -                -  -
first_feature_use_penalties     untyped    -                -  -
penalties_coefficient           numeric    1                -  [0, ∞)
per_object_feature_penalties    untyped    -                -  -
model_shrink_rate               numeric    -                -  (-∞, ∞)
model_shrink_mode               character  -                Constant, Decreasing  -
target_border                   numeric    -                -  (-∞, ∞)
border_count                    integer    -                -  [1, 65535]
feature_border_type             character  GreedyLogSum     Median, Uniform, UniformAndQuantiles, MaxLogSum, MinEntropy, GreedyLogSum  -
per_float_feature_quantization  untyped    -                -  -
classes_count                   integer    -                -  [1, ∞)
thread_count                    integer    1                -  [-1, ∞)
task_type                       character  CPU              CPU, GPU  -
devices                         untyped    -                -  -
logging_level                   character  Silent           Silent, Verbose, Info, Debug  -
metric_period                   integer    1                -  [1, ∞)
train_dir                       untyped    "catboost_info"  -  -
model_size_reg                  numeric    0.5              -  [0, 1]
allow_writing_files             logical    FALSE            TRUE, FALSE  -
save_snapshot                   logical    FALSE            TRUE, FALSE  -
snapshot_file                   untyped    -                -  -
snapshot_interval               integer    600              -  [1, ∞)
simple_ctr                      untyped    -                -  -
combinations_ctr                untyped    -                -  -
ctr_target_border_count         integer    -                -  [1, 255]
counter_calc_method             character  Full             SkipTest, Full  -
max_ctr_complexity              integer    -                -  [1, ∞)
ctr_leaf_count_limit            integer    -                -  [1, ∞)
store_all_simple_ctr            logical    FALSE            TRUE, FALSE  -
final_ctr_computation_mode      character  Default          Default, Skip  -
verbose                         logical    FALSE            TRUE, FALSE  -
ntree_start                     integer    0                -  [0, ∞)
ntree_end                       integer    0                -  [0, ∞)
early_stopping_rounds           integer    -                -  [1, ∞)
eval_metric                     untyped    -                -  -
use_best_model                  logical    -                TRUE, FALSE  -
iterations                      integer    1000             -  [1, ∞)

Installation

See https://catboost.ai/en/docs/concepts/r-installation.

Initial parameter values

  • logging_level:

    • Actual default: "Verbose"

    • Adjusted default: "Silent"

    • Reason for change: consistent with other mlr3 learners

  • thread_count:

    • Actual default: -1

    • Adjusted default: 1

    • Reason for change: consistent with other mlr3 learners

  • allow_writing_files:

    • Actual default: TRUE

    • Adjusted default: FALSE

    • Reason for change: consistent with other mlr3 learners

  • save_snapshot:

    • Actual default: TRUE

    • Adjusted default: FALSE

    • Reason for change: consistent with other mlr3 learners
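
The adjusted values above are ordinary hyperparameter settings, so they can be inspected and overridden like any others. The sketch below assumes mlr3 and mlr3extralearners are attached and restores the catboost defaults listed above for two of the four parameters; which ones to restore is an illustrative choice.

```r
library(mlr3)
library(mlr3extralearners)

learner = lrn("classif.catboost")

# Values set when the learner is constructed (the adjusted defaults)
adjusted = c("logging_level", "thread_count",
  "allow_writing_files", "save_snapshot")
print(learner$param_set$values[adjusted])

# Switch two of them back to the catboost defaults listed above
learner$param_set$set_values(
  logging_level = "Verbose",
  thread_count = -1
)

print(learner$param_set$values[c("logging_level", "thread_count")])
```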

Early stopping

Early stopping can be used to find the optimal number of boosting rounds. Set early_stopping_rounds to an integer value to monitor the performance of the model on the validation set while training. For information on how to configure the validation set, see the Validation section of mlr3::Learner.
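
The following is a minimal sketch of that workflow, assuming the catboost package is installed. The task ("sonar"), the 20% validation ratio, and the values chosen for iterations and early_stopping_rounds are illustrative assumptions.

```r
library(mlr3)
library(mlr3extralearners)

task = tsk("sonar")

learner = lrn("classif.catboost",
  iterations = 500,            # upper bound on boosting rounds
  early_stopping_rounds = 20   # illustrative value
)

# Hold out 20% of the training rows as the internal validation set
learner$validate = 0.2

learner$train(task)

# Validation scores at the last recorded iteration
print(learner$internal_valid_scores)

# Iterations selected by early stopping
print(learner$internal_tuned_values)
```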

Super classes

mlr3::Learner -> mlr3::LearnerClassif -> LearnerClassifCatboost

Active bindings

internal_valid_scores

The validation scores at the last recorded iteration, for all metrics. Extracted from model$evaluation_log.

internal_tuned_values

Returns the number of iterations selected by early stopping, if early_stopping_rounds was set during training.

validate

How to construct the internal validation data. This parameter can be either NULL, a ratio, "test", or "predefined".

Methods

Public methods

Method new()

Create a LearnerClassifCatboost object.

Usage
LearnerClassifCatboost$new()

Method importance()

The importance scores are computed with catboost.get_feature_importance() using type = "FeatureImportance" and are returned for all features.

Usage
LearnerClassifCatboost$importance()
Returns

Named numeric().
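
A short sketch of how the returned vector might be used, assuming the catboost package is installed. The task and the iteration count are illustrative assumptions.

```r
library(mlr3)
library(mlr3extralearners)

task = tsk("sonar")
learner = lrn("classif.catboost", iterations = 50)
learner$train(task)

# importance() is a method, so it is called with parentheses
imp = learner$importance()

# Show the five features with the highest scores
print(head(sort(imp, decreasing = TRUE), 5))
```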


Method clone()

The objects of this class are cloneable with this method.

Usage
LearnerClassifCatboost$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Author(s)

sumny

References

Dorogush, Veronika A, Ershov, Vasily, Gulin, Andrey (2018). “CatBoost: gradient boosting with categorical features support.” arXiv preprint arXiv:1810.11363.

Examples


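# Attach the packages that provide lrn(), tsk() and the catboost learner
library(mlr3)
library(mlr3extralearners)

# Define the Learner
learner = mlr3::lrn("classif.catboost",
  iterations = 100)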

print(learner)

# Define a Task
task = tsk("sonar")

# Create train and test set
ids = mlr3::partition(task)

# Train the learner on the training ids
learner$train(task, row_ids = ids$train)

print(learner$model)
print(learner$importance())

# Make predictions for the test rows
predictions = learner$predict(task, row_ids = ids$test)

# Score the predictions
predictions$score()


mlr-org/mlr3extralearners documentation built on Dec. 21, 2024, 2:21 p.m.