`r descr_models("boost_tree", "h2o")`

## Tuning Parameters

```r
defaults <- 
  tibble::tibble(
    parsnip = c("mtry", "trees", "tree_depth", "learn_rate", "sample_size",
                "min_n", "loss_reduction", "stop_iter"),
    default = c(1, 50L, 6, 0.3, 1, 1, 0, 0)
  )

# For this model, the tuning parameters are the same for all modes.
param <-
  boost_tree() %>% 
  set_engine("h2o") %>% 
  set_mode("regression") %>% 
  make_parameter_list(defaults)
```

This model has `r nrow(param)` tuning parameters:

```r
param$item
```

`min_n` represents the fewest allowed observations in a terminal node; by default, [h2o::h2o.xgboost()] allows only one row in a leaf.

`stop_iter` controls early stopping rounds based on the convergence of the engine parameter `stopping_metric`. By default, [h2o::h2o.xgboost()] does not use early stopping. When `stop_iter` is not 0, [h2o::h2o.xgboost()] uses logloss for classification, deviance for regression, and anomaly score for Isolation Forest. Early stopping is mostly useful alongside the engine parameter `validation`, which is the proportion of the data held out as a validation set: parsnip splits the data and passes the two data frames to h2o, and [h2o::h2o.xgboost()] then evaluates the metric and the early stopping criteria on the validation set.
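As a minimal sketch (the values below are illustrative, not defaults), early stopping can be enabled by pairing `stop_iter` with the `validation` engine argument:

```r
# Illustrative sketch: stop after 10 rounds without improvement in the
# stopping metric (deviance for regression), measured on a 10% validation
# split that parsnip carves out and passes to h2o.
boost_tree(trees = 500, stop_iter = 10) %>%
  set_engine("h2o", validation = 0.1) %>%
  set_mode("regression")
```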

## Translation from parsnip to the original package (regression)

[agua::h2o_train_xgboost()] is a wrapper around [h2o::h2o.xgboost()].

`r uses_extension("boost_tree", "h2o", "regression")`

```r
boost_tree(
  mtry = integer(), trees = integer(), tree_depth = integer(), 
  learn_rate = numeric(), min_n = integer(), loss_reduction = numeric(), 
  stop_iter = integer()
) %>%
  set_engine("h2o") %>%
  set_mode("regression") %>%
  translate()
```

## Translation from parsnip to the original package (classification)

`r uses_extension("boost_tree", "h2o", "classification")`

```r
boost_tree(
  mtry = integer(), trees = integer(), tree_depth = integer(), 
  learn_rate = numeric(), min_n = integer(), loss_reduction = numeric(), 
  stop_iter = integer()
) %>% 
  set_engine("h2o") %>% 
  set_mode("classification") %>% 
  translate()
```
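Putting the pieces together, a hedged end-to-end sketch (the `agua::h2o_start()` call and the `mtcars` formula are illustrative, not part of the translation above):

```r
library(parsnip)
library(agua)  # provides the h2o engine for parsnip

h2o_start()    # an h2o cluster must be running before fitting

fit <-
  boost_tree(trees = 100, min_n = 5) %>%
  set_engine("h2o") %>%
  set_mode("regression") %>%
  fit(mpg ~ ., data = mtcars)
```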

## Preprocessing


Non-numeric predictors (i.e., factors) are internally converted to numeric. In the classification context, non-numeric outcomes (i.e., factors) are also internally converted to numeric.

## Interpreting mtry


## Initializing h2o
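A minimal sketch of this step (assuming the h2o and agua packages are installed):

```r
# An h2o cluster must be running before h2o-engine models are fit.
# Either initialize h2o directly...
h2o::h2o.init()

# ...or use the agua helper:
agua::h2o_start()
```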


## Saving fitted model objects
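As a hedged sketch only: the fitted model lives in the h2o cluster rather than in the R session, so h2o's own serializer is one way to persist it (the `extract_fit_engine()` usage and the path below are illustrative):

```r
# Assumes `fit` is a parsnip model fit made with the h2o engine.
# saveRDS() on the R object alone will not capture the model that
# lives in the h2o cluster, so save via h2o itself:
raw_fit <- extract_fit_engine(fit)               # underlying h2o model
h2o::h2o.saveModel(raw_fit, path = "h2o_model")  # illustrative path
```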



