ebm: Explainable Boosting Machine (EBM)

ebm	R Documentation

Explainable Boosting Machine (EBM)

Description

This function is an R wrapper for the explainable boosting functions in the Python interpret library. It trains an Explainable Boosting Machine (EBM) model, which is a tree-based, cyclic gradient boosting generalized additive model with automatic interaction detection. EBMs are often as accurate as state-of-the-art blackbox models while remaining completely interpretable.
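
The idea of cyclic boosting can be illustrated in a few lines of base R. This is a minimal sketch for squared error only, not the interpret implementation: it has no binning, bagging, interaction detection, or early stopping, and the names `fit_stump`, `predict_stump`, and `cyclic_boost` are hypothetical helpers introduced here for illustration. Each round visits the features one at a time, fits a one-split tree to the current residuals using only that feature, and adds a shrunken copy of it to that feature's contribution.

```r
# Fit a one-split tree (stump) to residuals r using a single feature x.
fit_stump <- function(x, r) {
  vals <- sort(unique(x))
  if (length(vals) < 2L) return(list(cut = Inf, left = mean(r), right = 0))
  cuts <- (head(vals, -1L) + tail(vals, -1L)) / 2  # midpoints between distinct values
  sse <- vapply(cuts, function(cut) {
    left <- x <= cut
    sum((r[left] - mean(r[left]))^2) + sum((r[!left] - mean(r[!left]))^2)
  }, numeric(1))
  best <- cuts[which.min(sse)]
  list(cut = best, left = mean(r[x <= best]), right = mean(r[x > best]))
}

predict_stump <- function(s, x) ifelse(x <= s$cut, s$left, s$right)

# Cyclic boosting: one small update per feature per round.
cyclic_boost <- function(X, y, rounds = 200L, learning_rate = 0.04) {
  intercept <- mean(y)
  contrib <- matrix(0, nrow(X), ncol(X), dimnames = list(NULL, names(X)))
  for (round in seq_len(rounds)) {
    for (j in seq_along(X)) {                 # round-robin over features
      r <- y - intercept - rowSums(contrib)   # current residuals
      s <- fit_stump(X[[j]], r)
      contrib[, j] <- contrib[, j] + learning_rate * predict_stump(s, X[[j]])
    }
  }
  list(intercept = intercept, contrib = contrib)
}

X <- mtcars[c("wt", "hp", "disp")]
toy <- cyclic_boost(X, mtcars$mpg)
pred <- toy$intercept + rowSums(toy$contrib)
sqrt(mean((mtcars$mpg - pred)^2))  # training RMSE of the toy model
```

Because every update touches a single feature, each column of `contrib` can be plotted against its feature to give a shape function, which is what makes this kind of model additive and easy to inspect.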

Usage

ebm(
  formula,
  data,
  max_bins = 1024L,
  max_interaction_bins = 64L,
  interactions = 0.9,
  exclude = NULL,
  validation_size = 0.15,
  outer_bags = 16L,
  inner_bags = 0L,
  learning_rate = 0.04,
  greedy_ratio = 10,
  cyclic_progress = FALSE,
  smoothing_rounds = 500L,
  interaction_smoothing_rounds = 100L,
  max_rounds = 25000L,
  early_stopping_rounds = 100L,
  early_stopping_tolerance = 1e-05,
  min_samples_leaf = 4L,
  min_hessian = 0,
  reg_alpha = 0,
  reg_lambda = 0,
  max_delta_step = 0,
  gain_scale = 5,
  min_cat_samples = 10L,
  cat_smooth = 10,
  missing = "separate",
  max_leaves = 2L,
  monotone_constraints = NULL,
  objective = c("auto", "log_loss", "rmse", "poisson_deviance",
    "tweedie_deviance:variance_power=1.5", "gamma_deviance", "pseudo_huber:delta=1.0",
    "rmse_log"),
  n_jobs = -1L,
  random_state = 42L,
  ...
)

Arguments

`formula`	A formula of the form `y ~ x1 + x2 + ...`.
`data`	A data frame containing the variables in the model.
`max_bins`	Max number of bins per feature for the main effects stage. Default is 1024.
`max_interaction_bins`	Max number of bins per feature for interaction terms. Default is 64.
`interactions`	Interaction terms to be included in the model. Default is 0.9. Current options include: Integer (1 <= interactions): Count of interactions to be automatically selected. Percentage (interactions < 1.0): Determine the integer count of interactions by multiplying the number of features by this percentage. List of numeric pairs: The pairs contain the indices of the features within each additive term. In addition to pairs, the interactions parameter accepts higher order interactions. It also accepts univariate terms which will cause the algorithm to boost the main terms at the same time as the interactions. When boosting mains at the same time as interactions, the `exclude` parameter should be set to `"mains"` and currently `max_bins` needs to be equal to `max_interaction_bins`.
`exclude`	Features or terms to be excluded. Default is `NULL`.
`validation_size`	Validation set size. Used for early stopping during boosting, and is needed to create outer bags. Default is 0.15. Options are: Integer (1 <= `validation_size`): Count of samples to put in the validation sets. Percentage (`validation_size` < 1.0): Percentage of the data to put in the validation sets. 0: Turns off early stopping. Outer bags have no utility. Error bounds will be eliminated.
`outer_bags`	Number of outer bags. Outer bags are used to generate error bounds and help with smoothing the graphs. Default is 16.
`inner_bags`	Number of inner bags. Default is 0 which turns off inner bagging.
`learning_rate`	Learning rate for boosting. Default is 0.04.
`greedy_ratio`	The proportion of greedy boosting steps relative to cyclic boosting steps. A value of 0 disables greedy boosting. Default is 10.
`cyclic_progress`	The proportion of the boosting cycles that will actively contribute to improving the model's performance, expressed as a logical or a numeric between 0 and 1. A value of `TRUE` (equivalent to 1.0) means 100% of the cycles are expected to make forward progress. If forward progress is not achieved during a cycle, that cycle is not wasted; instead, it is used to update internal gain calculations related to how effective each feature is in predicting the target variable. Setting this parameter to a value less than 1.0 can be useful for preventing overfitting. Default is `FALSE`.
`smoothing_rounds`	Number of initial highly regularized rounds to set the basic shape of the main effect feature graphs. Default is 500.
`interaction_smoothing_rounds`	Number of initial highly regularized rounds to set the basic shape of the interaction effect feature graphs during fitting. Default is 100.
`max_rounds`	Total number of boosting rounds with `n_terms` boosting steps per round. Default is 25000.
`early_stopping_rounds`	Number of rounds with no improvement to trigger early stopping. 0 turns off early stopping and boosting will occur for exactly `max_rounds`. Default is 100.
`early_stopping_tolerance`	Tolerance that dictates the smallest delta required to be considered an improvement which prevents the algorithm from early stopping. `early_stopping_tolerance` is expressed as a percentage of the early stopping metric. Negative values indicate that the individual models should be overfit before stopping. EBMs are a bagged ensemble of models. Setting the `early_stopping_tolerance` to zero (or even negative), allows learning to overfit each of the individual models a little, which can improve the accuracy of the ensemble as a whole. Overfitting each of the individual models reduces the bias of each model at the expense of increasing the variance (due to overfitting) of the individual models. But averaging the models in the ensemble reduces variance without much change in bias. Since the goal is to find the optimum bias-variance tradeoff for the ensemble of models—not the individual models—a small amount of overfitting of the individual models can improve the accuracy of the ensemble as a whole. Default is 1e-05.
`min_samples_leaf`	Minimum number of samples allowed in the leaves. Default is 4.
`min_hessian`	Minimum hessian required to consider a potential split valid. Default is 0.0.
`reg_alpha`	L1 regularization. Default is 0.0.
`reg_lambda`	L2 regularization. Default is 0.0.
`max_delta_step`	Used to limit the max output of tree leaves; <=0.0 means no constraint. Default is 0.0.
`gain_scale`	Scale factor to apply to nominal categoricals. A scale factor above 1.0 will cause the algorithm to focus more on the nominal categoricals. Default is 5.0.
`min_cat_samples`	Minimum number of samples in order to treat a category separately. If lower than this threshold the category is combined with other categories that have low numbers of samples. Default is 10.
`cat_smooth`	Used for categorical features. This can reduce the effect of noise in categorical features, especially for categories with limited data. Default is 10.0.
`missing`	Method for handling missing values during boosting. Default is `"separate"`. The placement of the missing value bin can influence the resulting model graphs. For example, placing the bin on the "low" side may cause missing values to affect lower bins, and vice versa. This parameter does not affect the final placement of the missing bin in the model (the missing bin will remain at index 0 in the `term_scores_` attribute). Possible values for missing are: `"low"`: Place the missing bin on the left side of the graphs. `"high"`: Place the missing bin on the right side of the graphs. `"separate"`: Place the missing bin in its own leaf during each boosting step, effectively making it location-agnostic. This can lead to overfitting, especially when the proportion of missing values is small. `"gain"`: Choose the best leaf for the missing value contribution at each boosting step, based on gain.
`max_leaves`	Maximum number of leaves allowed in each tree. Default is 2.
`monotone_constraints`	Default is `NULL`. This parameter allows you to specify monotonic constraints for each feature's relationship with the target variable during model fitting. However, it is generally recommended to apply monotonic constraints post-fit using the `monotonize()` method rather than setting them during the fitting process. This recommendation is based on the observation that, during fitting, the boosting algorithm may compensate for a monotone constraint on one feature by utilizing another correlated feature, potentially obscuring any monotonic violations. If you choose to define monotone constraints, `monotone_constraints` should be a numeric vector with a length equal to the number of features. Each element of the vector corresponds to a feature and should take one of the following values: 0: No monotonic constraint is imposed on the corresponding feature's partial response. +1: The partial response of the corresponding feature should be monotonically increasing with respect to the target. -1: The partial response of the corresponding feature should be monotonically decreasing with respect to the target.
`objective`	The objective function to optimize. Current options include: `"auto"` (try to determine automatically between `"log_loss"` and `"rmse"`). `"log_loss"` (log loss, for classification). `"rmse"` (root mean squared error). `"poisson_deviance"` (e.g., for counts or non-negative integers). `"tweedie_deviance:variance_power=1.5"` (e.g., for modeling total loss in insurance applications). `"gamma_deviance"` (e.g., for positive continuous response). `"pseudo_huber:delta=1.0"` (e.g., for robust regression). `"rmse_log"` (`"rmse"` with a log link function). Default is `"auto"` which assumes `"log_loss"` if the response is a factor or character string and `"rmse"` otherwise. It's a good idea to always explicitly set this argument.
`n_jobs`	Number of jobs to run in parallel. Default is -1. Negative integers are interpreted using joblib's formula (`n_cpus + 1 + n_jobs`), just like scikit-learn. For example, `n_jobs = -2` means using all threads except 1.
`random_state`	Random state. Setting to `NULL` generates non-repeatable sequences. Default is 42 to remain consistent with the corresponding Python module.
`...`	Additional optional arguments. (Currently ignored.)
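
Some of the argument conventions above can be checked in plain R before calling `ebm()`. The sketch below is illustrative only: `effective_jobs`, `resolve_objective`, and `make_monotone` are hypothetical helpers, not part of the package. It also assumes that the entries of `monotone_constraints` follow the order of the predictor columns in `data`, which the documentation above does not state explicitly.

```r
# Hypothetical helpers mirroring the documented argument conventions.

# joblib-style rule for negative n_jobs: n_cpus + 1 + n_jobs (at least 1).
effective_jobs <- function(n_jobs, n_cpus) {
  if (n_jobs < 0L) max(1L, n_cpus + 1L + n_jobs) else n_jobs
}

# Documented "auto" rule: "log_loss" for factor or character responses,
# "rmse" otherwise.
resolve_objective <- function(y) {
  if (is.factor(y) || is.character(y)) "log_loss" else "rmse"
}

# Build a monotone_constraints vector with one entry per predictor:
# 0 = unconstrained, +1 = increasing, -1 = decreasing.
# Assumption: entries follow the order of the predictor columns.
make_monotone <- function(predictors, increasing = character(),
                          decreasing = character()) {
  stopifnot(all(c(increasing, decreasing) %in% predictors))
  out <- setNames(numeric(length(predictors)), predictors)
  out[increasing] <- 1
  out[decreasing] <- -1
  out
}

effective_jobs(-1L, n_cpus = 8L)   # use all 8 threads
effective_jobs(-2L, n_cpus = 8L)   # use all threads except one
resolve_objective(mtcars$mpg)      # numeric response
resolve_objective(iris$Species)    # factor response

predictors <- setdiff(names(mtcars), "mpg")
make_monotone(predictors, decreasing = c("wt", "hp"))
```

Keeping the names on the constraint vector makes it easy to confirm by eye which feature each entry refers to.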

Details

In short, EBMs have the general form

g\left(E\left[Y|\boldsymbol{x}\right]\right) = \theta_0 + \sum_i f_i\left(x_i\right) + \sum_{ij} f_{ij}\left(x_i, x_j\right) \quad \left(i \ne j\right),

where,

g is a link function that allows the model to handle various response types (e.g., the logit link for logistic regression or the log link for Poisson regression of counts and rates);
\theta_0 is a constant intercept (or bias term);
f_i is the term contribution (or shape function) for predictor x_i (i.e., it captures the main effect of x_i on E\left[Y|\boldsymbol{x}\right]);
f_{ij} is the term contribution for the pair of predictors x_i and x_j (i.e., it captures the joint effect, or pairwise interaction effect of x_i and x_j on E\left[Y|\boldsymbol{x}\right]).
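
To make the additive form concrete, the sketch below evaluates a toy model with two main effects and one pairwise interaction, then maps the score back to the response scale with the inverse logit. The intercept and shape functions are made up for illustration and do not come from a fitted EBM.

```r
# Toy additive model on the link scale (all values are illustrative).
theta0 <- -0.5                                      # intercept
f1  <- function(x1) 0.8 * (x1 > 2)                  # main effect of x1 (step function)
f2  <- function(x2) -0.3 * x2                       # main effect of x2
f12 <- function(x1, x2) 0.4 * (x1 > 2) * (x2 > 1)   # pairwise interaction

# Score on the link scale: intercept plus the sum of term contributions.
eta <- function(x1, x2) theta0 + f1(x1) + f2(x2) + f12(x1, x2)

# The inverse logit link maps the score to a probability.
predict_prob <- function(x1, x2) plogis(eta(x1, x2))

x <- data.frame(x1 = c(1, 3), x2 = c(0, 2))
contrib <- with(x, cbind(intercept = theta0, x1 = f1(x1), x2 = f2(x2),
                         `x1:x2` = f12(x1, x2)))
contrib                    # per-term contributions for each row
rowSums(contrib)           # scores on the link scale
predict_prob(x$x1, x$x2)   # predicted probabilities
```

Each row of `contrib` decomposes one prediction into its terms, which is the same information a local explanation of an additive model reports.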

Value

An object of class "EBM" for which there are print, predict, plot, and merge methods.

Examples

## Not run: 
  #
  # Regression example
  #

  # Fit a default EBM regressor
  fit <- ebm(mpg ~ ., data = mtcars, objective = "rmse")

  # Generate some predictions
  head(predict(fit, newdata = mtcars))
  head(predict(fit, newdata = mtcars, se_fit = TRUE))

  # Show global summary and GAM shape functions
  plot(fit)  # term importance scores
  plot(fit, term = "cyl")
  plot(fit, term = "cyl", interactive = TRUE)

  # Explain prediction for first observation
  plot(fit, local = TRUE, X = subset(mtcars, select = -mpg)[1L, ])

## End(Not run)

ebm documentation built on April 3, 2025, 7:16 p.m.