ebm | R Documentation |
This function is an R wrapper for the explainable boosting functions in the Python interpret library. It trains an Explainable Boosting Machine (EBM) model, which is a tree-based, cyclic gradient boosting generalized additive model with automatic interaction detection. EBMs are often as accurate as state-of-the-art blackbox models while remaining completely interpretable.
ebm(
  formula,
  data,
  max_bins = 1024L,
  max_interaction_bins = 64L,
  interactions = 0.9,
  exclude = NULL,
  validation_size = 0.15,
  outer_bags = 16L,
  inner_bags = 0L,
  learning_rate = 0.04,
  greedy_ratio = 10,
  cyclic_progress = FALSE,
  smoothing_rounds = 500L,
  interaction_smoothing_rounds = 100L,
  max_rounds = 25000L,
  early_stopping_rounds = 100L,
  early_stopping_tolerance = 1e-05,
  min_samples_leaf = 4L,
  min_hessian = 0,
  reg_alpha = 0,
  reg_lambda = 0,
  max_delta_step = 0,
  gain_scale = 5,
  min_cat_samples = 10L,
  cat_smooth = 10,
  missing = "separate",
  max_leaves = 2L,
  monotone_constraints = NULL,
  objective = c("auto", "log_loss", "rmse", "poisson_deviance",
    "tweedie_deviance:variance_power=1.5", "gamma_deviance", "pseudo_huber:delta=1.0",
    "rmse_log"),
  n_jobs = -1L,
  random_state = 42L,
  ...
)
formula |
A formula of the form |
data |
A data frame containing the variables in the model. |
max_bins |
Max number of bins per feature for the main effects stage. Default is 1024. |
max_interaction_bins |
Max number of bins per feature for interaction terms. Default is 64. |
interactions |
Interaction terms to be included in the model. Default is 0.9. Current options include:
|
exclude |
Features or terms to be excluded. Default is NULL. |
validation_size |
Validation set size. Used for early stopping during boosting, and is needed to create outer bags. Default is 0.15. Options are:
|
outer_bags |
Number of outer bags. Outer bags are used to generate error bounds and help with smoothing the graphs. |
inner_bags |
Number of inner bags. Default is 0 which turns off inner bagging. |
learning_rate |
Learning rate for boosting. Default is 0.04. |
greedy_ratio |
The proportion of greedy boosting steps relative to cyclic boosting steps. A value of 0 disables greedy boosting. Default is 10. |
cyclic_progress |
This parameter specifies the proportion of the
boosting cycles that will actively contribute to improving the model's
performance. It is expressed as a logical or numeric between 0 and 1, with
the default set to FALSE. |
smoothing_rounds |
Number of initial highly regularized rounds to set the basic shape of the main effect feature graphs. Default is 500. |
interaction_smoothing_rounds |
Number of initial highly regularized rounds to set the basic shape of the interaction effect feature graphs during fitting. Default is 100. |
max_rounds |
Total number of boosting rounds with |
early_stopping_rounds |
Number of rounds with no improvement to trigger
early stopping. 0 turns off early stopping and boosting will occur for
exactly |
early_stopping_tolerance |
Tolerance that dictates the smallest delta
required to be considered an improvement which prevents the algorithm from
early stopping. |
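To make the interplay between early_stopping_rounds and early_stopping_tolerance concrete, the following base R sketch applies one plausible stopping rule to a made-up sequence of validation losses. The helper stop_round() and the loss values are hypothetical, and the length of the loss vector stands in for max_rounds. The rule shown (stop after a given number of consecutive rounds without an improvement larger than the tolerance) illustrates the idea only; it is not taken from the interpret implementation.

```r
# Hypothetical helper (not part of the package): returns the round at which
# boosting would stop for a given sequence of validation losses.
stop_round <- function(val_loss, early_stopping_rounds = 100L,
                       early_stopping_tolerance = 1e-05) {
  # 0 turns off early stopping, so every available round is used
  if (early_stopping_rounds == 0L) return(length(val_loss))
  best <- Inf
  stale <- 0L  # consecutive rounds without a sufficient improvement
  for (i in seq_along(val_loss)) {
    if (best - val_loss[i] > early_stopping_tolerance) {
      # Improvement exceeds the tolerance: record it and reset the counter
      best <- val_loss[i]
      stale <- 0L
    } else {
      stale <- stale + 1L
      if (stale >= early_stopping_rounds) return(i)
    }
  }
  length(val_loss)  # never triggered: ran for all rounds
}

# Made-up validation losses that plateau after the third round
val_loss <- c(1.00, 0.80, 0.70, 0.70, 0.70, 0.70, 0.65)
stop_round(val_loss, early_stopping_rounds = 3L)
stop_round(val_loss, early_stopping_rounds = 0L)
```

With a patience of three rounds the sketch stops during the plateau and never sees the later improvement, which is the usual trade-off when choosing a small early_stopping_rounds.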
min_samples_leaf |
Minimum number of samples allowed in the leaves. Default is 4. |
min_hessian |
Minimum hessian required to consider a potential split valid. Default is 0.0. |
reg_alpha |
L1 regularization. Default is 0.0. |
reg_lambda |
L2 regularization. Default is 0.0. |
max_delta_step |
Used to limit the max output of tree leaves; <=0.0 means no constraint. Default is 0.0. |
gain_scale |
Scale factor to apply to nominal categoricals. A scale factor above 1.0 will cause the algorithm to focus more on the nominal categoricals. Default is 5.0. |
min_cat_samples |
Minimum number of samples in order to treat a category separately. If lower than this threshold the category is combined with other categories that have low numbers of samples. Default is 10. |
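As a rough picture of what min_cat_samples describes, the base R sketch below pools every category with fewer observations than the threshold into a single group. The helper pool_rare() and the label "other" are hypothetical; how the underlying library actually combines low-count categories is not specified here, so treat this as an illustration of the idea only.

```r
# Hypothetical helper (not part of the package): pool categories that have
# fewer than min_cat_samples observations into one combined level.
pool_rare <- function(x, min_cat_samples = 10L, other = "other") {
  x <- as.character(x)
  counts <- table(x)
  rare <- names(counts)[counts < min_cat_samples]
  x[x %in% rare] <- other
  factor(x)
}

# Made-up categorical feature: "c" and "d" fall below the threshold
x <- rep(c("a", "b", "c", "d"), times = c(12, 10, 3, 2))
pooled <- pool_rare(x, min_cat_samples = 10L)
table(pooled)
```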
cat_smooth |
Used for the categorical features. This can reduce the effect of noise in categorical features, especially for categories with limited data. Default is 10.0. |
missing |
Method for handling missing values during boosting. Default is "separate".
|
max_leaves |
Maximum number of leaves allowed in each tree. Default is 2. |
monotone_constraints |
Default is NULL. This parameter allows you to
specify monotonic constraints for each feature's relationship with the target
variable during model fitting. However, it is generally recommended to apply
monotonic constraints post-fit using the
|
objective |
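The objective function to optimize. Current options include: "auto", "log_loss", "rmse", "poisson_deviance", "tweedie_deviance:variance_power=1.5", "gamma_deviance", "pseudo_huber:delta=1.0", and "rmse_log". Default is "auto". |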
n_jobs |
Number of jobs to run in parallel. Default is -1. Negative
integers are interpreted as following
joblib's formula ( |
random_state |
Random state. Setting to |
... |
Additional optional argument. (Currently ignored.) |
In short, EBMs have the general form
g\left(E\left[Y|\boldsymbol{x}\right]\right) = \theta_0 + \sum_i f_i\left(x_i\right) + \sum_{ij} f_{ij}\left(x_i, x_j\right) \quad \left(i \ne j\right),
where,
g is a link function that allows the model to handle various response types (e.g., the logit link for logistic regression or the log link for Poisson regression of counts and rates);
\theta_0 is a constant intercept (or bias term);
f_i is the term contribution (or shape function) for predictor x_i (i.e., it captures the main effect of x_i on E\left[Y|\boldsymbol{x}\right]);
f_{ij} is the term contribution for the pair of predictors x_i and x_j (i.e., it captures the joint effect, or pairwise interaction effect, of x_i and x_j on E\left[Y|\boldsymbol{x}\right]).
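The additive form above can be sketched in base R. The intercept, shape functions, and predictor values below are made up for illustration and do not come from a fitted model; the point is only that the prediction on the link scale is a sum of term contributions, which the inverse link (here the inverse logit, assuming a binary response) maps back to the response scale.

```r
# Hypothetical intercept and term contributions (not from a fitted EBM)
theta0 <- -0.5
f1  <- function(x1) 0.8 * sin(x1)                 # main effect of x1
f2  <- function(x2) ifelse(x2 > 2, 0.6, -0.3)     # step-shaped main effect of x2
f12 <- function(x1, x2) 0.1 * x1 * x2             # pairwise interaction

# Two made-up observations
x <- data.frame(x1 = c(0, 1.5), x2 = c(1, 3))

# One column per term; each row sums to the link-scale prediction
terms <- cbind(
  intercept = theta0,
  f1  = f1(x$x1),
  f2  = f2(x$x2),
  f12 = f12(x$x1, x$x2)
)
eta <- rowSums(terms)   # theta0 + f1 + f2 + f12
prob <- plogis(eta)     # inverse logit link: E[Y | x] for a binary response

terms
cbind(eta = eta, prob = prob)
```

Because each row of terms lists the individual contributions, the same table doubles as a local explanation of a prediction: every column shows how much one term moved the link-scale score.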
An object of class "EBM" for which there are print, predict, plot, and merge methods.
## Not run:
#
# Regression example
#
# Fit a default EBM regressor
fit <- ebm(mpg ~ ., data = mtcars, objective = "rmse")
# Generate some predictions
head(predict(fit, newdata = mtcars))
head(predict(fit, newdata = mtcars, se_fit = TRUE))
# Show global summary and GAM shape functions
plot(fit) # term importance scores
plot(fit, term = "cyl")
plot(fit, term = "cyl", interactive = TRUE)
# Explain prediction for first observation
plot(fit, local = TRUE, X = subset(mtcars, select = -mpg)[1L, ])
## End(Not run)