ale | R Documentation |
Calculates ALE for one or multiple continuous features specified by v.
The concept of ALE was introduced in Apley and Zhu (2020) as an alternative to partial dependence (PD). The ceteris paribus clause behind PD is a blessing and a curse at the same time:
Blessing: The interpretation is easy and similar to what we know from linear regression (just averaging out interaction effects).
Curse: The model is applied to very unlikely or even impossible feature combinations, especially with strongly dependent features.
ALE fixes the curse as follows: Per bin, the local effect is calculated as the partial dependence difference between the upper and lower bin breaks, using only observations falling into this bin. This is repeated for all bins, and the values are accumulated.
ALE values are plotted against right bin breaks.
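The per-bin procedure described above can be sketched in base R. This is only an illustration under stated assumptions, not the implementation behind ale(): the bin breaks are chosen by hand, no weights or subsampling are used, and the accumulated values are left uncentered.

```r
# Minimal base-R sketch of accumulated local effects for one feature.
# Assumption: hand-picked breaks that cover the range of Petal.Length.
fit <- lm(Sepal.Length ~ ., data = iris)
v <- "Petal.Length"
breaks <- seq(1, 7, by = 1)

# Assign each observation to a right-closed bin (lowest break included)
bin <- cut(iris[[v]], breaks = breaks, include.lowest = TRUE,
           right = TRUE, labels = FALSE)

local_effect <- numeric(length(breaks) - 1)
for (k in seq_along(local_effect)) {
  X <- iris[bin == k, , drop = FALSE]   # only observations in this bin
  if (nrow(X) == 0) next                # an empty bin contributes nothing
  lower <- upper <- X
  lower[[v]] <- breaks[k]               # move the feature to the lower break
  upper[[v]] <- breaks[k + 1]           # move the feature to the upper break
  local_effect[k] <- mean(predict(fit, upper) - predict(fit, lower))
}

# Accumulate the local effects and pair them with the right bin breaks
ale_sketch <- data.frame(right_break = breaks[-1], ale = cumsum(local_effect))
ale_sketch
```

Because the model here is additive in Petal.Length, each non-empty bin contributes the same local effect (the regression coefficient times the bin width), which makes the sketch easy to check by hand.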
ale(object, ...)
## Default S3 method:
ale(
object,
v,
data,
pred_fun = stats::predict,
trafo = NULL,
which_pred = NULL,
w = NULL,
breaks = "Sturges",
right = TRUE,
discrete_m = 13L,
outlier_iqr = 2,
ale_n = 50000L,
ale_bin_size = 200L,
seed = NULL,
...
)
## S3 method for class 'ranger'
ale(
object,
v,
data,
pred_fun = NULL,
trafo = NULL,
which_pred = NULL,
w = NULL,
breaks = "Sturges",
right = TRUE,
discrete_m = 13L,
outlier_iqr = 2,
ale_n = 50000L,
ale_bin_size = 200L,
seed = NULL,
...
)
## S3 method for class 'explainer'
ale(
object,
v = colnames(data),
data = object$data,
pred_fun = object$predict_function,
trafo = NULL,
which_pred = NULL,
w = object$weights,
breaks = "Sturges",
right = TRUE,
discrete_m = 13L,
outlier_iqr = 2,
ale_n = 50000L,
ale_bin_size = 200L,
seed = NULL,
...
)
## S3 method for class 'H2OModel'
ale(
object,
data,
v = object@parameters$x,
pred_fun = NULL,
trafo = NULL,
which_pred = NULL,
w = object@parameters$weights_column$column_name,
breaks = "Sturges",
right = TRUE,
discrete_m = 13L,
outlier_iqr = 2,
ale_n = 50000L,
ale_bin_size = 200L,
seed = NULL,
...
)
object |
Fitted model. |
... |
Further arguments passed to |
v |
Variable names to calculate statistics for. |
data |
Matrix or data.frame. |
pred_fun |
Prediction function, by default |
trafo |
How should predictions be transformed?
A function or |
which_pred |
If the predictions are multivariate: which column to pick
(integer or column name). By default |
w |
Optional vector with case weights. Can also be a column name in |
breaks |
An integer, vector, or "Sturges" (the default) used to determine
bin breaks of continuous features. Values outside the total bin range are placed
in the outmost bins. To allow varying values of |
right |
Should bins be right-closed? The default is |
discrete_m |
Numeric features with up to this number of unique values are treated as discrete and are therefore dropped from the calculations. |
outlier_iqr |
If |
ale_n |
Size of the data used for calculating ALE.
The default is 50000. For larger |
ale_bin_size |
Maximal number of observations used per bin for ALE calculations.
If there are more observations in a bin, |
seed |
Optional integer random seed used for:
|
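As a rough illustration of the binning-related arguments (breaks, right, discrete_m), the following base-R sketch shows how a Sturges bin count, a discreteness check, and the placement of out-of-range values in the outmost bins could look. Whether ale() uses these exact base functions internally is an assumption; the manual breaks are hypothetical.

```r
x <- iris$Petal.Length

# Sturges' rule for the number of bins: ceiling(log2(n) + 1)
k <- grDevices::nclass.Sturges(x)

# A feature with at most discrete_m unique values would count as discrete
discrete_m <- 13L
is_discrete <- length(unique(x)) <= discrete_m

# Hypothetical manual breaks that do not cover the full range of x
breaks <- c(2, 3, 4, 5, 6)

# Clamp values to the total bin range so they land in the outmost bins
x_clamped <- pmin(pmax(x, breaks[1]), breaks[length(breaks)])

# Right-closed bins, as with right = TRUE
bin <- cut(x_clamped, breaks = breaks, include.lowest = TRUE,
           right = TRUE, labels = FALSE)
table(bin)
```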
The function is a convenience wrapper around feature_effects(), which calls the barebone implementation .ale() to calculate ALE.
A list (of class "EffectData") with a data.frame per feature having columns:
bin_mid: Bin mid points. In the plots, the bars are centered around these.
bin_width: Absolute width of the bin. In the plots, these equal the bar widths.
bin_mean: For continuous features, the (possibly weighted) average feature value within bin. For discrete features equivalent to bin_mid.
N: The number of observations within bin.
weight: The weight sum within bin. When w = NULL, equivalent to N.
Different statistics, depending on the function call.
Use single bracket subsetting to select part of the output. Note that each data.frame contains an attribute "discrete" with the information whether the feature is discrete or continuous. This attribute might be lost when you manually modify the data.frames.
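The subsetting and attribute behavior can be illustrated with a hypothetical stand-in for the output, built as a plain named list of data.frames. The object below is an assumption made for illustration only; it is not produced by ale() and carries no "EffectData" class.

```r
# Hypothetical stand-in: a named list with one data.frame per feature
df_cont <- data.frame(bin_mid = c(1.5, 2.5), N = c(50L, 1L))
attr(df_cont, "discrete") <- FALSE
df_disc <- data.frame(bin_mid = c("a", "b"), N = c(30L, 21L))
attr(df_disc, "discrete") <- TRUE
out <- list(Petal.Length = df_cont, Species = df_disc)

sub <- out["Petal.Length"]     # single brackets: still a list, of length 1
one <- out[["Petal.Length"]]   # double brackets: the data.frame itself
attr(one, "discrete")          # the attribute travels with the data.frame

# Rebuilding a data.frame from its columns creates a new object,
# so the "discrete" attribute is no longer present
rebuilt <- data.frame(bin_mid = one$bin_mid, N = one$N)
is.null(attr(rebuilt, "discrete"))
```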
ale(default): Default method.
ale(ranger): Method for ranger models.
ale(explainer): Method for DALEX explainers.
ale(H2OModel): Method for H2O models.
Apley, Daniel W., and Jingyu Zhu. 2020. Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 82 (4): 1059–1086. doi:10.1111/rssb.12377.
feature_effects(), .ale()
fit <- lm(Sepal.Length ~ ., data = iris)
M <- ale(fit, v = "Petal.Length", data = iris)
M |> plot()
M2 <- ale(fit, v = colnames(iris)[-1], data = iris, breaks = 5)
plot(M2, share_y = "all") # Only continuous variables shown