light_importance: Variable Importance
In flashlight: Shed Light on Black Box Machine Learning Models

Description

Two algorithms to calculate variable importance are available:

1. Permutation importance, and
2. SHAP importance.

Algorithm 1 measures the importance of a variable v as the drop in performance caused by permuting the values of v; see Fisher et al. (2018) in the references below. Algorithm 2 measures variable importance by averaging absolute SHAP values.
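The permutation idea can be sketched in base R without the package. The helper names `rmse()` and `perm_importance()` below are hypothetical and are not part of flashlight. The standard error formula (standard deviation of the repeated increases divided by the square root of the number of repetitions) is likewise an assumption made for illustration, not a statement about the package internals.

```r
# Illustrative sketch only; not the flashlight implementation.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

perm_importance <- function(fit, data, y, v, m_repetitions = 1L, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Performance on the unshuffled data
  baseline <- rmse(data[[y]], predict(fit, newdata = data))

  one_variable <- function(var) {
    increases <- replicate(m_repetitions, {
      shuffled <- data
      # Shuffling one column breaks its relationship with the response
      shuffled[[var]] <- sample(shuffled[[var]])
      rmse(shuffled[[y]], predict(fit, newdata = shuffled)) - baseline
    })
    data.frame(
      variable = var,
      value = mean(increases),
      # Assumed standard error; only defined for more than one repetition
      se = if (m_repetitions > 1L) sd(increases) / sqrt(m_repetitions) else NA_real_
    )
  }
  do.call(rbind, lapply(v, one_variable))
}

fit <- lm(Sepal.Length ~ Petal.Length + Sepal.Width, data = iris)
imp <- perm_importance(
  fit, iris, y = "Sepal.Length",
  v = c("Petal.Length", "Sepal.Width", "Petal.Width"),
  m_repetitions = 5L, seed = 1
)
print(imp)
```

Since `Petal.Width` is not used by the model, shuffling it leaves the predictions unchanged and its importance is exactly zero.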

Usage

light_importance(x, ...)

## Default S3 method:
light_importance(x, ...)

## S3 method for class 'flashlight'
light_importance(
  x,
  data = x$data,
  by = x$by,
  type = c("permutation", "shap"),
  v = NULL,
  n_max = Inf,
  seed = NULL,
  m_repetitions = 1L,
  metric = x$metrics[1L],
  lower_is_better = TRUE,
  use_linkinv = FALSE,
  ...
)

## S3 method for class 'multiflashlight'
light_importance(x, ...)

Arguments

`x`	An object of class "flashlight" or "multiflashlight".
`...`	Further arguments passed to `light_performance()`. Not used for `type = "shap"`.
`data`	An optional `data.frame`. Not used for `type = "shap"`.
`by`	An optional vector of column names used to additionally group the results.
`type`	Type of importance: "permutation" (default) or "shap". "shap" is only available if a "shap" object is contained in `x`.
`v`	Vector of variable names to assess importance for. Defaults to all variables in `data` except "by" and "y".
`n_max`	Maximum number of rows to consider. Not used for `type = "shap"`.
`seed`	An integer random seed used to select and shuffle rows. Not used for `type = "shap"`.
`m_repetitions`	Number of permutations. Defaults to 1. A value above 1 provides more stable estimates of variable importance and allows the calculation of standard errors measuring the uncertainty from permuting. Not used for `type = "shap"`.
`metric`	An optional named list of length one with a metric as element. Defaults to the first metric in the flashlight. The metric needs to be a function with at least four arguments: actual, predicted, case weights w and `...`. Irrelevant for `type = "shap"`.
`lower_is_better`	Logical flag indicating if lower values in the metric are better or not. If set to `FALSE`, the increase in metric is multiplied by -1. Not used for `type = "shap"`.
`use_linkinv`	Should the retransformation function be applied? Default is `FALSE`. Not used for `type = "shap"`.
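As an illustration of the `metric` and `lower_is_better` arguments, the following base R sketch defines two functions with the argument layout described above (actual, predicted, case weights `w`, and `...`). The names `weighted_mae()`, `r_squared()` and `importance_value()` are hypothetical helpers written for this example; they are not provided by flashlight, and `importance_value()` only mirrors the sign convention stated for `lower_is_better`.

```r
# Hypothetical metric where lower values are better
weighted_mae <- function(actual, predicted, w = NULL, ...) {
  abs_error <- abs(actual - predicted)
  if (is.null(w)) mean(abs_error) else stats::weighted.mean(abs_error, w)
}

# Hypothetical metric where higher values are better
r_squared <- function(actual, predicted, w = NULL, ...) {
  if (is.null(w)) w <- rep(1, length(actual))
  center <- stats::weighted.mean(actual, w)
  1 - sum(w * (actual - predicted)^2) / sum(w * (actual - center)^2)
}

# Sketch of the sign convention: the increase in the metric is
# multiplied by -1 when higher metric values are better.
importance_value <- function(metric_before, metric_after, lower_is_better = TRUE) {
  increase <- metric_after - metric_before
  if (lower_is_better) increase else -increase
}

# A metric would be passed as a named list of length one
my_metric <- list(mae = weighted_mae)

print(my_metric$mae(c(1, 2, 3), c(1, 3, 5)))
print(importance_value(0.9, 0.6, lower_is_better = FALSE))
```

With such definitions, a call along the lines of `light_importance(fl, metric = list(r_squared = r_squared), lower_is_better = FALSE)` would follow the argument descriptions above.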

Details

For Algorithm 1, the minimum required elements in the (multi-)flashlight are "y", "predict_function", "model", "data" and "metrics". For Algorithm 2, the only required element is "shap". Call add_shap() once to add such an object.

Note: The values of the permutation algorithm (Algorithm 1) are on the scale of the selected metric. For the SHAP algorithm (Algorithm 2), the values are on the scale of absolute values of the predictions.
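To make the scale of SHAP importance concrete, here is a minimal base R sketch that averages absolute SHAP values per variable. It does not use the "shap" object of a flashlight. Instead it assumes a linear model whose features are treated as independent, in which case the SHAP value of feature j for one observation is the coefficient times the centered feature value. All object names are chosen for illustration.

```r
fit <- lm(Sepal.Length ~ Petal.Length + Sepal.Width, data = iris)

X <- as.matrix(iris[, c("Petal.Length", "Sepal.Width")])
beta <- coef(fit)[colnames(X)]

# Assumed closed form for a linear model with independent features:
# shap[i, j] = beta[j] * (X[i, j] - mean(X[, j]))
centered <- sweep(X, 2, colMeans(X))
shap_values <- sweep(centered, 2, beta, `*`)

# SHAP importance: mean absolute SHAP value per variable,
# expressed in the units of the predictions
shap_importance <- colMeans(abs(shap_values))
print(shap_importance)

# Sanity check: SHAP values plus the average prediction recover each prediction
reconstructed <- rowSums(shap_values) + mean(predict(fit))
print(all.equal(unname(reconstructed), unname(predict(fit))))
```

Because the SHAP values decompose each prediction, the resulting importance is measured in units of the response (here, centimetres of sepal length) rather than in units of a performance metric.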

Value

An object of class "light_importance" with the following elements:

`data`	A tibble with results. Can be used to build fully customized visualizations. Column names can be controlled by `options(flashlight.column_name)`.
`by`	Same as input `by`.
`type`	Same as input `type`. For information only.

Methods (by class)

light_importance(default): Default method not implemented yet.
light_importance(flashlight): Variable importance for a flashlight.
light_importance(multiflashlight): Variable importance for a multiflashlight.

References

Fisher A., Rudin C., Dominici F. (2018). All Models are Wrong but many are Useful: Variable Importance for Black-Box, Proprietary, or Misspecified Prediction Models, using Model Class Reliance. arXiv.

Examples

library(flashlight)
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
fl <- flashlight(model = fit, label = "full", data = iris, y = "Sepal.Length")
# Permutation importance (the default type) based on the first metric in fl
light_importance(fl)

flashlight documentation built on May 31, 2023, 6:19 p.m.