light_importance: Permutation Variable Importance

light_importance  R Documentation

Description

The importance of a variable v is measured as the drop in performance caused by randomly permuting the values of v; see Fisher et al. 2018 (reference below).
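
The idea can be illustrated without the package. The following is a minimal base R sketch, not the package's internal code: the helper names rmse and perm_importance are hypothetical, and RMSE is picked only as an example of a lower-is-better metric.

```r
# Base R sketch of permutation importance (illustration only).
set.seed(1)  # make the shuffles reproducible

fit <- lm(Sepal.Length ~ Species + Petal.Length, data = iris)

# Example performance metric: root mean squared error (lower is better)
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Performance on the unshuffled data
perf_orig <- rmse(iris$Sepal.Length, predict(fit, newdata = iris))

# Importance of v: metric after shuffling column v minus the original metric
perm_importance <- function(v) {
  shuffled <- iris
  shuffled[[v]] <- sample(shuffled[[v]])
  rmse(iris$Sepal.Length, predict(fit, newdata = shuffled)) - perf_orig
}

vars <- setdiff(names(iris), "Sepal.Length")
imp <- vapply(vars, perm_importance, numeric(1))
print(sort(imp, decreasing = TRUE))
```

Variables that the model does not use (here Sepal.Width and Petal.Width) leave the predictions unchanged when shuffled, so their importance is zero.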

Usage

light_importance(x, ...)

## Default S3 method:
light_importance(x, ...)

## S3 method for class 'flashlight'
light_importance(
  x,
  data = x$data,
  by = x$by,
  type = c("permutation", "shap"),
  v = NULL,
  n_max = Inf,
  seed = NULL,
  m_repetitions = 1L,
  metric = x$metrics[1L],
  lower_is_better = TRUE,
  use_linkinv = FALSE,
  ...
)

## S3 method for class 'multiflashlight'
light_importance(x, ...)

Arguments

x

An object of class "flashlight" or "multiflashlight".

...

Further arguments passed to light_performance().

data

An optional data.frame.

by

An optional vector of column names used to additionally group the results.

type

Type of importance. Although the usage above also lists "shap", "permutation" is currently the only supported option.

v

Vector of variable names to assess importance for. Defaults to all variables in data except "by" and "y".

n_max

Maximum number of rows to consider.

seed

An integer random seed used to select and shuffle rows.

m_repetitions

Number of permutations. Defaults to 1. A value above 1 provides more stable estimates of variable importance and allows the calculation of standard errors measuring the uncertainty from permuting.

metric

An optional named list of length one with a metric as its element. Defaults to the first metric in the flashlight. The metric needs to be a function with at least four arguments: actual, predicted, w (case weights), and ....

lower_is_better

Logical flag indicating if lower values in the metric are better or not. If set to FALSE, the increase in metric is multiplied by -1.

use_linkinv

Should the retransformation function be applied? Default is FALSE.
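
How m_repetitions, metric, and lower_is_better interact can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: weighted_mae and importance_with_se are hypothetical names, and the standard error is taken to be the standard deviation across repetitions divided by the square root of their number.

```r
# Hypothetical metric following the argument pattern described above:
# actual, predicted, case weights w, and ...
weighted_mae <- function(actual, predicted, w = NULL, ...) {
  if (is.null(w)) w <- rep(1, length(actual))
  weighted.mean(abs(actual - predicted), w = w, ...)
}

# Hypothetical helper: importance of column v averaged over m shuffles
importance_with_se <- function(fit, data, y, v, metric, m = 4L,
                               lower_is_better = TRUE, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  base <- metric(data[[y]], predict(fit, newdata = data))
  increase <- replicate(m, {
    shuffled <- data
    shuffled[[v]] <- sample(shuffled[[v]])
    metric(data[[y]], predict(fit, newdata = shuffled)) - base
  })
  # For higher-is-better metrics, flip the sign so that larger = more important
  if (!lower_is_better) increase <- -increase
  # Assumed definition of the standard error (NA when m = 1)
  c(value = mean(increase), error = sd(increase) / sqrt(m))
}

fit <- lm(Sepal.Length ~ Species + Petal.Length, data = iris)
res <- importance_with_se(fit, iris, "Sepal.Length", "Petal.Length",
                          metric = weighted_mae, m = 4L, seed = 1)
print(res)
```

With m = 1 the standard deviation is undefined, which is why a standard error is only available when more than one permutation is run.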

Details

The minimum required elements in the (multi-)flashlight are "y", "predict_function", "model", "data" and "metrics".

Value

An object of class "light_importance" with the following elements:

  • data A tibble with results.

  • by Same as input by.

  • type Same as input type. For information only.

Methods (by class)

  • light_importance(default): Default method not implemented yet.

  • light_importance(flashlight): Variable importance for a flashlight.

  • light_importance(multiflashlight): Variable importance for a multiflashlight.

References

Fisher A., Rudin C., Dominici F. (2018). All Models are Wrong but many are Useful: Variable Importance for Black-Box, Proprietary, or Misspecified Prediction Models, using Model Class Reliance. arXiv.

See Also

most_important(), plot.light_importance()

Examples

library(flashlight)

fit_part <- lm(Sepal.Length ~ Species + Petal.Length, data = iris)
fl_part <- flashlight(
  model = fit_part, label = "part", data = iris, y = "Sepal.Length"
)

# No effect of some variables (incl. standard errors)
plot(light_importance(fl_part, m_repetitions = 4), fill = "chartreuse4")

# Second model includes all variables
fit_full <- lm(Sepal.Length ~ ., data = iris)
fl_full <- flashlight(
  model = fit_full, label = "full", data = iris, y = "Sepal.Length"
)
fls <- multiflashlight(list(fl_part, fl_full))

plot(light_importance(fls), fill = "chartreuse4")
plot(light_importance(fls, by = "Species"))

mayer79/flashlight documentation built on Feb. 13, 2024, 1:09 p.m.