vint: Interaction effects


View source: R/vint.R

Description

Quantify the strength of two-way interaction effects using a simple feature importance ranking measure (FIRM) approach. For details, see Greenwell et al. (2018).

Usage

vint(
  object,
  feature_names,
  progress = "none",
  parallel = FALSE,
  paropts = NULL,
  ...
)

Arguments

object

A fitted model object (e.g., a "randomForest" object).

feature_names

Character vector giving the names of the two features of interest (e.g., c("x.1", "x.2")).

progress

Character string giving the name of the progress bar to use while constructing the interaction statistics. See create_progress_bar for details. Default is "none".

parallel

Logical indicating whether or not to run partial in parallel using a backend provided by the foreach package. Default is FALSE.

paropts

List containing additional options to be passed on to foreach when parallel = TRUE.

...

Additional optional arguments to be passed on to partial.

Details

This function quantifies the strength of interaction between features $X_1$ and $X_2$ by measuring how the variability of the partial dependence of the target $Y$ on $X_1$ and $X_2$ changes from slice to slice, where a slice holds one feature fixed and lets the other vary. See Greenwell et al. (2018) for details and examples.
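The idea can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper name firm_interaction, the column names x1, x2, and yhat, and the synthetic partial dependence grids are all assumptions made for illustration, and the statistic follows one reading of Greenwell et al. (2018). If two features do not interact, the spread of the partial dependence within each slice is the same for every slice, so the standard deviation of those slice-wise spreads is close to zero.

```r
# Hypothetical helper (not part of vip): interaction statistic computed from
# a partial dependence grid. `pd` is assumed to be a data frame with one row
# per grid point and columns x1, x2, and yhat.
firm_interaction <- function(pd) {
  # Spread of yhat within each slice of x1 (x2 varies), and vice versa
  sd_given_x1 <- tapply(pd$yhat, INDEX = pd$x1, FUN = sd)
  sd_given_x2 <- tapply(pd$yhat, INDEX = pd$x2, FUN = sd)
  # Without an interaction the slice-wise spreads are constant, so their
  # standard deviation is (numerically) zero
  mean(c(sd(sd_given_x1), sd(sd_given_x2)))
}

# Synthetic 11 x 11 grid standing in for the output of a partial dependence
# computation
grid <- expand.grid(x1 = seq(0, 1, length.out = 11),
                    x2 = seq(0, 1, length.out = 11))

pd_add <- transform(grid, yhat = 2 * x1 + 3 * x2)  # additive: no interaction
pd_int <- transform(grid, yhat = x1 * x2)          # product: interaction

firm_interaction(pd_add)  # essentially zero
firm_interaction(pd_int)  # clearly positive
```

In practice the grid would come from a fitted model's partial dependence rather than from a known formula, so the statistic is best read as a relative ranking across feature pairs.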

References

Greenwell, B. M., Boehmke, B. C., and McCarthy, A. J.: A Simple and Effective Model-Based Variable Importance Measure. arXiv preprint arXiv:1805.04755 (2018).

Examples

## Not run: 
#
# The Friedman 1 benchmark problem
#

# Load required packages
library(gbm)
library(ggplot2)
library(mlbench)
library(vip)  # for gen_friedman() and vint()

# Simulate training data
trn <- gen_friedman(500, seed = 101)  # ?vip::gen_friedman

#
# NOTE: The only interaction that actually occurs in the model from which
# these data are generated is between x.1 and x.2!
#

# Fit a GBM to the training data
set.seed(102)  # for reproducibility
fit <- gbm(y ~ ., data = trn, distribution = "gaussian", n.trees = 1000,
           interaction.depth = 2, shrinkage = 0.01, bag.fraction = 0.8,
           cv.folds = 5)
best_iter <- gbm.perf(fit, plot.it = FALSE, method = "cv")

# Quantify relative interaction strength
all_pairs <- combn(paste0("x.", 1:10), m = 2)
res <- NULL
for (i in seq_len(ncol(all_pairs))) {  # one column per feature pair
  interact <- vint(fit, feature_names = all_pairs[, i], n.trees = best_iter)
  res <- rbind(res, interact)
}

# Plot top 20 results, ordered by interaction strength
res <- res[order(res$Interaction, decreasing = TRUE), ]
top_20 <- res[1L:20L, ]
ggplot(top_20, aes(x = reorder(Variables, Interaction), y = Interaction)) +
  geom_col() +
  coord_flip() +
  xlab("") +
  ylab("Interaction strength")

## End(Not run)

vip documentation built on Dec. 17, 2020, 5:08 p.m.