shapFlex: Compute symmetric or asymmetric stochastic feature-level...


View source: R/shapFlex.R

Description

This function uses user-defined trained models and prediction functions to compute approximate Shapley values for single models. Shapley values can be calculated for a subset of model features, which reduces the typically expensive computation of approximate Shapley values in high-dimensional models.

Usage

shapFlex(
  explain,
  reference = NULL,
  model,
  predict_function,
  target_features = NULL,
  causal = NULL,
  causal_weights = NULL,
  sample_size = 60,
  use_future = FALSE
)

Arguments

explain

A data.frame of instances to be explained using Shapley values. explain is passed internally as a data.frame to predict_function.

reference

Optional. A data.frame with the same format as explain, with possibly more or fewer rows, of instances that serve as a reference group against which the Shapley value deviations from explain are compared. That is, reference is used to calculate an average prediction or intercept value. reference is passed internally as a data.frame to predict_function.

model

A trained prediction model object used to compute Shapley values. model is passed internally to predict_function.

predict_function

A predict()-type wrapper function that takes 2 required positional arguments: (1) the trained model from model and (2) a data.frame of instances with the same format as explain. For numeric outcomes, the function should return() a 1-column data.frame of model predictions; the column name does not matter.
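As a minimal sketch of the contract described above, the wrapper below is written for a linear model fit on the built-in mtcars data. The model, the data set, and the column name y_pred are illustrative assumptions, not part of the package documentation.

```r
# Assumed example model: a linear regression on mtcars (illustration only).
model <- lm(mpg ~ ., data = mtcars)

# Wrapper with the two required positional arguments: the model, then the data.
predict_function <- function(model, data) {
  # Return a 1-column data.frame; the column name is arbitrary.
  data.frame(y_pred = predict(model, newdata = data))
}

# The feature columns only, in the same format that explain would have.
features <- mtcars[, setdiff(names(mtcars), "mpg")]
preds <- predict_function(model, features)
str(preds)
```

For other model classes, the body of the wrapper would change (for example, to select a probability column), but the two-argument signature and the 1-column data.frame return value should stay the same.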

target_features

Optional. A character vector that is a subset of feature names in explain for which Shapley values will be computed. For high-dimensional models, selecting a subset of interesting features may dramatically speed up computation time. The default behavior is to return Shapley values for all instances and features in explain.

causal

Optional. A 2-column data.frame of feature names: The 1st column gives causes, the 2nd column gives effects.

causal_weights

Optional. A numeric vector of length nrow(causal) with weights between 0 and 1 that specify the strength of the causal asymmetric Shapley values. A weight of 1 (the default if causal_weights = NULL) estimates a pure causal effect where the instance to be explained is always conditioned on its true/actual values in the Monte Carlo sampling. A weight of .5 is equivalent to the symmetric Shapley value calculation, within sampling error.
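The causal and causal_weights arguments might be built as in the sketch below. The feature names (age, education, income) and the column names cause and effect are hypothetical; only the 2-column layout and the one-weight-per-row rule come from the descriptions above.

```r
# Hypothetical feature names; each row is one cause -> effect relationship.
causal <- data.frame(
  cause  = c("age", "education"),
  effect = c("income", "income"),
  stringsAsFactors = FALSE
)

# One weight per row of causal, each between 0 and 1.
# 1 requests a pure causal effect; 0.5 matches the symmetric calculation
# within sampling error.
causal_weights <- c(1, 0.5)

# Basic checks on the shapes described in the argument documentation.
stopifnot(ncol(causal) == 2)
stopifnot(length(causal_weights) == nrow(causal))
stopifnot(all(causal_weights >= 0 & causal_weights <= 1))
```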

sample_size

A numeric vector of length 1 giving the number of Monte Carlo samples used to compute the stochastic Shapley values for each feature.

use_future

Boolean. If TRUE, the future package is used to calculate Shapley values in parallel across sample_size.

Value

A data.frame with class shapFlex of the feature-level Shapley values for all instances in explain.
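A complete call could look like the following sketch, which uses only the arguments listed in the usage block. The mtcars data, the linear model, and the choice of wt and hp as target features are assumptions made for illustration; the call itself is guarded so the script still runs when shapFlex is not installed.

```r
# Assumed example data and model (not from the package documentation).
model <- lm(mpg ~ ., data = mtcars)

predict_function <- function(model, data) {
  data.frame(y_pred = predict(model, newdata = data))
}

# Features only: the outcome column is dropped from both data sets.
features  <- mtcars[, setdiff(names(mtcars), "mpg")]
explain   <- features[1:3, ]  # instances to explain
reference <- features         # reference group for the average prediction

if (requireNamespace("shapFlex", quietly = TRUE)) {
  set.seed(1)  # the estimates are stochastic, so fix the seed for repeatability
  shap <- shapFlex::shapFlex(
    explain = explain,
    reference = reference,
    model = model,
    predict_function = predict_function,
    target_features = c("wt", "hp"),  # restrict computation to two features
    sample_size = 60
  )
  print(head(shap))
}
```

Restricting target_features keeps the run short; increasing sample_size should reduce the Monte Carlo noise at the cost of more calls to predict_function.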


nredell/shapFlex documentation built on June 11, 2020, 4:40 a.m.