apply_lime: Apply Different Implementations of LIME

View source: R/main-apply_lime.R

Description

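Applies LIME to the test data with each of the specified tuning parameter options (or combinations of options), so that explanations from different implementations of LIME can be compared.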

Usage

apply_lime(
  train,
  test,
  model,
  sim_method,
  nbins = 4,
  label,
  n_features,
  n_permutations = 5000,
  feature_select = "auto",
  dist_fun = "gower",
  kernel_width = NULL,
  gower_pow = 1,
  all_fs = FALSE,
  return_perms = FALSE,
  parallel = FALSE,
  seed = NULL
)

Arguments

train

Data frame of training data features.

test

Data frame of testing data features.

model

Complex model to be explained.

sim_method

Vector of methods to use for creating the simulated data. Options are 'quantile_bins', 'equal_bins', 'kernel_density', and 'normal_approx'.
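To show why the two bin-based options can lead to quite different simulated data, the base-R sketch below computes the bin edges each might use for one skewed feature. It assumes that 'quantile_bins' places edges at empirical quantiles and 'equal_bins' at evenly spaced points over the feature's range; the data are made up and this is not limeaid's internal code.

```r
# Hypothetical illustration (not limeaid's code): bin edges for one skewed
# feature under quantile-based vs equal-width binning.
x <- c(seq(0, 1, length.out = 96), 10, 20, 30, 40)  # made-up skewed feature
nbins <- 4

# Quantile bins: edges at empirical quantiles, so bins hold (roughly) equal counts
q_edges <- quantile(x, probs = seq(0, 1, length.out = nbins + 1), names = FALSE)

# Equal bins: edges evenly spaced over the range, so bins have equal widths
e_edges <- seq(min(x), max(x), length.out = nbins + 1)

table(cut(x, q_edges, include.lowest = TRUE))  # 25 observations per bin
table(cut(x, e_edges, include.lowest = TRUE))  # 97, 1, 1, 1
```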

nbins

Vector of numbers of bins to use with the bin-based simulation methods ('quantile_bins' and 'equal_bins').
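Because sim_method and nbins both accept vectors, one call can request several implementations. The sketch below shows one plausible way such a grid could be formed, assuming that every bin-based method is paired with every value of nbins while the other methods are run once with nbins unused. Under that assumption the call in the Examples section would give three implementations. The pairing rule is an assumption made for illustration, not limeaid's actual code.

```r
# Hypothetical sketch (not limeaid's code): expand vector-valued sim_method
# and nbins into one row per assumed LIME implementation.
sim_method <- c("quantile_bins", "kernel_density")
nbins <- 2:3
bin_methods <- c("quantile_bins", "equal_bins")

# Bin-based methods are crossed with every value of nbins ...
bin_grid <- expand.grid(sim_method = intersect(sim_method, bin_methods),
                        nbins = nbins, stringsAsFactors = FALSE)

# ... while the other methods ignore nbins (recorded as NA)
other_methods <- setdiff(sim_method, bin_methods)
other_grid <- data.frame(sim_method = other_methods,
                         nbins = rep(NA_integer_, length(other_methods)),
                         stringsAsFactors = FALSE)

implementations <- rbind(bin_grid, other_grid)
implementations
```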

label

Response category to use in the explanations. The current implementation only accepts one label.

n_features

Number of features to return in the explanations.

n_permutations

Number of permutations to use when simulating data for each explanation. Default is 5000.
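As a rough illustration of what n_permutations controls, the base-R sketch below draws n_permutations simulated rows in the style of the 'normal_approx' option. It assumes, as in the lime package, that each feature is sampled independently from a normal distribution with that feature's training mean and standard deviation. The training data are made up and this is not limeaid's internal code.

```r
# Hypothetical sketch (not limeaid's code): simulate n_permutations rows in
# the style of 'normal_approx', sampling each feature independently from a
# normal distribution with that feature's training mean and sd.
set.seed(42)
train <- data.frame(x1 = runif(200), x2 = rnorm(200, mean = 5, sd = 2))
n_permutations <- 5000

perms <- as.data.frame(lapply(train, function(col) {
  rnorm(n_permutations, mean = mean(col), sd = sd(col))
}))
dim(perms)  # 5000 2
```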

feature_select

Feature selection method. Options are 'auto', 'none', 'forward_selection', 'highest_weights', 'lasso_path', and 'tree'. Default is 'auto'.
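In the lime package, 'highest_weights' is described as fitting a ridge regression to all features and keeping the n_features features with the largest absolute weights. The sketch below illustrates that idea with plain weighted least squares via lm() swapped in for the ridge fit; the simulated data, stand-in model predictions, and stand-in weights are all made up, and this is not limeaid's code.

```r
# Hypothetical sketch (not limeaid's code): 'highest_weights'-style selection
# using weighted least squares (lm) in place of lime's ridge regression.
set.seed(1)
n <- 500
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$pred <- plogis(3 * sim$x1 - 1.5 * sim$x2 + 0.1 * sim$x3)  # stand-in model output
w <- runif(n)                                                 # stand-in weights
n_features <- 2

fit <- lm(pred ~ x1 + x2 + x3, data = sim, weights = w)
coefs <- coef(fit)[-1]                                        # drop the intercept
selected <- names(sort(abs(coefs), decreasing = TRUE))[seq_len(n_features)]
selected  # expected: "x1" "x2"
```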

dist_fun

Distance function used to compute weights for the simulated data. Default is 'gower'; for any other value, stats::dist() is used.

kernel_width

Kernel width to use if dist_fun is not 'gower'.
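When dist_fun is not 'gower', the distances from stats::dist() still have to be turned into weights, and this is where kernel_width enters. The sketch below assumes the exponential kernel used by the lime package, w = sqrt(exp(-d^2 / kernel_width^2)), with Euclidean distance chosen for illustration; the observation and simulated rows are made up and this is not limeaid's code.

```r
# Hypothetical sketch (not limeaid's code): Euclidean distances from a test
# observation to simulated rows, converted to weights with an exponential
# kernel of width kernel_width (the form assumed from the lime package).
obs <- c(0.5, 0.5)
perms <- rbind(c(0.5, 0.5), c(0.6, 0.4), c(1.5, -0.5))
kernel_width <- 0.75

# Row 1 of the full distance matrix holds distances from obs to each row of perms
d <- unname(as.matrix(dist(rbind(obs, perms), method = "euclidean"))[1, -1])
weights <- sqrt(exp(-(d^2) / kernel_width^2))
round(weights, 3)  # 1.000 0.982 0.169
```

Smaller values of kernel_width make the weights drop off faster with distance, so the explanation depends more heavily on simulated points close to the observation.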

gower_pow

Numeric vector of powers to use when computing the Gower distance. (Note: If gower_pow contains more than one unique value, the same simulated data are reused for a given test observation across all Gower powers that share the other tuning parameters, so that explanations can be compared across powers.)
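To illustrate what gower_pow changes, the sketch below computes a numeric-only Gower distance (mean of absolute differences scaled by each feature's training range) and converts it to weights for several powers. Following the lime package's description that the Gower distance is raised to this power, it assumes weight = 1 - distance^gower_pow. The helper gower_dist_num() is hypothetical, the data are made up, and this is not limeaid's code.

```r
# Hypothetical sketch (not limeaid's code): numeric-only Gower distance and
# the effect of gower_pow, assuming weight = 1 - distance^gower_pow.
gower_dist_num <- function(x, Y, ranges) {
  # For each row of Y: mean over features of |x_j - y_j| / range_j
  rowMeans(sweep(abs(sweep(Y, 2, x)), 2, ranges, "/"))
}

train <- data.frame(x1 = c(0, 2, 4), x2 = c(0, 5, 10))
ranges <- sapply(train, function(col) diff(range(col)))  # 4 and 10
obs <- c(1, 5)                                           # a test observation
perms <- rbind(c(1, 5), c(3, 5), c(1, 7))                # simulated rows

d <- gower_dist_num(obs, perms, ranges)                  # 0, 0.25, 0.1
gower_pow <- c(0.5, 1, 2)
W <- sapply(gower_pow, function(p) 1 - d^p)              # one column per power
round(W, 3)
```

Under this assumption, larger values of gower_pow pull the weights of all simulated points toward 1, while smaller values make the weights fall off faster with distance.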

all_fs

Indicates whether all feature selection methods should be applied within each implementation of LIME, to show how the selected features vary across methods. Note that the LIME results returned correspond to the method specified in the feature_select argument.

return_perms

Should the simulated datasets (permutations) be returned for all observations in the test dataset and all LIME implementations? Default is FALSE.

parallel

Indicates whether to apply LIME using parallel computation (with furrr) or sequentially (with purrr). Default is FALSE. Setting parallel = TRUE may reduce computation time for very large test datasets or many different sets of tuning parameters.

seed

Optional number to use as the random seed.

Examples

# Load limeaid (apply_lime() and the sine_data_* datasets used below)
library(limeaid)

# Prepare training and testing data
x_train <- sine_data_train[c("x1", "x2", "x3")]
y_train <- factor(sine_data_train$y)
x_test <- sine_data_test[1:5, c("x1", "x2", "x3")]

# Fit a random forest model
rf <- randomForest::randomForest(x = x_train, y = y_train)

# Run apply_lime with two simulation methods; nbins applies only to the
# bin-based method ('quantile_bins')
res <- apply_lime(train = x_train,
                  test = x_test,
                  model = rf,
                  label = "1",
                  n_features = 2,
                  sim_method = c('quantile_bins',
                                 'kernel_density'),
                  nbins = 2:3)

goodekat/limeaid documentation built on March 26, 2021, 10:45 p.m.