run_partial_dependency: Convenience function for calculating partial dependency and...


Description

Given a list of models, get their average prediction over a range of values for each feature in feature_cols.
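The idea of averaging predictions over a range of feature values can be sketched in base R as follows. This is a minimal illustration for one model and one feature, not the package's implementation; the helper name `partial_dependence_sketch` and the toy data are made up for the example.

```r
# Hypothetical helper, not part of brightbox: partial dependency of one
# feature for one model, using base R only.
partial_dependence_sketch <- function(model, data, feature, grid) {
  sapply(grid, function(value) {
    modified <- data
    modified[[feature]] <- value               # hold the feature fixed at `value`
    mean(predict(model, newdata = modified))   # average prediction over all rows
  })
}

# Toy data with an exactly linear outcome, so the result is easy to check by hand
df <- data.frame(a = c(1, 2, 3, 4), b = c(2, 1, 4, 3))
df$y <- 2 * df$a + 3 * df$b
fit <- lm(y ~ a + b, data = df)

grid <- seq(min(df$a), max(df$a), length.out = 4)
pd <- partial_dependence_sketch(fit, df, "a", grid)
```

Here each entry of `pd` equals `2 * value + 3 * mean(df$b)`, because the model is linear and `b` is averaged out.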

Usage

run_partial_dependency(feature_dt, model_list,
  feature_cols = names(feature_dt), predict_fcn = predict,
  ensemble_colname = "ensemble", ensemble_fcn = median,
  ensemble_models = names(model_list), num_grid = 10, custom_range = NULL,
  plot_fcn = plot_partial_dependency, vimp_colname = "ensemble",
  plot = TRUE, facet = TRUE, ncol = NULL)

Arguments

feature_dt

data.table containing features used in predictive model

model_list

named list of model objects. Each name will become a column containing predictions from that model.

feature_cols

character vector of column names in feature_dt on which to calculate variable importance. Defaults to all columns in feature_dt

predict_fcn

function that accepts a model as its first argument and newdata as one of its named arguments
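As an illustration, a custom prediction function matching this contract might look like the sketch below. The wrapper name `predict_response` and the toy data are assumptions for the example; only base R `glm` and `predict` are used.

```r
# Hypothetical wrapper: the model is the first argument and newdata is a
# named argument, as the predict_fcn description requires.
predict_response <- function(model, newdata) {
  # Return probabilities rather than the link scale for a binomial glm
  predict(model, newdata = newdata, type = "response")
}

df <- data.frame(x = c(1, 2, 3, 4, 5, 6), y = c(0, 0, 1, 0, 1, 1))
fit <- glm(y ~ x, data = df, family = binomial())

probs <- predict_response(fit, newdata = data.frame(x = c(2, 5)))
```

A function of this shape could then presumably be supplied as `predict_fcn = predict_response`.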

ensemble_colname

character. Name of the column containing ensemble predictions

ensemble_fcn

function that combines a vector of predictions into a single ensemble. Default is median
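To show what combining predictions looks like, the sketch below applies median and mean row by row to a matrix of made-up predictions. The model names, the numbers, and the row-wise use of `apply` are assumptions for illustration, not the package's internals.

```r
# Made-up predictions from three hypothetical models for four rows
preds <- cbind(lm1 = c(1.0, 2.0, 3.0, 4.0),
               gm1 = c(1.2, 1.8, 3.3, 4.1),
               rf1 = c(0.7, 2.5, 2.9, 9.0))

# Combine each row's vector of predictions into one ensemble value
ensemble_median <- apply(preds, 1, median)
ensemble_mean   <- apply(preds, 1, mean)
```

In the last row the median stays at 4.1 while the mean is pulled up to 5.7 by the outlying 9.0, which is one reason a median default can be attractive.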

ensemble_models

character vector of names from model_list. These models will be combined by ensemble_fcn to form the ensemble

num_grid

number of points to distribute along the range of each feature in feature_cols, or along custom_range if it is supplied

custom_range

should only be used if feature_cols is a 1-element vector. Defines a custom range to calculate partial dependency over. This can be a 2-element numerical vector or a character vector, depending on the type of feature_cols[1]
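For a numeric feature, the interaction between num_grid and custom_range can be sketched as below. Even spacing is an assumption made for this illustration; the package may place its grid points differently. The helper name `make_grid_sketch` is hypothetical.

```r
# Hypothetical helper: build num_grid points over the observed range of x,
# or over custom_range when one is given (numeric case only).
make_grid_sketch <- function(x, num_grid = 10, custom_range = NULL) {
  rng <- if (is.null(custom_range)) range(x) else custom_range
  seq(rng[1], rng[2], length.out = num_grid)
}

grid_default <- make_grid_sketch(c(1, 2, 3), num_grid = 5)
grid_custom  <- make_grid_sketch(c(1, 2, 3), num_grid = 3, custom_range = c(0, 10))
```

With these inputs, `grid_default` covers 1 to 3 in steps of 0.5, and `grid_custom` ignores the observed data range and covers 0 to 10 instead.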

plot_fcn

a function that accepts the output from calculate_partial_dependency and returns a ggplot object.

vimp_colname

name of model (taken from model_list or ensemble_colname) for which to calculate variable importance

plot

TRUE/FALSE. Should the partial dependencies be plotted? Defaults to TRUE

facet

TRUE/FALSE. If plot = TRUE, should the graphs be combined into one plot? Defaults to TRUE

ncol

if facet = TRUE, number of columns in the facetted plot

Value

Output is a data.table with one column for every model in model_list, an ensemble column, feature name and feature value columns, and the variable importance column

Examples

## Not run: 
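library(data.table)
library(brightbox)
# Toy table: features a and b, outcome c
dt <- data.table(a = 1:3, b = 2:4, c = c(8, 11, 14))
m <- lm(c ~ a + b - 1, dt)
gm <- glm(c ~ a + b - 1, data = dt)
# Pass only the feature columns; each list name becomes a prediction column
run_partial_dependency(feature_dt = dt[, list(a, b)],
                       model_list = list(lm1 = m, gm1 = gm))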

## End(Not run)

breather/brightbox documentation built on May 13, 2019, 5:04 a.m.