run_partial_dependency: Convenience function for calculating partial dependency and...
In breather/brightbox: Peek into any blackbox learner (including ensembles)


Given a list of models, computes their average prediction over a range of values for each feature in `feature_cols`.
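To make the idea concrete, here is a minimal base-R sketch of partial dependency for a single feature and a single model. It is an illustration only, not brightbox's internal code: the helper `partial_dependence`, its output column names, and the evenly spaced grid are assumptions made for this example.

```r
# Toy data and model, mirroring the example at the end of this page.
df <- data.frame(a = 1:3, b = 2:4, c = c(8, 11, 14))
m <- lm(c ~ a + b - 1, data = df)

# Hypothetical helper: for each grid value, overwrite `feature` in every row,
# predict, and average the predictions.
partial_dependence <- function(model, data, feature, num_grid = 10) {
  # Assumption: grid points are evenly spaced over the observed range.
  grid <- seq(min(data[[feature]]), max(data[[feature]]), length.out = num_grid)
  avg_pred <- vapply(grid, function(value) {
    modified <- data
    modified[[feature]] <- value
    mean(predict(model, newdata = modified))
  }, numeric(1))
  data.frame(feature = feature, feature_val = grid, prediction = avg_pred)
}

pd_a <- partial_dependence(m, df[, c("a", "b")], "a", num_grid = 5)
print(pd_a)
```

The other features keep their observed values, so each averaged prediction reflects the effect of the chosen feature alone at that grid point.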

run_partial_dependency(feature_dt, model_list,
  feature_cols = names(feature_dt), predict_fcn = predict,
  ensemble_colname = "ensemble", ensemble_fcn = median,
  ensemble_models = names(model_list), num_grid = 10, custom_range = NULL,
  plot_fcn = plot_partial_dependency, vimp_colname = "ensemble",
  plot = TRUE, facet = TRUE, ncol = NULL)

`feature_dt`	data.table containing features used in predictive model
`model_list`	named list of model objects. Each name will become a column containing predictions from that model.
`feature_cols`	character vector of column names in `feature_dt` on which to calculate variable importance. Defaults to all columns in `feature_dt`
`predict_fcn`	function that accepts a model as its first argument and `newdata` as one of its named arguments
`ensemble_colname`	character. Name of the column containing ensemble predictions
`ensemble_fcn`	function that combines a vector of predictions into a single ensemble. Default is `median`
`ensemble_models`	character vector of names from `model_list`. These models will be combined by `ensemble_fcn` to form the ensemble
`num_grid`	number of points to distribute along the range of each feature in `feature_cols`, or along `custom_range`
`custom_range`	should only be used if `feature_cols` is a 1-element vector. Defines a custom range to calculate partial dependency over. This can be a 2-element numerical vector or a character vector, depending on the type of `feature_cols[1]`
`plot_fcn`	a function that accepts the output from `calculate_partial_dependency` and returns a ggplot object.
`vimp_colname`	name of model (taken from `model_list` or `ensemble_colname`) for which to calculate variable importance
`plot`	TRUE/FALSE. Should the partial dependencies be plotted? Defaults to TRUE
`facet`	TRUE/FALSE. If `plot = TRUE`, should the graphs be combined into one plot? Defaults to TRUE
`ncol`	if `facet = TRUE`, number of columns in the facetted plot
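
The `predict_fcn` contract described above (model as first argument, `newdata` as a named argument) can be met with a thin wrapper when the default `predict` call is not what you want. The sketch below is an assumption-laden illustration: `predict_response` is a hypothetical name, and the toy data are made up.

```r
# Hypothetical wrapper: model first, `newdata` as a named argument.
# Returns probabilities instead of the link-scale values glm gives by default.
predict_response <- function(model, newdata, ...) {
  predict(model, newdata = newdata, type = "response", ...)
}

# Made-up binary outcome data (not perfectly separable, so glm converges).
df <- data.frame(x = c(1, 2, 3, 4, 5, 6), y = c(0, 0, 1, 0, 1, 1))
gm <- glm(y ~ x, data = df, family = binomial)

probs <- predict_response(gm, newdata = data.frame(x = c(2, 5)))
print(probs)
```

A wrapper of this shape could presumably be supplied as `predict_fcn = predict_response`; that usage is inferred from the argument description rather than verified against the package.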

Output is a data.table with one column for every model in `model_list`, an ensemble column, feature name and feature value columns, and the variable importance column.
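
As a rough picture of how the ensemble column relates to the per-model columns, the sketch below applies an `ensemble_fcn` row by row across the columns named in `ensemble_models`. The prediction values and the third model name `rf1` are made up for illustration, and this is not necessarily how brightbox computes the column internally.

```r
# Made-up per-model predictions, one column per model (names are hypothetical).
preds <- data.frame(
  lm1 = c(8.0, 11.0, 14.0),
  gm1 = c(8.5, 10.5, 14.5),
  rf1 = c(9.0, 12.0, 13.0)
)

ensemble_models <- c("lm1", "gm1", "rf1")
ensemble_fcn <- median

# Combine each row's predictions into a single ensemble value.
preds$ensemble <- apply(preds[, ensemble_models], 1, ensemble_fcn)
print(preds)
```

Any function that reduces a numeric vector to a single value, such as `mean`, can stand in for `median` in this sketch.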

## Not run: 
library(data.table)
library(brightbox)

# Toy data: two features (a, b) and a response (c)
dt <- data.table(a = 1:3, b = 2:4, c = c(8, 11, 14))
m <- lm(c ~ a + b - 1, dt)
gm <- glm(c ~ a + b - 1, data = dt)
run_partial_dependency(feature_dt = dt[, list(a, b)],
                       model_list = list(lm1 = m, gm1 = gm))

## End(Not run)