calculate_pd_vimp_normed: Calculate normed variable importance based on partial...

Description

Normed variable importance is the range (maximum minus minimum) of the partial dependency predictions, divided by a summary (the median by default) of the standard deviations of the individual model predictions at each feature value cutpoint. It is most effective when many models are trained on different subsamples of the training data.
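The calculation described above can be sketched in base R. This is a hypothetical re-implementation for illustration, not the package source: `normed_vimp_sketch` is an invented name, and it assumes a single feature, that the minimum and maximum are taken over the ensemble predictions, and that `epsilon` is added to the summarized standard deviation.

```r
# Hypothetical sketch only; the real calculate_pd_vimp_normed may differ.
normed_vimp_sketch <- function(pd, ensemble_colname = "ensemble",
                               epsilon = 1e-07, summary_fcn = median) {
  ens  <- pd[pd$model == ensemble_colname, ]
  mods <- pd[pd$model != ensemble_colname, ]

  # Assumed numerator: range of the ensemble predictions for the feature.
  pd_range <- max(ens$prediction) - min(ens$prediction)

  # Standard deviation across the individual models at each value cutpoint.
  sds <- as.numeric(tapply(mods$prediction, mods$feature_val, sd))

  # Assumed denominator: summarized sd plus epsilon to avoid dividing by zero.
  pd_range / (summary_fcn(sds) + epsilon)
}

# Same values as the documented example, as a plain data.frame.
pd <- data.frame(
  feature     = "a",
  feature_val = rep(c(1, 3.5, 6), 3),
  model       = rep(c("model1", "model2", "ensemble"), each = 3),
  prediction  = c(-2.5, 0, 2.5, 0, 0, 0, -2.5, -0.75, 0)
)

vimp     <- normed_vimp_sketch(pd)
vimp_min <- normed_vimp_sketch(pd, summary_fcn = min)
```

Under these assumptions, passing `summary_fcn = min` makes the summarized standard deviation zero for this data (both models agree at `feature_val = 3.5`), so `epsilon` is what keeps the result finite.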

Usage

calculate_pd_vimp_normed(pd, ensemble_colname = "ensemble", epsilon = 1e-07,
  summary_fcn = median)

Arguments

pd

output from calculate_partial_dependency

ensemble_colname

name of ensemble column specified in calculate_partial_dependency.

epsilon

small value to add to the minimum standard deviation before dividing to prevent Inf return value. Defaults to 1e-07.

summary_fcn

function to summarize vector of model standard deviations at each value cutpoint. Function must return a vector of length 1. Defaults to median.
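As a rough illustration of the length-1 requirement on `summary_fcn`, the sketch below applies a few candidate functions to the per-cutpoint standard deviations implied by the two individual models in the example data. The `candidates` list and the validity check are illustrative assumptions, not part of the package.

```r
# Standard deviations across model1 and model2 at feature_val 1, 3.5 and 6.
sds <- c(sd(c(-2.5, 0)), sd(c(0, 0)), sd(c(2.5, 0)))

# Candidate summary functions (names here are illustrative only).
candidates <- list(
  median         = median,
  mean           = mean,
  upper_quartile = function(x) quantile(x, 0.75, names = FALSE),
  range          = range  # returns two values, so it does not qualify
)

# A usable summary_fcn must collapse the vector to a single value.
is_valid <- vapply(candidates, function(f) length(f(sds)) == 1L, logical(1))
print(is_valid)
```

Functions such as `range` that return more than one value fail this check and would not be suitable as `summary_fcn`.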

Examples

## Not run: 
library(data.table)  # provides data.table(), used to build the example input

# Example output from calculate_partial_dependency
pd <- data.table(feature = rep("a", 9),
                 feature_val = rep(c(1, 3.5, 6), 3),
                 model = rep(c("model1",
                               "model2",
                               "ensemble"),
                             each = 3),
                 prediction = c(c(-2.5, 0, 2.5),
                                c(0, 0, 0),
                                c(-2.5, -0.75, 0)))
calculate_pd_vimp_normed(pd, ensemble_colname = "ensemble")

## End(Not run)

breather/brightbox documentation built on May 13, 2019, 5:04 a.m.