bootImp: Bootstrapped Feature Importance via Filter/Model-Based...

Description

For each potential predictor, a normalized feature importance value is calculated for every method provided in the call. Average importance and average rank across all methods are also calculated for each predictor. Currently only classification is supported, so y should be a binary numeric variable. Regression methods (Pearson/Spearman correlations, ANOVA) may be supported in the future.
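The aggregation described above can be illustrated with a small base R sketch. The raw scores are made-up numbers, and dividing each method's scores by that method's maximum is only an assumed normalization; the function's actual normalization is not documented here.

```r
# Hypothetical raw importance scores: rows are predictors, columns are methods.
raw <- data.frame(
  iv   = c(0.60, 0.20, 0.05),
  chi2 = c(40, 55, 5),
  rf   = c(12, 30, 3),
  row.names = c("x1", "x2", "x3")
)

# Assumed normalization: scale each method so its top predictor scores 1.
norm <- sapply(raw, function(s) s / max(s))
rownames(norm) <- rownames(raw)

# Rank predictors within each method (1 = most important).
ranks <- sapply(raw, function(s) rank(-s))
rownames(ranks) <- rownames(raw)

# Average importance and average rank across methods, best rank first.
summary_df <- data.frame(
  avg_imp  = rowMeans(norm),
  avg_rank = rowMeans(ranks)
)
summary_df <- summary_df[order(summary_df$avg_rank), ]
print(summary_df)
```

Averaging ranks as well as normalized scores keeps a single method with an unusual score scale from dominating the overall ordering.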

Usage

bootImp(df, y, methods = c("iv", "chi2", "rf", "gbm", "be", "bl"),
  nboot = 10, nbins = 10, nplot = 25, nabin = FALSE, control = list())

Arguments

df

(data frame) data frame containing the response and all potential predictors

y

(character) binary response variable name (must be a variable within df)

methods

(character) vector of methods to use for importance calculations (see details)

nboot

(integer) number of bootstrap samples to use (or zero for no bootstrapping)

nbins

(integer) number of bins to use for chi-squared / information value calculations

nplot

(integer) number of variables to show on the final varImp.plot

nabin

(logical) whether to include an additional bin for missing values in chi2/iv

control

(list) parameters to pass to each modeling function to override/augment defaults (see details)
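As a rough illustration of what nbins and nabin control, the following base R sketch bins a numeric predictor and optionally gives missing values their own bin. Equal-frequency (quantile) binning is an assumption made for this example; the binning scheme actually used by bootImp is not stated in this page.

```r
set.seed(1)
x <- c(rnorm(95), rep(NA, 5))  # hypothetical predictor with 5 missing values
nbins <- 10
nabin <- TRUE

# Assumed scheme: equal-frequency bins built from sample quantiles.
probs  <- seq(0, 1, length.out = nbins + 1)
breaks <- unique(quantile(x, probs = probs, na.rm = TRUE))
binned <- cut(x, breaks = breaks, include.lowest = TRUE)

# With nabin = TRUE, missing values become an extra factor level
# instead of being dropped from the bin counts.
if (nabin) binned <- addNA(binned)

print(table(binned))
```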

Details

Using the df of predictors and y response variable supplied, feature importance scores will be calculated for each of the methods supplied in the call. Supported methods and the associated variable importance metrics are described below:

Because the RF and GBM methods already include inherent bootstrapping in their tree ensembles, each of those models is run only once. Importance scores from the other methods are averaged over bootstrap samples to reduce variance and make the importance metrics more stable.

To specify alternate parameters for a modeling method, pass them as lists within the main control parameter. For each method you want to configure, supply a list of named arguments whose outer name matches the method name (e.g. 'rf'). See the examples for how this works. Any method parameters passed this way either override matching defaults or are added as additional parameters to the model. Be careful when overriding the defaults, as not all combinations of parameters have been fully tested!
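The two ideas in this section, bootstrap averaging and per-method parameter overrides, can be sketched in base R as follows. This is not the package's implementation: the absolute correlation is only a stand-in score, the default parameter values are invented for illustration, and using modifyList for the merge is an assumption about how overrides could be applied.

```r
## Bootstrap averaging of an importance score (stand-in metric).
set.seed(42)
n  <- 200
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- as.numeric(df$x1 + rnorm(n) > 0)  # binary response driven by x1

# Stand-in score: absolute correlation of each predictor with y.
score <- function(d) {
  preds <- setdiff(names(d), "y")
  sapply(preds, function(p) abs(cor(d[[p]], d$y)))
}

nboot <- 10
boot_scores <- replicate(nboot, {
  idx <- sample(nrow(df), replace = TRUE)  # resample rows with replacement
  score(df[idx, ])
})
avg_scores <- rowMeans(boot_scores)  # one averaged score per predictor
print(avg_scores)

## Merging user overrides into per-method defaults.
# Hypothetical defaults, not the values used by bootImp.
defaults <- list(
  rf  = list(ntree = 500, importance = TRUE),
  gbm = list(n.trees = 200, shrinkage = 0.1)
)
control <- list(rf = list(ntree = 200, mtry = 5))

# modifyList merges nested lists: matching names are overridden,
# new names are added, and untouched methods keep their defaults.
merged <- modifyList(defaults, control)
str(merged)
```

In this sketch ntree is overridden, mtry is added, and the gbm entry is left unchanged, which mirrors the override-or-augment behavior described above.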

Value

a list containing the following elements:

Examples

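library(caret)
library(eRic)  # package that provides bootImp
data(GermanCredit, package = 'caret')

credit <- GermanCredit
# recode the response as a binary numeric variable (1 = Good, 0 = Bad)
credit$Class <- as.numeric(credit$Class == 'Good')
# drop near-zero-variance predictors
credit <- credit[, -nearZeroVar(credit)]
# drop highly correlated predictors by name: positions in the correlation
# matrix exclude Class, so they do not line up with positions in credit
preds <- credit[, setdiff(names(credit), 'Class')]
drop <- findCorrelation(cor(preds), cutoff = 0.8, names = TRUE)
credit <- credit[, !(names(credit) %in% drop)]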

# run all default methods with 10 bootstrap samples and plot the top 20 variables
res <- bootImp(credit, 'Class', nboot = 10, nbins = 10, nplot = 20)
res$varImp.df
res$varImp.plot
res$methods
res$params

# override or augment the default model parameters for the rf and gbm methods
controls <- list(
  'rf'  = list(ntree = 200, mtry = 5, nodesize = 10, importance = TRUE),
  'gbm' = list(n.trees = 100, shrinkage = 0.25, cv.folds = 10)
)
res <- bootImp(credit, 'Class', control = controls)
res$varImp.df
res$varImp.plot
res$methods
res$params

etlundquist/eRic documentation built on May 16, 2019, 9:07 a.m.