Description
A normalized feature importance value is calculated for each potential predictor with respect to each of the methods provided in the call. In addition, an average importance and an average rank are calculated for each predictor across all methods. Currently only classification is supported, so y should be a binary numeric variable. Regression metrics (Pearson/Spearman correlation, ANOVA) may be supported in the future.
Usage

bootImp(df, y, methods, nboot, nbins, nplot, nabin, control)

Arguments

df        (data frame) data frame containing the response and all potential predictors
y         (character) binary response variable name (must be a variable within df)
methods   (character) vector of methods to use for importance calculations (see details)
nboot     (integer) number of bootstrap samples to use (or zero for no bootstrapping)
nbins     (integer) number of bins to use for chi-squared / information value calculations
nplot     (integer) number of variables to show on the final importance plot
nabin     (logical) whether to include an additional bin for missing values in chi2/iv
control   (list) parameters to pass to each modeling function to override/augment defaults (see details)
Details

Using the df of predictors and y response variable supplied, feature importance scores will be calculated for each of the methods supplied in the call. The supported methods and their associated variable importance metrics are described below:
iv - bootstrap-averaged Information Value based on binned predictor values
chi2 - bootstrap-averaged Chi-Squared based on binned predictor values
rf - MeanDecreaseAccuracy variable importance metric from a single RandomForest model
gbm - RelativeInfluence variable importance metric from a single GBM model
be - bootstrap-averaged GCV reduction variable importance metric from MARS/Earth models
bl - bootstrap-averaged univariate AUC for the set of predictors selected by Lasso models
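To give a feel for the binned metrics (iv/chi2), the Information Value of a single numeric predictor can be computed from the event/non-event distribution in each bin. The sketch below is illustrative only; ivSketch is a hypothetical helper, not the package's internal implementation.

```r
# Minimal Information Value sketch for one numeric predictor
# (illustrative only -- not bootImp's internal code).
ivSketch <- function(x, y, nbins = 10) {
  # bin the predictor into roughly equal-frequency bins
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = nbins + 1)))
  bins <- cut(x, breaks = breaks, include.lowest = TRUE)
  # per-bin share of events (y == 1) and non-events (y == 0)
  good <- tapply(y == 1, bins, sum) / sum(y == 1)
  bad  <- tapply(y == 0, bins, sum) / sum(y == 0)
  eps <- 1e-6                                # guards against log(0)
  woe <- log((good + eps) / (bad + eps))     # weight of evidence per bin
  sum((good - bad) * woe)                    # Information Value
}

set.seed(1)
x <- rnorm(1000)
y <- rbinom(1000, 1, plogis(2 * x))  # x strongly predicts y -> high IV
ivSketch(x, y)
```

A predictor unrelated to the response would score near zero under the same calculation, which is what makes the metric usable for ranking.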
Because the RF/GBM methods already include inherent bootstrapping in their tree ensembles, each of those models is run only once. Importance scores from the other methods are derived via bootstrap averaging, which reduces variance and stabilizes the importance metrics.
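The bootstrap-averaging idea for the non-ensemble methods can be pictured in a few lines: re-score the statistic on resampled rows and average the results. This is a sketch of the general pattern only; bootAvg and the toy correlation score are hypothetical, not part of the package.

```r
# Average a per-variable importance statistic over bootstrap resamples
# (general pattern only -- not bootImp's internal code).
bootAvg <- function(x, y, score_fun, nboot = 10) {
  scores <- replicate(nboot, {
    idx <- sample(length(y), replace = TRUE)  # bootstrap sample of rows
    score_fun(x[idx], y[idx])
  })
  mean(scores)  # averaging reduces the variance of the estimate
}

set.seed(42)
x <- rnorm(500)
y <- rbinom(500, 1, plogis(x))
# toy importance score: absolute correlation with the binary response
avg_imp <- bootAvg(x, y, function(x, y) abs(cor(x, y)), nboot = 25)
avg_imp
```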
If you want to specify alternate parameters for any of the modeling methods, pass them as lists within the main control parameter. For each method you want to configure, supply a list of named arguments whose outer name matches the method name (e.g. 'rf'). See the examples for how this works. Any method parameter passed this way will either override the matching default or be added as an additional parameter to the model. Be careful when overriding the defaults, as not all combinations of parameters have been fully tested!
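The override/augment behaviour described above can be pictured with base R's modifyList(), a common way to layer user-supplied parameters over defaults. This is a sketch of the general pattern, not necessarily how bootImp merges them internally, and the default values shown are made up for illustration.

```r
# Hypothetical defaults for the 'rf' method (values invented for
# illustration); user values override matching names, and names not
# present in the defaults are appended.
defaults <- list(ntree = 500, mtry = 3, importance = TRUE)
user     <- list(ntree = 200, nodesize = 10)

params <- modifyList(defaults, user)
str(params)
# ntree is overridden, nodesize is added, mtry/importance keep defaults
```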
Value

A list containing the following elements:
varImp.df - a data frame containing average importance, average rank, and method-specific importance for all predictors
varImp.plot - a ggplot2 object showing average normalized importance across all methods for the top nplot predictors
methods - the character vector of methods passed in the call
params - a list containing the additional parameters used for each model
Examples

library(caret)
library(dplyr)  # for select()
data(GermanCredit, package = 'caret')
credit <- GermanCredit
credit$Class <- as.numeric(credit$Class == 'Good')
credit <- credit[, -nearZeroVar(credit)]
credit <- credit[, -findCorrelation(cor(select(credit, -Class)), cutoff = 0.8)]

res <- bootImp(credit, 'Class', nboot = 10, nbins = 10, nplot = 20)
res$varImp.df
res$varImp.plot
res$methods
res$params

controls <- list('rf'  = list(ntree = 200, mtry = 5, nodesize = 10, importance = TRUE),
                 'gbm' = list(n.trees = 100, shrinkage = 0.25, cv.folds = 10))
res <- bootImp(credit, 'Class', control = controls)
res$varImp.df
res$varImp.plot
res$methods
res$params