Description
A normalized feature importance value is calculated for each potential predictor with respect to each of the methods provided in the call. In addition, an average importance and an average rank are calculated for each predictor across all methods. Currently only classification is supported, so y should be a binary numeric variable. Regression metrics (Pearson/Spearman correlation, ANOVA) may be supported in the future.
Usage

bootImp(df, y, methods, nboot, nbins, nplot, nabin, control)

Arguments

df        (data frame) data frame containing the response and all potential predictors
y         (character) binary response variable name (must be a variable within df)
methods   (character) vector of methods to use for importance calculations (see details)
nboot     (integer) number of bootstrap samples to use (or zero for no bootstrapping)
nbins     (integer) number of bins to use for chi-squared / information value calculations
nplot     (integer) number of variables to show on the final importance plot
nabin     (logical) whether to include an additional bin for missing values in chi2/iv
control   (list) parameters to pass to each modeling function to override/augment defaults (see details)
Details

Using the df of predictors and y response variable supplied, feature importance scores will be calculated for each of the methods supplied in the call. The supported methods and their associated variable importance metrics are described below:
iv - bootstrap-averaged Information Value based on binned predictor values
chi2 - bootstrap-averaged Chi-Squared based on binned predictor values
rf - MeanDecreaseAccuracy variable importance metric from a single RandomForest model
gbm - RelativeInfluence variable importance metric from a single GBM model
be - bootstrap-averaged GCV reduction variable importance metric from MARS/Earth models
bl - bootstrap-averaged univariate AUC for the set of predictors selected by Lasso models
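To give a feel for the binned metrics (iv/chi2), the Information Value of a single numeric predictor can be computed from the event/non-event distribution in each bin. The sketch below is illustrative only; ivSketch is a hypothetical helper, not the package's internal implementation.

```r
# Minimal Information Value sketch for one numeric predictor
# (illustrative only -- not bootImp's internal code).
ivSketch <- function(x, y, nbins = 10) {
  # bin the predictor into roughly equal-frequency bins
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = nbins + 1)))
  bins <- cut(x, breaks = breaks, include.lowest = TRUE)
  # per-bin share of events (y == 1) and non-events (y == 0)
  good <- tapply(y == 1, bins, sum) / sum(y == 1)
  bad  <- tapply(y == 0, bins, sum) / sum(y == 0)
  eps <- 1e-6                                # guards against log(0)
  woe <- log((good + eps) / (bad + eps))     # weight of evidence per bin
  sum((good - bad) * woe)                    # Information Value
}

set.seed(1)
x <- rnorm(1000)
y <- rbinom(1000, 1, plogis(2 * x))  # x strongly predicts y -> high IV
ivSketch(x, y)
```

A predictor unrelated to the response would score near zero under the same calculation, which is what makes the metric usable for ranking.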
Because the RF/GBM methods already include inherent bootstrapping in their tree ensembles, each of those models is run only once. Importance scores from the other methods are derived via bootstrap averaging, which reduces variance and stabilizes the importance metrics.
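The bootstrap-averaging idea for the non-ensemble methods can be pictured in a few lines: re-score the statistic on resampled rows and average the results. This is a sketch of the general pattern only; bootAvg and the toy correlation score are hypothetical, not part of the package.

```r
# Average a per-variable importance statistic over bootstrap resamples
# (general pattern only -- not bootImp's internal code).
bootAvg <- function(x, y, score_fun, nboot = 10) {
  scores <- replicate(nboot, {
    idx <- sample(length(y), replace = TRUE)  # bootstrap sample of rows
    score_fun(x[idx], y[idx])
  })
  mean(scores)  # averaging reduces the variance of the estimate
}

set.seed(42)
x <- rnorm(500)
y <- rbinom(500, 1, plogis(x))
# toy importance score: absolute correlation with the binary response
avg_imp <- bootAvg(x, y, function(x, y) abs(cor(x, y)), nboot = 25)
avg_imp
```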
If you want to specify alternate parameters for any of the modeling methods, pass them as lists within the main control parameter. For each method you want to configure, supply a list of named arguments whose outer name matches the method name (e.g. 'rf'). See the examples for how this works. Any method parameter passed this way will either override the matching default or be added as an additional parameter to the model. Be careful when overriding the defaults, as not all combinations of parameters have been fully tested!
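The override/augment behaviour described above can be pictured with base R's modifyList(), a common way to layer user-supplied parameters over defaults. This is a sketch of the general pattern, not necessarily how bootImp merges them internally, and the default values shown are made up for illustration.

```r
# Hypothetical defaults for the 'rf' method (values invented for
# illustration); user values override matching names, and names not
# present in the defaults are appended.
defaults <- list(ntree = 500, mtry = 3, importance = TRUE)
user     <- list(ntree = 200, nodesize = 10)

params <- modifyList(defaults, user)
str(params)
# ntree is overridden, nodesize is added, mtry/importance keep defaults
```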
Value

A list containing the following elements:
varImp.df - a data frame containing average importance, average rank, and method-specific importance for all predictors
varImp.plot - a ggplot2 object showing average normalized importance across all methods for the top nplot predictors
methods - the character vector of methods passed in the call
params - a list containing the additional parameters used for each model
Examples

library(caret)
library(dplyr)  # for select()
data(GermanCredit, package = 'caret')
credit <- GermanCredit
credit$Class <- as.numeric(credit$Class == 'Good')
credit <- credit[, -nearZeroVar(credit)]
credit <- credit[, -findCorrelation(cor(select(credit, -Class)), cutoff = 0.8)]

res <- bootImp(credit, 'Class', nboot = 10, nbins = 10, nplot = 20)
res$varImp.df
res$varImp.plot
res$methods
res$params

controls <- list('rf'  = list(ntree = 200, mtry = 5, nodesize = 10, importance = TRUE),
                 'gbm' = list(n.trees = 100, shrinkage = 0.25, cv.folds = 10))
res <- bootImp(credit, 'Class', control = controls)
res$varImp.df
res$varImp.plot
res$methods
res$params