estimate_variable_importance: Estimate variable importance


Description

This function estimates the importance of individual variables in a model unit of a diagnostic tool.

Usage

estimate_variable_importance(modelPath, methods = c("anova.test", "auc",
  "chi.squared", "gain.ratio", "information.gain", "kruskal.test",
  "ranger.impurity", "ranger.permutation"), nVarToPlot = 20,
  nIter = 10, nCores = 1L)

Arguments

modelPath

the path of the RData file where the model is saved

methods

character vector. The metric(s) used to estimate variable importance. The available choices are: anova.test, auc, chi.squared (package FSelector), gain.ratio (FSelector), information.gain (FSelector), kruskal.test, ranger.impurity (ranger) and/or ranger.permutation (ranger)

nVarToPlot

numeric. The number of top-ranked variables to show in the plot.

nIter

integer. The number of times the estimate is repeated when ranger.impurity or ranger.permutation is used as an importance metric.

nCores

integer. The number of CPUs used to perform the computations when ranger.impurity or ranger.permutation is used as an importance metric and nIter is larger than 1.

Details

This function estimates the importance of all the variables included in the investigated model using the importance metric(s) specified in the methods argument. In this regard, the function is a wrapper around the function generateFilterValuesData from the mlr package, with the added possibility of running multiple iterations for the metrics ranger.impurity and ranger.permutation, optionally in parallel (using nCores larger than 1).
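The random-forest metrics are stochastic, which is why repeating the estimate can be useful. The following base R sketch illustrates the idea of running several iterations and combining them. The helper estimate_once is a hypothetical stand-in for a single importance estimate, and averaging over iterations is an assumption: this page does not state how the function combines the repeated estimates.

```r
# Hypothetical stand-in for one stochastic importance estimate
# (e.g. a single random-forest fit). This is NOT the package's code.
estimate_once <- function(seed, var_names) {
  set.seed(seed)
  setNames(runif(length(var_names)), var_names)
}

var_names <- c("v1", "v2", "v3")
n_iter <- 10  # plays the role of nIter

# One estimate per iteration. On Unix-like systems, lapply could be
# swapped for parallel::mclapply(..., mc.cores = n_cores).
estimates <- lapply(seq_len(n_iter), estimate_once, var_names = var_names)

# Stack the iterations as rows, then average per variable
# (averaging is an assumption made for this illustration).
estimate_matrix <- do.call(rbind, estimates)
mean_importance <- colMeans(estimate_matrix)
print(mean_importance)
```

With more iterations, the averaged values depend less on any single random seed, at the cost of a longer run time.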

The second step performed by this function is the production of a plot representing the importance of the most important variables. The variables are selected by ranking the importance metric values, and the number of variables represented is controlled by the argument nVarToPlot. If several importance metrics are used, the selection is based on the average rank of the variables over the different metrics.
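The average-rank selection described above can be sketched in base R as follows. The table values are made up, the object names are hypothetical, and the sketch assumes that a higher value means a more important variable for every metric. It is an illustration, not the package's implementation.

```r
# Toy importance table: one row per variable, one column per metric.
# Values are invented for illustration; higher = more important (assumed).
var_imp <- data.frame(
  variable         = c("v1", "v2", "v3", "v4"),
  auc              = c(0.90, 0.60, 0.75, 0.55),
  information.gain = c(0.30, 0.35, 0.40, 0.05)
)

n_var_to_plot <- 2  # plays the role of nVarToPlot

metric_cols <- setdiff(names(var_imp), "variable")

# Rank within each metric (rank 1 = most important),
# then average the ranks across metrics.
ranks <- sapply(var_imp[metric_cols],
                function(x) rank(-x, ties.method = "average"))
var_imp$mean_rank <- rowMeans(ranks)

# Keep the variables with the best (lowest) average rank.
top_vars <- head(var_imp[order(var_imp$mean_rank), ], n_var_to_plot)
print(top_vars)
```

Here v3 is selected first (average rank 1.5) and v1 second (average rank 2), although v1 has the highest auc value, because the ranking combines both metrics.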

Value

A list with two elements: varImp, the table with the importance measure(s) for all variables, and varImpPlot, a ggplot object representing the importance of the most important variables.
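A possible call and a look at the returned elements might resemble the sketch below. The model path is a placeholder, the argument values are arbitrary, and calling the function through ecodiag:: assumes that it is exported by the package. The call is guarded so that nothing runs unless both the package and the file are available.

```r
# Placeholder path; replace it with the RData file of a saved model.
model_path <- "path/to/model.RData"

# Two of the documented importance metrics.
methods_used <- c("auc", "ranger.permutation")

# Only run when the package and the model file are actually available.
if (requireNamespace("ecodiag", quietly = TRUE) && file.exists(model_path)) {
  res <- ecodiag::estimate_variable_importance(
    modelPath  = model_path,
    methods    = methods_used,
    nVarToPlot = 10,
    nIter      = 5,
    nCores     = 1L
  )
  print(head(res$varImp))  # table of importance measures
  print(res$varImpPlot)    # ggplot object for the top variables
}
```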

See Also

generateFilterValuesData


CedricMondy/ecodiag documentation built on May 10, 2019, 3:14 a.m.