estimate_variable_importance: Estimate variable importance
In CedricMondy/ecodiag: Build Standardized Diagnostic Tool Models


This function estimates the importance of individual variables in a model unit of a diagnostic tool.

estimate_variable_importance(modelPath, methods = c("anova.test", "auc",
  "chi.squared", "gain.ratio", "information.gain", "kruskal.test",
  "ranger.impurity", "ranger.permutation"), nVarToPlot = 20,
  nIter = 10, nCores = 1L)

`modelPath`	character. The path of the RData file where the model is saved.
`methods`	character vector. The metric(s) used to estimate variable importance. The available choices are: `anova.test`, `auc`, `chi.squared` (FSelector), `gain.ratio` (FSelector), `information.gain` (FSelector), `kruskal.test`, `ranger.impurity` (ranger) and/or `ranger.permutation` (ranger).
`nVarToPlot`	numeric. The number of most important variables to plot.
`nIter`	integer. If `ranger.impurity` or `ranger.permutation` is used as an importance metric, the number of times the estimate is repeated.
`nCores`	integer. If `ranger.impurity` or `ranger.permutation` is used as an importance metric and `nIter` is larger than 1, the number of CPUs used to perform the computations.

This function estimates the importance of all the variables included in the investigated model using the importance metric(s) specified in the methods argument. In this respect, it is a wrapper around the function generateFilterValuesData from the mlr package, with the added possibility of running multiple iterations for the metrics ranger.impurity and ranger.permutation, potentially in parallel (using nCores larger than 1).
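The idea of repeating a stochastic importance estimate and combining the runs can be sketched as follows. This is not the ecodiag implementation: the helper permutation_importance is hypothetical, it uses a linear model on the built-in mtcars data as a stand-in for the ranger-based metrics, and averaging over iterations is an assumption, since the documentation only states that the estimate is repeated.

```r
# Stand-in for a stochastic importance metric: permutation importance of each
# predictor in a linear model (NOT the ranger-based metrics used by ecodiag).
permutation_importance <- function(data, target, seed) {
  set.seed(seed)
  predictors <- setdiff(names(data), target)
  fit <- lm(reformulate(predictors, response = target), data = data)
  baseline_mse <- mean((data[[target]] - predict(fit, data))^2)
  # Increase in mean squared error after shuffling one predictor at a time
  sapply(predictors, function(v) {
    shuffled <- data
    shuffled[[v]] <- sample(shuffled[[v]])
    mean((data[[target]] - predict(fit, shuffled))^2) - baseline_mse
  })
}

nIter  <- 10
nCores <- 1L  # values above 1 rely on forking, which is unavailable on Windows

# One named importance vector per iteration, each with its own seed
runs <- parallel::mclapply(
  seq_len(nIter),
  function(i) permutation_importance(mtcars, target = "mpg", seed = i),
  mc.cores = nCores
)

# Assumption: the iterations are summarised by their mean
importance_matrix <- do.call(cbind, runs)
mean_importance <- sort(rowMeans(importance_matrix), decreasing = TRUE)
print(mean_importance)
```

Repeating the estimate smooths out the run-to-run variability of metrics that depend on random permutations or random forests, at the cost of nIter times the computation, which is what the parallel option is meant to offset.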

The second step performed by this function is the production of a plot representing the importance of the most important variables. The variables are selected by ranking the importance metric values, and the number of variables represented is controlled by the argument nVarToPlot. If several importance metrics are used, the selection is based on the average rank of the variables over the different metrics.
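The average-rank selection described above could look roughly like this in base R. The table, its column names and its values are invented for illustration, and the sketch assumes that larger metric values mean higher importance; the actual structure of the table built by the function may differ.

```r
# Hypothetical importance table: one row per variable, one column per metric
varImp <- data.frame(
  variable         = c("v1", "v2", "v3", "v4", "v5"),
  information.gain = c(0.40, 0.10, 0.30, 0.05, 0.20),
  kruskal.test     = c(8.0, 3.0, 15.0, 1.0, 12.0),
  stringsAsFactors = FALSE
)
nVarToPlot <- 3
metrics <- setdiff(names(varImp), "variable")

# Rank within each metric: rank 1 = most important (largest value, by assumption)
ranks <- sapply(varImp[metrics], function(x) rank(-x, ties.method = "average"))

# Average the ranks over the metrics and keep the best-ranked variables
varImp$meanRank <- rowMeans(ranks)
selected <- head(varImp[order(varImp$meanRank), ], nVarToPlot)
print(selected$variable)
```

Working on ranks rather than raw values keeps metrics with very different scales (for example a test statistic and an information gain) from dominating the selection.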

A list with two elements: varImp, the table with the importance measure(s) for all variables, and varImpPlot, a ggplot object representing the importance of the most important variables.
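A call and the use of its return value might look like the sketch below. The file path is a placeholder, the choice of metrics and argument values is arbitrary, and the code assumes the function is exported by the ecodiag package; the guard simply skips the call when the package or the model file is not available.

```r
# Hypothetical path: replace with the RData file of a saved model unit
modelPath <- "path/to/model_unit.RData"

if (requireNamespace("ecodiag", quietly = TRUE) && file.exists(modelPath)) {
  res <- ecodiag::estimate_variable_importance(
    modelPath  = modelPath,
    methods    = c("information.gain", "ranger.permutation"),
    nVarToPlot = 10,
    nIter      = 5,
    nCores     = 1L
  )
  print(head(res$varImp))  # importance table for all variables
  print(res$varImpPlot)    # ggplot of the most important variables
}
```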

generateFilterValuesData

CedricMondy/ecodiag documentation built on May 10, 2019, 3:14 a.m.