vim: Variable Importance Measures (VIMs)

Description

Calculate variable importance measures (VIMs) based on different approaches.

Usage

vim(
  model,
  scoring_rule = "auc",
  vim_type = "logic",
  adjust = TRUE,
  interaction_order = 3,
  nodesize = NULL,
  alpha = 0.05,
  X_oob = NULL,
  y_oob = NULL,
  Z_oob = NULL,
  leaves = "4pl",
  ...
)

Arguments

model

The fitted logicDT or logic.bagged model.

scoring_rule

The scoring rule for assessing the model performance. As in logicDT, "auc", "nce", "deviance" and "brier" are possible for binary outcomes. For regression, the mean squared error is used.

vim_type

The type of VIM to be calculated. This can either be "logic", "remove" or "permutation". See below for details.

adjust

Should adjusted interaction VIMs be computed in addition to the VIMs of the identified terms? See below for details.

interaction_order

If adjust = TRUE, the maximum interaction order up to which adjusted interaction VIMs are computed.

nodesize

If adjust = TRUE, how many observations need to be discriminated by an interaction in order for it to be considered? Similar to conjsize in logicDT and nodesize in tree.control.

alpha

If adjust = TRUE, a further adjustment can be performed that tries to identify the concrete conjunctions responsible for the interaction of the considered binary predictors. alpha specifies the significance level of the statistical tests, whose alternative hypothesis is a difference in the response for specific conjunctions. alpha = 0 disables this further adjustment. See below for details.

X_oob

The predictor data which should be used for calculating the VIMs. Preferably some type of validation data independent of the training data.

y_oob

The outcome data for computing the VIMs. Preferably some type of validation data independent of the training data.

Z_oob

The optional covariable data for computing the VIMs. Preferably some type of validation data independent of the training data.

leaves

The prediction mode if four-parameter logistic (4pL) models were fitted in the leaves. As in predict.logicDT, "4pl" and "constant" are the possible settings.

...

Additional parameters passed to the function implementing the chosen VIM type. For vim_type = "logic", the argument average can be set to "before" or "after". For vim_type = "permutation", n.perm sets the number of random permutations. For vim_type = "remove", empty.model can be set to "none", which ignores empty models (i.e., models with all predictive terms removed), or "mean", which uses the response mean as the prediction of an empty model. See below for details.

Details

Three different VIM methods are implemented:

  • Permutation VIMs: Random permutations of the respective identified logic terms

  • Removal VIMs: Removing single logic terms

  • Logic VIMs: Prediction with both possible outcomes of a logic term

Details on the calculation of these VIMs are given below.

Here, variable importance refers to the importance of the identified logic terms. These terms can be single predictors as well as conjunctions in the spirit of this software package.

Value

A data frame with two columns:

var

Short descriptions of the terms for which the importance was measured. For example, -X1^X2 denotes X_1^c \land X_2.

vim

The actual calculated VIM values.

The rows of this data frame are sorted in decreasing order of the VIM values.
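
The following base R sketch illustrates the documented shape of the return value. The integer-vector encoding of terms (a negative index marks a negated predictor) and the helper names term_label and make_vim_frame are assumptions for illustration only; they are not part of the logicDT API, and the VIM values are made up.

```r
# Hypothetical term encoding (an assumption, not logicDT's internal one):
# c(-1, 2) stands for X_1^c AND X_2.
term_label <- function(term) {
  paste0(ifelse(term < 0, "-", ""), "X", abs(term), collapse = "^")
}

# Build a data frame with columns var and vim, sorted by decreasing VIM.
make_vim_frame <- function(terms, vims) {
  res <- data.frame(var = vapply(terms, term_label, character(1)),
                    vim = vims, stringsAsFactors = FALSE)
  res <- res[order(res$vim, decreasing = TRUE), ]
  rownames(res) <- NULL
  res
}

vim_df <- make_vim_frame(list(c(-1, 2), 3, c(4, -5)), c(0.02, 0.10, 0.05))
print(vim_df)
```

Here the term c(-1, 2) is rendered as -X1^X2, matching the label convention described for the var column.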

Permutation VIMs

Permutation VIMs are computed by comparing the model's performance on the original data with its performance on data in which single terms have been randomly permuted. This approach was originally proposed by Breiman & Cutler (2003).
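
A minimal base R sketch of the permutation idea, assuming a regression model scored by the mean squared error (so an increase in error means importance; for a score such as "auc", where larger is better, the difference would be reversed). predict_fun is a hypothetical stand-in for a fitted model that maps a 0-1 matrix of logic terms to predictions, and n_perm only mirrors the role of the documented n.perm argument; this is not the logicDT implementation.

```r
# Permutation VIM sketch: average MSE after permuting term j minus the
# MSE on the original data, for every term column j.
permutation_vim <- function(predict_fun, terms, y, n_perm = 10) {
  mse <- function(pred) mean((y - pred)^2)
  base_score <- mse(predict_fun(terms))
  vapply(seq_len(ncol(terms)), function(j) {
    perm_scores <- replicate(n_perm, {
      permuted <- terms
      permuted[, j] <- sample(permuted[, j])  # break the link between term j and y
      mse(predict_fun(permuted))
    })
    mean(perm_scores) - base_score
  }, numeric(1))
}

set.seed(1)
n <- 200
terms <- cbind(t1 = rbinom(n, 1, 0.5), t2 = rbinom(n, 1, 0.5))
y <- 2 * terms[, "t1"] + rnorm(n, sd = 0.5)
predict_fun <- function(m) 2 * m[, "t1"]  # toy model that only uses t1
perm_vims <- permutation_vim(predict_fun, terms, y, n_perm = 20)
print(perm_vims)
```

Since the toy model ignores t2, permuting it leaves the predictions unchanged and its VIM is zero, while t1 receives a clearly positive VIM.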

Removal VIMs

Removal VIMs are constructed by removing a specific logic term from the set of predictors, refitting the decision tree and comparing its performance to that of the original model. Thus, this approach requires that at least two terms were found by the algorithm. If only a single term was found, no VIM is calculated when empty.model = "none" is specified. Alternatively, empty.model = "mean" can be set to approximate the empty model by the constant mean response model.
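
The removal approach can be sketched in base R as follows. As a loudly labeled swap, lm() stands in for refitting the decision tree, and performance is measured by the validation MSE; removal_vim and its empty_model argument are hypothetical names that only mirror the documented empty.model options.

```r
# Removal VIM sketch: refit without one term, compare validation MSE.
# lm() is a stand-in for refitting the decision tree.
removal_vim <- function(train, valid, term_names,
                        empty_model = c("none", "mean")) {
  empty_model <- match.arg(empty_model)
  mse <- function(pred) mean((valid$y - pred)^2)
  fit_score <- function(vars) {
    if (length(vars) == 0) {
      # Empty model: either skip it or predict the training response mean
      if (empty_model == "none") return(NA_real_)
      return(mse(rep(mean(train$y), nrow(valid))))
    }
    fit <- lm(reformulate(vars, response = "y"), data = train)
    mse(predict(fit, newdata = valid))
  }
  full <- fit_score(term_names)
  vapply(term_names, function(v) fit_score(setdiff(term_names, v)) - full,
         numeric(1))
}

set.seed(2)
make_data <- function(n) {
  d <- data.frame(t1 = rbinom(n, 1, 0.5), t2 = rbinom(n, 1, 0.5))
  d$y <- 1.5 * d$t1 + rnorm(n, sd = 0.5)
  d
}
train <- make_data(300)
valid <- make_data(300)
rm_two <- removal_vim(train, valid, c("t1", "t2"))
rm_one_none <- removal_vim(train, valid, "t1", empty_model = "none")
rm_one_mean <- removal_vim(train, valid, "t1", empty_model = "mean")
```

With a single term, empty_model = "none" yields NA (no VIM), whereas "mean" compares against the constant mean response model.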

Logic VIMs

Logic VIMs exploit the fact that Boolean conjunctions are themselves Boolean variables and are therefore equal to 0 or 1. To compute the VIM of a specific term, predictions are made once with this term fixed to 0 and once with it fixed to 1. The arithmetic mean of these two (risk or regression) predictions is then used to calculate the performance, which is compared to the original performance as in the other VIM approaches (average = "before"). Alternatively, the performance can be computed separately for each fixed 0-1 scenario of the considered term; these individual performances are then averaged and compared to the original performance (average = "after").
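
Both averaging variants can be sketched in base R, assuming a regression model scored by the MSE. predict_fun and logic_vim are hypothetical names; predict_fun maps a 0-1 term matrix to predictions and is not a logicDT function.

```r
# Logic VIM sketch: fix term j to 0 and to 1 for all observations and
# compare the resulting MSE with the original MSE.
logic_vim <- function(predict_fun, terms, y, j,
                      average = c("before", "after")) {
  average <- match.arg(average)
  mse <- function(pred) mean((y - pred)^2)
  base_score <- mse(predict_fun(terms))
  fixed0 <- terms; fixed0[, j] <- 0
  fixed1 <- terms; fixed1[, j] <- 1
  p0 <- predict_fun(fixed0)
  p1 <- predict_fun(fixed1)
  fixed_score <- if (average == "before") {
    mse((p0 + p1) / 2)        # average the predictions, then score
  } else {
    (mse(p0) + mse(p1)) / 2   # score each scenario, then average
  }
  fixed_score - base_score
}

set.seed(1)
n <- 200
terms <- cbind(t1 = rbinom(n, 1, 0.5), t2 = rbinom(n, 1, 0.5))
y <- 2 * terms[, "t1"] + rnorm(n, sd = 0.5)
predict_fun <- function(m) 2 * m[, "t1"]  # toy model that only uses t1
lv_before <- logic_vim(predict_fun, terms, y, "t1", "before")
lv_after <- logic_vim(predict_fun, terms, y, "t1", "after")
lv_irrelevant <- logic_vim(predict_fun, terms, y, "t2", "before")
```

For the squared error, convexity implies that "after" never yields a smaller VIM than "before" for the same term, since the loss of the averaged prediction is at most the average of the two losses.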

Validation

Validation data sets that were not used for fitting the model are preferred in order to prevent overfitting of the VIMs themselves. If neither bagging nor inner validation was used for fitting the model, such data should be supplied via the _oob arguments.

Bagging

For the bagging version, out-of-bag (OOB) data are naturally used for the calculation of VIMs.
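
To show where OOB data come from, the following base R sketch draws one bootstrap sample and collects the observations it never selected. It only illustrates the general bagging mechanism; how logic.bagged aggregates the VIMs over its individual models is not shown here.

```r
# One bootstrap sample: observations never drawn are out of bag (OOB)
set.seed(3)
n <- 100
in_bag <- sample(n, n, replace = TRUE)  # indices used for fitting one model
oob <- setdiff(seq_len(n), in_bag)      # independent data for that model's VIMs
length(oob) / n  # on average close to exp(-1), i.e. about 0.37
```

Each bagged model can thus be evaluated on observations it was not trained on, without a separate validation set.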

VIM Adjustment for Interactions

Since decision trees can naturally include interactions between single predictors (especially when strong marginal effects are present as well), logicDT models might, e.g., include the single input variables X_1 and X_2 but not their interaction X_1 \land X_2 although an interaction effect is present. We, therefore, developed and implemented an adjustment approach for calculating VIMs for such unidentified interactions nonetheless. For predictors X_{i_1}, …, X_{i_k} =: Z, this interaction importance is given by

\mathrm{VIM}(X_{i_1} \land … \land X_{i_k}) = \mathrm{VIM}(X_{i_1}, …, X_{i_k} \mid X \setminus Z) - ∑_{\lbrace j_1, …, j_l \rbrace \subsetneq \lbrace i_1, …, i_k \rbrace} \mathrm{VIM}(X_{j_1} \land … \land X_{j_l} \mid X \setminus Z)

and can, in principle, be applied to any black-box model. Here, \mathrm{VIM}(A \mid X \setminus Z) denotes the VIM of A with respect to the predictor set excluding the variables in Z, i.e., the improvement gained by additionally considering A when only the predictors in X \setminus Z are regarded. The proposed interaction VIM can be recursively calculated through

\mathrm{VIM}(X_{i_1} \land X_{i_2}) = \mathrm{VIM}(X_{i_1}, X_{i_2} \mid X \setminus Z) - \mathrm{VIM}(X_{i_1} \mid X \setminus Z) - \mathrm{VIM}(X_{i_2} \mid X \setminus Z)

for Z = X_{i_1}, X_{i_2}. This leads to the relationship

\mathrm{VIM}(X_{i_1} \land … \land X_{i_k}) = ∑_{\lbrace j_1, …, j_l \rbrace \subseteq \lbrace i_1, …, i_k \rbrace} (-1)^{k-l} \cdot \mathrm{VIM}(X_{j_1}, …, X_{j_l} \mid X \setminus Z).
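
The closed-form alternating sum and the recursive definition can be checked against each other with a small base R sketch. vim_set is a hypothetical function returning \mathrm{VIM}(X_S \mid X \setminus Z) for an index set S; here it is a lookup table of made-up toy values, not results of any fitted model. As in the formulas above, all subset VIMs are conditioned on the same X \setminus Z, and the empty subset contributes 0.

```r
# Toy values for VIM(X_S | X \ Z), keyed by the sorted index set S
vim_table <- c("1" = 1, "2" = 2, "3" = 0.5, "1,2" = 5, "1,3" = 1.5,
               "2,3" = 2.5, "1,2,3" = 7)
vim_set <- function(s) unname(vim_table[paste(sort(s), collapse = ",")])

# Closed form: sum over non-empty subsets of (-1)^(k - l) * VIM(subset)
interaction_vim <- function(vim_set, idx) {
  k <- length(idx)
  total <- 0
  for (l in seq_len(k)) {
    # combn over positions avoids combn's special case for a single integer
    for (s in combn(seq_len(k), l, FUN = function(p) idx[p], simplify = FALSE)) {
      total <- total + (-1)^(k - l) * vim_set(s)
    }
  }
  total
}

# Recursive form: joint VIM minus the interaction VIMs of all proper subsets
interaction_vim_rec <- function(vim_set, idx) {
  k <- length(idx)
  if (k == 1) return(vim_set(idx))
  proper <- unlist(lapply(seq_len(k - 1), function(l)
    combn(seq_len(k), l, FUN = function(p) idx[p], simplify = FALSE)),
    recursive = FALSE)
  vim_set(idx) - sum(vapply(proper, function(s) interaction_vim_rec(vim_set, s),
                            numeric(1)))
}

iv_pair <- interaction_vim(vim_set, c(1, 2))    # 5 - 1 - 2
iv_triple <- interaction_vim(vim_set, 1:3)
iv_triple_rec <- interaction_vim_rec(vim_set, 1:3)
```

For the pair, the result is 5 - 1 - 2 = 2, exactly the two-variable formula; for the triple, both forms give 7 - (5 + 1.5 + 2.5) + (1 + 2 + 0.5) = 1.5.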

Identification of Concrete Conjunctions

The VIM adjustment approach described above only captures the importance of interactions in a general sense, i.e., it only addresses whether some variables interact in any way. Since logicDT aims at identifying specific conjunctions (and also assigns VIMs to them if they were identified by logicDT), a further adjustment approach is implemented which tries to identify the specific conjunction leading to an interaction effect. The idea of this method is to consider the response for each possible scenario of the interacting variables. For example, for X_1 \land (X_2^c \land X_3), where the second term X_2^c \land X_3 was identified by logicDT and, thus, two interacting terms are regarded, the 2^2 = 4 possible scenarios \lbrace (i, j) \mid i, j \in \lbrace 0, 1 \rbrace \rbrace are considered. For each scenario, the responses of the observations falling into it are compared with the responses of the remaining (complementary) observations. For continuous outcomes, a two-sample t-test (with Welch correction for potentially unequal variances) is performed comparing the means of these two groups. For binary outcomes, Fisher's exact test is performed to test for different underlying case probabilities. If at least one test rejects the null hypothesis of equal outcomes (without adjusting for multiple testing), the scenario with the lowest p-value is chosen as the explanatory term for the interaction effect. For example, if the most significant deviation results from X_1 = 0 and (X_2^c \land X_3) = 1 in the example above, the term X_1^c \land (X_2^c \land X_3) is chosen.
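
A base R sketch of this scenario search for two binary terms a and b, using t.test (Welch by default) for continuous outcomes and fisher.test for binary outcomes. The function name identify_conjunction, the minimum group size of two, and returning NULL when nothing is significant are assumptions for illustration, not details of the logicDT implementation.

```r
# Test each 0-1 scenario of (a, b) against its complement; return the
# scenario with the smallest p-value if it is below alpha, else NULL.
identify_conjunction <- function(a, b, y, alpha = 0.05, binary = FALSE) {
  scenarios <- expand.grid(a = 0:1, b = 0:1)
  pvals <- apply(scenarios, 1, function(sc) {
    inside <- a == sc[["a"]] & b == sc[["b"]]
    if (sum(inside) < 2 || sum(!inside) < 2) return(NA_real_)
    if (binary) {
      tab <- table(factor(inside, levels = c(FALSE, TRUE)),
                   factor(y, levels = 0:1))
      fisher.test(tab)$p.value
    } else {
      t.test(y[inside], y[!inside], var.equal = FALSE)$p.value
    }
  })
  # No multiple-testing adjustment; alpha = 0 never selects a scenario
  if (all(is.na(pvals)) || min(pvals, na.rm = TRUE) >= alpha) return(NULL)
  scenarios[which.min(pvals), ]
}

set.seed(4)
n <- 400
a <- rbinom(n, 1, 0.5)
b <- rbinom(n, 1, 0.5)
y <- ifelse(a == 0 & b == 1, 2, 0) + rnorm(n)           # effect for a^c AND b
yb <- rbinom(n, 1, ifelse(a == 1 & b == 1, 0.8, 0.2))   # effect for a AND b
best <- identify_conjunction(a, b, y)
best_bin <- identify_conjunction(a, b, yb, binary = TRUE)
none <- identify_conjunction(a, b, y, alpha = 0)
```

A selected scenario with a = 0 and b = 1 corresponds to the term a^c \land b, analogous to choosing X_1^c \land (X_2^c \land X_3) in the example above.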

References


logicDT documentation built on Jan. 14, 2023, 5:06 p.m.