var_stability: Variable stability

View source: R/varImp.R

var_stability R Documentation

Variable stability

Description

Uses variable importance from the models trained and tested on each outer CV fold to assess how stable variable importance is. For glmnet, variable importance is measured as the absolute value of the model coefficients, optionally scaled as a percentage. The frequency with which each variable is selected in the outer folds, as well as in the final model, is also returned. This is helpful for sparse models or when filters are used, as it shows how often variables end up in the model across folds.

For glmnet, the direction of effect is taken directly from the sign of the model coefficients. For caret models, direction of effect is not readily available, so as a substitute the directionality of each predictor is determined by the function var_direction(), using the sign of a t-test for binary classification or the sign of the regression coefficient for continuous outcomes (not available for multiclass caret models). To better understand the direction of effect of each predictor within the final model, we recommend using SHAP values; see the vignette "Explaining nestedcv models with Shapley values". See pred_train() for an example.

Usage

var_stability(x, ...)

## S3 method for class 'nestcv.glmnet'
var_stability(x, percent = TRUE, level = 1, sort = TRUE, ...)

## S3 method for class 'nestcv.train'
var_stability(x, sort = TRUE, ...)

Arguments

x

a nestcv.glmnet or nestcv.train fitted object

...

Optional arguments for compatibility

percent

Logical, for nestcv.glmnet objects only: whether to scale coefficients to a percentage of the largest coefficient in each model

level

For multinomial nestcv.glmnet models only: the level of the outcome to examine, specified either as an integer or as a character value

sort

Logical, whether to sort variables by mean importance
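
The percent scaling and the glmnet direction of effect can be illustrated with a minimal base R sketch. The coefficient values and variable names below are hypothetical, and the code only mirrors what the argument descriptions state (absolute coefficients, scaled to the largest coefficient in a model, with direction taken from the sign). It is not the package's internal implementation.

```r
# Hypothetical coefficients from a single outer-fold glmnet model
# (intercept excluded); names and values are made up for illustration.
coefs <- c(geneA = -0.8, geneB = 0.2, geneC = 0, geneD = 0.4)

# Importance as the absolute size of each coefficient
imp <- abs(coefs)

# Equivalent of percent = TRUE: scale to a percentage of the largest
# coefficient in this model
imp_percent <- 100 * imp / max(imp)

# Direction of effect from the sign of each coefficient
# (-1 negative, 0 not selected, 1 positive)
direction <- sign(coefs)

print(round(imp_percent, 1))
print(direction)
```

Because scaling is done within each model, the largest coefficient in every fold maps to 100, which makes importance comparable across folds even when the raw coefficient sizes differ.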

Details

Note that for caret models, caret::varImp() may require the model package to be fully loaded in order to function. During the fitting process, caret often loads the package by namespace only.
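
One way to act on this note is to attach the model package before calling var_stability(). The sketch below assumes that "fully loaded" means attached with library(), as opposed to having only the namespace loaded. The helper attach_if_installed() is hypothetical and not part of nestedcv or caret, and the package name "randomForest" is only an example of a package that a caret method might depend on.

```r
# Hypothetical helper: attach a package (not just load its namespace)
# if it is installed. Returns TRUE when the package is on the search path.
attach_if_installed <- function(pkg) {
  # requireNamespace() loads the namespace only; it does not attach it
  if (!requireNamespace(pkg, quietly = TRUE)) return(FALSE)
  # library() attaches the package so its functions and methods are visible
  library(pkg, character.only = TRUE)
  paste0("package:", pkg) %in% search()
}

# Example package name only; substitute the package behind your caret method
ok <- attach_if_installed("randomForest")
print(ok)

# With the package attached, the stability summary could then be requested
# for a previously fitted nestcv.train object (here called `fit`, assumed):
# var_stability(fit)
```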

Value

Data frame containing the mean, standard deviation (sd) and standard error of the mean (sem) of variable importance, and the frequency with which each variable is selected in outer folds.
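
The shape of the returned summary can be sketched in base R from a matrix of per-fold importance values. The data below are hypothetical, and the sketch makes two assumptions that the documentation does not confirm: variables not selected in a fold are counted as zero importance, and sorting places the largest mean first. The column names of the real return value may also differ.

```r
# Hypothetical importance values: rows = variables, columns = outer folds.
# NA marks a variable that was not selected in that fold.
imp <- rbind(
  geneA = c(100, 100, 100, 100),
  geneB = c( 40,  NA,  60,  50),
  geneC = c( NA,  NA,  20,  NA)
)
colnames(imp) <- paste0("fold", 1:4)

# Frequency: number of outer folds in which each variable was selected
freq <- rowSums(!is.na(imp))

# Assumption of this sketch: unselected variables count as zero importance
imp0 <- imp
imp0[is.na(imp0)] <- 0

fold_sd <- apply(imp0, 1, sd)

stab <- data.frame(
  mean = rowMeans(imp0),
  sd = fold_sd,
  sem = fold_sd / sqrt(ncol(imp0)),  # standard error of the mean
  frequency = freq
)

# Equivalent of sort = TRUE in this sketch: largest mean importance first
stab <- stab[order(stab$mean, decreasing = TRUE), ]
print(stab)
```

A variable with high mean importance, low sd and a frequency equal to the number of outer folds (geneA here) is the pattern that suggests a stable predictor.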

See Also

cv_coef(), cv_varImp(), pred_train()


nestedcv documentation built on Oct. 26, 2023, 5:08 p.m.