importance: Importance Extract variable importance measures produced by...


View source: R/importance.R View source: R/Lrnr_randomForest.R

Description

Takes a cross-validated fit (i.e., a cross-validated learner that has already been trained on a task), which may be a cross-validated single learner or a super learner, and generates a risk-based variable importance score for each covariate or each group of covariates in the task. The function returns a data.table in which each row gives the risk difference or the risk ratio between two risks: the risk when a covariate (or group of covariates) is permuted or removed, and the original risk (i.e., the risk when all covariates are included as they were in the observed data). A higher risk ratio or difference corresponds to a more important covariate or group. A plot can be generated from the returned data.table by calling the companion function importance_plot.

Usage

importance(fit, eval_fun = NULL, fold_number = "validation",
  type = c("remove", "permute"), importance_metric = c("difference",
  "ratio"), covariate_groups = NULL)

Arguments

fit

A trained cross-validated (CV) learner (such as a CV stack or super learner), from which cross-validated predictions can be generated.

eval_fun

The evaluation function (risk or loss function) used to evaluate the risk. If NULL (default), the function is chosen based on the outcome type, matching the defaults in default_metalearner. See loss_functions and risk_functions for options.

fold_number

The fold number to use for obtaining the predictions from the fit. Either a positive integer, for obtaining predictions from a specific fold's fit; "full", for obtaining predictions from a fit on all of the data; or "validation" (default), for obtaining cross-validated predictions, where the data used for training and prediction never overlap across the folds. Note that if a positive integer or "full" is supplied here, there will be overlap between the data used for training and validation, so fold_number = "validation" is recommended.

type

Which method should be used to obscure the relationship between each covariate / covariate group and the outcome? When type is "remove" (default), each covariate / covariate group is removed one at a time from the task; the cross-validated learner is refit to this modified task; and finally, predictions are obtained from this refit. When type is "permute", each covariate / covariate group is permuted (sampled without replacement) one at a time, and then predictions are obtained from this modified data.
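The two strategies can be sketched outside of sl3 with base R. This is only an illustration under loud assumptions: it uses the built-in mtcars data, lm() as the learner, mean squared error as the risk, and in-sample predictions rather than cross-validated ones. It is not how importance() is implemented internally.

```r
# Sketch of "remove" vs. "permute" importance with a plain linear model.
# Assumptions: mtcars data, lm() as the learner, MSE as the risk, and
# in-sample (not cross-validated) predictions.
set.seed(1)
covs <- c("wt", "hp", "qsec")
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

mse <- function(pred, obs) mean((pred - obs)^2)
original_risk <- mse(predict(fit, mtcars), mtcars$mpg)

# "remove": drop one covariate, refit, and predict from the refit
removed_risk <- sapply(covs, function(cov) {
  refit <- lm(reformulate(setdiff(covs, cov), response = "mpg"),
              data = mtcars)
  mse(predict(refit, mtcars), mtcars$mpg)
})

# "permute": shuffle one covariate and predict from the original fit
permuted_risk <- sapply(covs, function(cov) {
  shuffled <- mtcars
  # sample() without replacement permutes the column
  shuffled[[cov]] <- sample(shuffled[[cov]])
  mse(predict(fit, shuffled), shuffled$mpg)
})

# risk differences relative to the original risk, largest first
sort(removed_risk - original_risk, decreasing = TRUE)
sort(permuted_risk - original_risk, decreasing = TRUE)
```

Note the cost trade-off this makes visible: "remove" refits the learner once per covariate (or group), while "permute" reuses the original fit and only regenerates predictions.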

importance_metric

Either "ratio" or "difference" (default). For each covariate / covariate group, "ratio" returns the risk with that covariate / covariate group permuted or removed divided by the observed/original risk (i.e., the risk with all covariates as they existed in the sample), and "difference" returns the risk with that covariate / covariate group permuted or removed minus the observed risk.
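As a small worked illustration of the two metrics, the following base R snippet computes both from made-up risks. The numbers and covariate names are hypothetical and are not output from any fit.

```r
# Hypothetical risks, for illustration only (not results from a real fit)
original_risk <- 0.50
modified_risk <- c(apgar1 = 0.55, parity = 0.50, mage = 0.75)

# "difference": modified risk minus original risk
risk_difference <- modified_risk - original_risk

# "ratio": modified risk divided by original risk
risk_ratio <- modified_risk / original_risk

risk_difference
risk_ratio
```

Under these assumed values, a covariate whose permutation or removal leaves the risk unchanged (parity here) has a difference of 0 and a ratio of 1, and both metrics rank the covariates in the same order.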

covariate_groups

Optional named list of covariate groups, which invokes variable importance evaluation at the group level by removing/permuting all covariates in the same group together. If covariates in the task are not specified in the list of groups, then those covariates will be added as additional single-covariate groups.
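The handling of ungrouped covariates can be pictured with a few lines of base R. This is a sketch of the described behavior, not the package's internal code; in particular, naming each single-covariate group after its covariate is an assumption made here for illustration.

```r
# Covariates in the task and a partial grouping (gagebrth is left ungrouped)
covs <- c("apgar1", "apgar5", "parity", "gagebrth", "mage", "meducyrs")
groups <- list(
  scores = c("apgar1", "apgar5"),
  maternal = c("parity", "mage", "meducyrs")
)

# Covariates not mentioned in any group
ungrouped <- setdiff(covs, unlist(groups))

# Assumption: each leftover covariate becomes its own group,
# named after the covariate itself
singletons <- setNames(as.list(ungrouped), ungrouped)
all_groups <- c(groups, singletons)

names(all_groups)
```

With this grouping, importance would be evaluated for three units: the two named groups and gagebrth on its own.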

Value

A data.table of variable importance, with one row for each covariate or covariate group.

Examples

library(sl3)

# define ML task
data(cpp_imputed)
covs <- c("apgar1", "apgar5", "parity", "gagebrth", "mage", "meducyrs")
task <- sl3_Task$new(cpp_imputed, covariates = covs, outcome = "haz")

# build relatively fast learner library (not recommended for real analysis)
lasso_lrnr <- Lrnr_glmnet$new()
glm_lrnr <- Lrnr_glm$new()
ranger_lrnr <- Lrnr_ranger$new()
lrnrs <- c(lasso_lrnr, glm_lrnr, ranger_lrnr)
names(lrnrs) <- c("lasso", "glm", "ranger")
lrnr_stack <- make_learner(Stack, lrnrs)

# instantiate SL with default metalearner
sl <- Lrnr_sl$new(lrnr_stack)
sl_fit <- sl$train(task)

importance_result <- importance(sl_fit)
importance_result

# importance with groups of covariates
groups <- list(
  scores = c("apgar1", "apgar5"),
  maternal = c("parity", "mage", "meducyrs")
)
importance_result_groups <- importance(sl_fit, covariate_groups = groups)
importance_result_groups
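The non-default options described under Arguments can be tried in the same way. The sketch below assumes sl3 is attached and that sl_fit from the example above is still in the workspace; it uses only the arguments shown in Usage and the companion function importance_plot named in the Description. The object name importance_permute_ratio is chosen here for illustration.

```r
# assumes library(sl3) has been called and sl_fit exists from the example above

# permutation-based importance reported as a risk ratio
importance_permute_ratio <- importance(
  sl_fit,
  type = "permute",
  importance_metric = "ratio"
)
importance_permute_ratio

# plot the returned data.table with the companion function
importance_plot(importance_permute_ratio)
```

Because type = "permute" does not refit the learner for each covariate, it is usually the faster option to try first.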

jeremyrcoyle/sl3 documentation built on Feb. 3, 2022, 9:12 a.m.