importance: Importance Extract variable importance measures produced by...
In jeremyrcoyle/sl3: Pipelines for Machine Learning and Super Learning

View source: R/importance.R View source: R/Lrnr_randomForest.R

importance

R Documentation

Importance: extract variable importance measures produced by `randomForest`, ordered by decreasing importance.

Description

Takes a cross-validated fit (i.e., a cross-validated learner that has already been trained on a task), which may be a cross-validated single learner or a super learner, and computes a risk-based variable importance score for each covariate, or for each group of covariates, in the task. The output is a data.table in which each row gives the risk difference or the risk ratio between two risks: the risk when a covariate (or group of covariates) is permuted or removed, and the original risk (i.e., when all covariates are included as they were in the observed data). A higher risk ratio or difference indicates a more important covariate or group. A plot can be generated from the returned data.table by calling the companion function `importance_plot`.
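The risk comparison described above can be sketched in base R. This is only an illustration of the idea, not the sl3 implementation: it assumes a single `lm` fit, one train/validation split in place of cross-validation, and mean squared error as the risk. The simulated data and the names `permuted_risk` and `importance_sketch` are hypothetical.

```r
# Illustrative sketch only: permutation-based risk difference and risk ratio.
set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 2 * dat$x1 + rnorm(n, sd = 0.5) # x2 is unrelated to the outcome

# One train/validation split (an assumption; sl3 uses cross-validation)
train_idx <- seq_len(n / 2)
train <- dat[train_idx, ]
valid <- dat[-train_idx, ]

fit <- lm(y ~ x1 + x2, data = train)
mse <- function(pred, obs) mean((pred - obs)^2)

# Original risk: all covariates as observed
original_risk <- mse(predict(fit, newdata = valid), valid$y)

# Risk after permuting one covariate in the validation data
permuted_risk <- function(covariate) {
  shuffled <- valid
  shuffled[[covariate]] <- sample(shuffled[[covariate]])
  mse(predict(fit, newdata = shuffled), valid$y)
}

risks <- sapply(c("x1", "x2"), permuted_risk)
importance_sketch <- data.frame(
  covariate = names(risks),
  risk_difference = risks - original_risk,
  risk_ratio = risks / original_risk
)

# Most important covariate first
importance_sketch <-
  importance_sketch[order(-importance_sketch$risk_difference), ]
importance_sketch
```

Permuting `x1` breaks its link to the outcome, so its risk rises well above the original risk, while permuting the noise covariate `x2` leaves the risk close to unchanged.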

Usage

importance(fit, eval_fun = NULL, fold_number = "validation",
  type = c("remove", "permute"), importance_metric = c("difference",
  "ratio"), covariate_groups = NULL)

Arguments

`fit`	A trained cross-validated (CV) learner (such as a CV stack or super learner), from which cross-validated predictions can be generated.
`eval_fun`	The evaluation function (risk or loss function) for evaluating the risk. Defaults vary based on the outcome type, matching defaults in `default_metalearner`. See `loss_functions` and `risk_functions` for options. Default is `NULL`.
`fold_number`	The fold number to use for obtaining the predictions from the fit. Either a positive integer for obtaining predictions from a specific fold's fit; `"full"` for obtaining predictions from a fit on all of the data; or `"validation"` (default) for obtaining cross-validated predictions, where the data used for training and prediction never overlap across the folds. Note that if a positive integer or `"full"` is supplied here, then there will be overlap between the data used for training and validation, so `fold_number = "validation"` is recommended.
`type`	Which method should be used to obscure the relationship between each covariate / covariate group and the outcome? When `type` is `"remove"` (default), each covariate / covariate group is removed one at a time from the task; the cross-validated learner is refit to this modified task; and finally, predictions are obtained from this refit. When `type` is `"permute"`, each covariate / covariate group is permuted (sampled without replacement) one at a time, and then predictions are obtained from this modified data.
`importance_metric`	Either `"ratio"` or `"difference"` (default). For each covariate / covariate group, `"ratio"` returns the risk with that covariate / covariate group permuted or removed, divided by the observed (original) risk (i.e., the risk with all covariates as they existed in the sample), and `"difference"` returns the risk with that covariate / covariate group permuted or removed, minus the observed risk.
`covariate_groups`	Optional named list of covariate groups, which invokes variable importance evaluation at the group level by removing/permuting all covariates in the same group together. If covariates in the task are not specified in the list of groups, then those covariates will be added as additional single-covariate groups.
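The `"remove"` type and the handling of `covariate_groups` described above can be sketched together in base R. This is a hedged illustration and not sl3's code: `complete_groups` and `fit_and_risk` are hypothetical helpers, the learner is a plain `lm`, the risk is mean squared error, and a single train/validation split stands in for cross-validation.

```r
# Illustrative sketch only: group-level importance by removal and refit.

# Hypothetical helper: covariates missing from `groups` become
# single-covariate groups named after themselves.
complete_groups <- function(covariates, groups = NULL) {
  ungrouped <- setdiff(covariates, unlist(groups))
  c(groups, setNames(as.list(ungrouped), ungrouped))
}

set.seed(2)
n <- 300
dat <- data.frame(a1 = rnorm(n), a2 = rnorm(n), b = rnorm(n), z = rnorm(n))
dat$y <- dat$a1 + dat$a2 + 0.5 * dat$b + rnorm(n, sd = 0.5) # z is noise

covariates <- c("a1", "a2", "b", "z")
groups <- complete_groups(covariates, list(a = c("a1", "a2")))
# groups now holds: a = c("a1", "a2"), b = "b", z = "z"

train <- dat[1:150, ]
valid <- dat[151:300, ]
mse <- function(pred, obs) mean((pred - obs)^2)

# Refit on the given covariates and return the validation risk
fit_and_risk <- function(covs) {
  fit <- lm(reformulate(covs, response = "y"), data = train)
  mse(predict(fit, newdata = valid), valid$y)
}

original_risk <- fit_and_risk(covariates)

# type = "remove": drop each group, refit, and re-evaluate the risk
removed_risk <- sapply(groups, function(g) {
  fit_and_risk(setdiff(covariates, g))
})

risk_difference <- removed_risk - original_risk # importance_metric = "difference"
risk_ratio <- removed_risk / original_risk      # importance_metric = "ratio"
sort(risk_ratio, decreasing = TRUE)
```

Removal requires one refit per group, whereas permutation reuses the existing fit and only changes the data passed to prediction, which is why the two types can differ noticeably in run time.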

Value

A data.table of variable importance scores, with one row for each covariate or covariate group.

Examples

library(sl3)

# define ML task
data(cpp_imputed)
covs <- c("apgar1", "apgar5", "parity", "gagebrth", "mage", "meducyrs")
task <- sl3_Task$new(cpp_imputed, covariates = covs, outcome = "haz")

# build relatively fast learner library (not recommended for real analysis)
lasso_lrnr <- Lrnr_glmnet$new()
glm_lrnr <- Lrnr_glm$new()
ranger_lrnr <- Lrnr_ranger$new()
lrnrs <- c(lasso_lrnr, glm_lrnr, ranger_lrnr)
names(lrnrs) <- c("lasso", "glm", "ranger")
lrnr_stack <- make_learner(Stack, lrnrs)

# instantiate SL with default metalearner
sl <- Lrnr_sl$new(lrnr_stack)
sl_fit <- sl$train(task)

importance_result <- importance(sl_fit)
importance_result

# importance with groups of covariates; "gagebrth" is not listed in any group,
# so it is evaluated as its own single-covariate group
groups <- list(
  scores = c("apgar1", "apgar5"),
  maternal = c("parity", "mage", "meducyrs")
)
importance_result_groups <- importance(sl_fit, covariate_groups = groups)
importance_result_groups

jeremyrcoyle/sl3 documentation built on Nov. 18, 2024, 4:21 p.m.
