varimp: Stochastic intervention based variable importance in...

View source: R/base.R

varimp    R Documentation

Stochastic intervention based variable importance in cross-sectional data

Description

Estimates variable importance based on a stochastic intervention that increases each continuous predictor by a small amount, or increases the probability of each binary predictor by the same small amount. The underlying methodology is based on papers by Ivan Diaz and Mark van der Laan. The function supports doubly robust estimation (targeted maximum likelihood estimation and augmented inverse probability weighting) as well as double cross-fitting and repeated double cross-fitting, which reduce bias and speed up convergence (in sample size) of these estimators. The underlying statistical models draw on sl3, an R package that provides a unified framework for machine learning and ensemble machine learning built around the super learner algorithm (stacking).
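To make the idea of a shift intervention concrete, the following is a minimal base R sketch of a plug-in (g-computation style) estimate for a single continuous predictor. It is not the vibr implementation: the simulated data, the variable names, and the use of lm() in place of sl3 learners are all assumptions made for illustration. It computes the two quantities that the estimand argument calls "diff" and "mean".

```r
# Illustrative sketch only; not how varimp() is implemented.
set.seed(1)
n <- 500
w <- rnorm(n)               # hypothetical confounder
x <- 0.5 * w + rnorm(n)     # hypothetical continuous predictor
y <- 2 * x + w + rnorm(n)   # outcome; true effect of x is 2 per unit
delta <- 0.1                # size of the shift applied to x

# Outcome model (a plain linear model stands in for the outcome learners)
fit <- lm(y ~ x + w)

# Predict the outcome after shifting every observed x upward by delta
shifted <- data.frame(x = x + delta, w = w)
est_mean <- mean(predict(fit, newdata = shifted))  # mean Y under intervention
est_diff <- est_mean - mean(y)                     # versus the observed mean of Y
```

Under this linear model the difference equals the fitted coefficient of x multiplied by delta, so it should be close to 2 * 0.1 = 0.2 in this simulation. The doubly robust estimators described above additionally use a model for the predictor given the other variables, which this sketch omits.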

Usage

varimp(
  X,
  W = NULL,
  Y,
  V = NULL,
  delta = 0.1,
  Y_learners = NULL,
  Xdensity_learners = NULL,
  Xbinary_learners = NULL,
  verbose = TRUE,
  estimator = "AIPW",
  bounded = FALSE,
  updatetype = "weighted",
  estimand = "diff",
  family = gaussian(),
  xfitfolds = 3,
  foldrepeats = 10,
  B = NULL,
  showProgress = TRUE,
  scale_continuous = TRUE,
  ...
)

Arguments

X

data frame of variables for which variable importance will be estimated

W

data frame of covariates (e.g. potential confounders) for which variable importance will not be estimated

Y

outcome

V

(default NULL) a data frame of other variables that contain weights or offsets, if used ("weights" and "offset" are both passed to other functions via extra arguments represented by ...: see example below)

delta

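change in each column of X corresponding to the stochastic intervention: the amount by which each continuous predictor is increased, or the amount by which the probability of each binary predictor is increased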

Y_learners

list of sl3 learners used to predict the outcome, conditional on all predictors in X

Xdensity_learners

list of sl3 learners used to estimate the density of continuous predictors, conditional on all other predictors in X

Xbinary_learners

list of sl3 learners used to estimate the probability mass of binary predictors, conditional on all other predictors in X

verbose

(logical) print extra information

estimator

(character) "AIPW" (default), "TMLE", "GCOMP", "IPW", "TMLEX" (cross fit TMLE), "AIPWX" (cross fit AIPW), "GCOMPX" (cross fit GCOMP), "IPWX" (cross fit IPW)

bounded

(logical) not yet implemented

updatetype

(character) (used for estimator = "TMLE" only) "weighted" or "predictor". If "weighted", the TMLE update step weights by the clever covariate; otherwise it fits a generalized linear model with no intercept and the clever covariate as the sole predictor

estimand

(character) "diff" (default, estimate mean difference comparing Y under intervention with observed Y), "mean" (estimate mean Y under intervention)

family

(character or glm families binomial or gaussian, default = gaussian()) Outcome type: can be gaussian(), binomial(), "gaussian" or "binomial"; will be guessed if left NULL

xfitfolds

(odd integer, default=3) (used for estimator = "TMLEX" only) number of cross-fit folds (must be an odd number: the last fold is used for validation, while the rest of the data are split in two for fitting the treatment and outcome models)

foldrepeats

(integer, default=10) (used for estimator = "TMLEX" only) number of times to repeat cross-fitting (higher numbers = more stable)

B

(NULL or integer) Number of bootstrap iterations (NULL = asymptotic variance only)

showProgress

show progress of bootstrapping (only relevant if B is not NULL)

scale_continuous

(logical, default: TRUE) scale all continuous variables in X to have a standard deviation of 0.5

...

passed to sl3::make_sl3_Task (e.g. weights)
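The fold layout described for xfitfolds can be sketched in base R as follows. This is one reading of that description, not code from vibr: the sample size, the random fold assignment, and the variable names are assumptions made for illustration.

```r
# Illustrative sketch only; the actual splitting in varimp() may differ.
set.seed(42)
n <- 12          # hypothetical sample size
xfitfolds <- 3   # odd, as the argument requires

# Randomly assign each observation to one of xfitfolds equally sized folds
fold_id <- sample(rep(seq_len(xfitfolds), length.out = n))

# The last fold is held out for validation
validation_idx <- which(fold_id == xfitfolds)

# The remaining folds are split into two halves: one for fitting the
# treatment (predictor) model and one for fitting the outcome model
fit_folds <- seq_len(xfitfolds - 1)
half <- (xfitfolds - 1) / 2
treatment_idx <- which(fold_id %in% fit_folds[seq_len(half)])
outcome_idx <- which(fold_id %in% fit_folds[-seq_len(half)])
```

Requiring an odd number of folds leaves an even number of fitting folds, so the two halves are the same size. Repeating the whole procedure with new random assignments corresponds to what foldrepeats describes.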

Value

vibr_fit object

Examples

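## Not run: 
library(vibr)  # provides varimp() and the learner helpers used below

# Example data: 23 predictors (columns 1 to 23) and a continuous outcome y
data(metals, package="qgcomp")
XYlist = list(X=metals[,1:23], Y=metals$y)

# Learners for the outcome, binary predictors, and continuous predictor densities
Y_learners = .default_continuous_learners()
Xbinary_learners = list(Lrnr_stepwise$new(name="SW"))
Xdensity_learners = .default_density_learners(n_bins=c(10))

# Unweighted analysis using TMLE
set.seed(1231)
vi <- varimp(X=XYlist$X,Y=XYlist$Y, delta=0.1, Y_learners = Y_learners,
       Xdensity_learners=Xdensity_learners[1:2], Xbinary_learners=Xbinary_learners,
       estimator="TMLE")
vi

# Weighted analysis: V holds the weight column "wt", and weights="wt" is
# passed through ... to sl3::make_sl3_Task
set.seed(1231)
V = data.frame(wt=runif(nrow(metals)))
viw <- varimp(X=XYlist$X,Y=XYlist$Y, V=V, delta=0.1, Y_learners = Y_learners,
       Xdensity_learners=Xdensity_learners[1:2], Xbinary_learners=Xbinary_learners,
       estimator="TMLE", weights="wt")
viw

## End(Not run)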

alexpkeil1/vibr documentation built on Sept. 13, 2023, 3:20 a.m.