Contributors: Yongqi Zhong, Ashley Naimi, Gabriel Conzuelo, Edward Kennedy
```r
knitr::opts_chunk$set(
  collapse = TRUE,
  fig.width = 6,
  fig.height = 4,
  fig.path = "man/figures/",
  comment = "#>"
)
```
Augmented inverse probability weighting (AIPW) is a doubly robust estimator for causal inference. The AIPW package is designed to estimate the average treatment effect of a binary exposure on the risk difference (RD), risk ratio (RR), and odds ratio (OR) scales with user-defined stacked machine learning algorithms (SuperLearner or sl3). Users should examine the causal assumptions (e.g., consistency) before using this package.
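For orientation, the AIPW estimator combines an outcome regression with a propensity score model. A standard form of the estimator for the average treatment effect on the difference scale (notation ours, shown only as a sketch) is

$$
\hat{\psi}_{\mathrm{AIPW}} = \frac{1}{n}\sum_{i=1}^{n}\left[\hat{Q}(1,W_i)-\hat{Q}(0,W_i)+\frac{A_i}{\hat{g}(W_i)}\big\{Y_i-\hat{Q}(1,W_i)\big\}-\frac{1-A_i}{1-\hat{g}(W_i)}\big\{Y_i-\hat{Q}(0,W_i)\big\}\right],
$$

where $\hat{Q}(a,W)$ is the fitted outcome model and $\hat{g}(W)=\hat{P}(A=1\mid W)$ is the fitted propensity score. The estimate is consistent if either model is correctly specified (double robustness).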
If you find this package helpful, please consider citing:
```
@article{zhong_aipw_2021,
  author  = {Zhong, Yongqi and Kennedy, Edward H and Bodnar, Lisa M and Naimi, Ashley I},
  title   = {AIPW: An R Package for Augmented Inverse Probability Weighted Estimation of Average Causal Effects},
  journal = {American Journal of Epidemiology},
  year    = {2021},
  month   = {07},
  issn    = {0002-9262},
  doi     = {10.1093/aje/kwab207},
  url     = {https://doi.org/10.1093/aje/kwab207},
}
```
The major new feature introduced is the Repeated class, which allows for repeated cross-fitting procedures to mitigate randomness due to data splits in machine learning-based estimation as suggested by Chernozhukov et al. (2018). This feature:
- Enables running the cross-fitting procedure multiple times to produce more stable estimates
- Provides methods to summarize results using median-based approaches
- Supports parallelization with future.apply
- Includes visualization of estimate distributions across repetitions
- See the Repeated Cross-fitting vignette for more details
**CRAN version:**

```r
install.packages("AIPW")
```

**GitHub version:**

```r
install.packages("remotes")
remotes::install_github("yqzhong7/AIPW")
```

* The CRAN version only supports SuperLearner and tmle. GitHub versions after v0.6.3.1 no longer support sl3 and tmle3. If you are still interested in using the version with sl3 and tmle3 support, please install it with `remotes::install_github("yqzhong7/AIPW@aje_version")`.
```r
set.seed(888)
data("eager_sim_obs")
outcome  <- eager_sim_obs$sim_Y
exposure <- eager_sim_obs$sim_A
# covariates for both the outcome model (Q) and the exposure model (g)
covariates <- as.matrix(eager_sim_obs[-1:-2])
# covariates <- c(rbinom(N, 1, 0.4)) # a vector of a single covariate is also supported
```
**AIPW class (method chaining from R6class)**

```r
library(AIPW)
library(SuperLearner)
library(ggplot2)

AIPW_SL <- AIPW$new(Y = outcome,
                    A = exposure,
                    W = covariates,
                    Q.SL.library = c("SL.mean", "SL.glm"),
                    g.SL.library = c("SL.mean", "SL.glm"),
                    k_split = 3,
                    verbose = FALSE)$
  fit()$
  # default truncation
  summary(g.bound = 0.025)$
  plot.p_score()$
  plot.ip_weights()
```
To see the results, set `verbose = TRUE` (default) or:
```r
print(AIPW_SL$result, digits = 2)
```
To obtain the average treatment effect among the treated/controls (ATT/ATC), `stratified_fit()` must be used:
```r
AIPW_SL_att <- AIPW$new(Y = outcome,
                        A = exposure,
                        W = covariates,
                        Q.SL.library = c("SL.mean", "SL.glm"),
                        g.SL.library = c("SL.mean", "SL.glm"),
                        k_split = 3,
                        verbose = TRUE)

suppressWarnings({
  AIPW_SL_att$stratified_fit()$summary()
})
```
You can also use `aipw_wrapper()` to wrap `new()`, `fit()` and `summary()` together (method chaining is also supported):
```r
AIPW_SL <- aipw_wrapper(Y = outcome,
                        A = exposure,
                        W = covariates,
                        Q.SL.library = c("SL.mean", "SL.glm"),
                        g.SL.library = c("SL.mean", "SL.glm"),
                        k_split = 3,
                        verbose = TRUE,
                        stratified_fit = FALSE)$
  plot.p_score()$
  plot.ip_weights()
```
The Repeated class allows for repeated cross-fitting to mitigate randomness due to data splits, as recommended for machine learning-based estimation by Chernozhukov et al. (2018).
```r
library(SuperLearner)
library(ggplot2)

# First create a regular AIPW object
aipw_obj <- AIPW$new(Y = outcome,
                     A = exposure,
                     W = covariates,
                     Q.SL.library = c("SL.mean", "SL.glm"),
                     g.SL.library = c("SL.mean", "SL.glm"),
                     k_split = 3,
                     verbose = FALSE)

# Create a repeated fitting object from the AIPW object
repeated_aipw <- Repeated$new(aipw_obj)

# Perform repeated fitting 20 times
repeated_aipw$repfit(num_reps = 20, stratified = FALSE)

# Summarize results using median-based methods
repeated_aipw$summary_median()

# You can also visualize the distribution of estimates across repetitions
estimates_df <- repeated_aipw$repeated_estimates
ggplot(estimates_df, aes(x = Estimate, fill = Estimand)) +
  geom_density(alpha = 0.5) +
  theme_minimal() +
  labs(title = "Distribution of Estimates Across Repeated Fittings",
       subtitle = "Based on 20 repetitions",
       x = "Estimate Value",
       y = "Density")
```
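For intuition, the median-based summary recommended by Chernozhukov et al. (2018) reports the median of the point estimates across repetitions together with a standard error that accounts for the spread across repetitions. Below is a minimal sketch of that aggregation rule; it is shown for illustration only and is not necessarily the exact internal implementation of `summary_median()`.

```r
# Median-based aggregation across repeated cross-fits (illustrative sketch,
# following Chernozhukov et al. 2018; not necessarily the package internals).
aggregate_median <- function(estimates, std_errors) {
  est_med <- median(estimates)
  # inflate each repetition's variance by its squared distance from the median
  # estimate, then take the median of the adjusted variances
  se_med <- sqrt(median(std_errors^2 + (estimates - est_med)^2))
  c(Estimate = est_med, SE = se_med)
}

# Hypothetical usage with vectors of 20 point estimates and standard errors:
# aggregate_median(rd_estimates, rd_std_errors)
```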
Setting stratified = TRUE in the repfit() function will use the stratified fitting procedure for each repetition:
```r
# Using stratified fitting
repeated_aipw_strat <- Repeated$new(aipw_obj)
repeated_aipw_strat$repfit(num_reps = 20, stratified = TRUE)
repeated_aipw_strat$summary_median()
```
Note that the Repeated class also supports parallelization with future.apply as described below.
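For example, assuming the `aipw_obj` created above, a parallel backend can be enabled before calling `repfit()`; the plan settings below are illustrative only.

```r
# Enable a parallel backend before repeated fitting (illustrative settings)
library(future.apply)
future::plan(multisession, workers = 2)
set.seed(888)

repeated_aipw_par <- Repeated$new(aipw_obj)
repeated_aipw_par$repfit(num_reps = 20, stratified = FALSE)
repeated_aipw_par$summary_median()
```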
**Parallelization with future.apply and progress bar with progressr**

By default, the `AIPW$fit()` method runs sequentially. The current version of the AIPW package supports parallel processing implemented by the future.apply package under the future framework. Simply use `future::plan()` to enable parallelization and `set.seed()` to take care of the random number generation (RNG) problem:
```r
### Additional steps for parallel processing ###
# install.packages("future.apply")
library(future.apply)
# "multiprocess" is deprecated in recent versions of future; multisession works on all platforms
future::plan(multisession, workers = 2, gc = TRUE)
set.seed(888)

### Same procedure for AIPW as described above ###
AIPW_SL <- AIPW$new(Y = outcome,
                    A = exposure,
                    W = covariates,
                    Q.SL.library = c("SL.mean", "SL.glm"),
                    g.SL.library = c("SL.mean", "SL.glm"),
                    k_split = 3,
                    verbose = TRUE)$fit()$summary()
```
A progress bar that supports parallel processing is available in the `AIPW$fit()` method through the API of the progressr package:
```r
library(progressr)

# Define the type of progress bar
handlers("progress")

# Reporting through progressr::with_progress(), which is embedded in the AIPW$fit() method
with_progress({
  AIPW_SL <- AIPW$new(Y = outcome,
                      A = exposure,
                      W = covariates,
                      Q.SL.library = c("SL.mean", "SL.glm"),
                      g.SL.library = c("SL.mean", "SL.glm"),
                      k_split = 3,
                      verbose = FALSE)$fit()$summary()
})

# Also available for the wrapper
with_progress({
  AIPW_SL <- aipw_wrapper(Y = outcome,
                          A = exposure,
                          W = covariates,
                          Q.SL.library = c("SL.mean", "SL.glm"),
                          g.SL.library = c("SL.mean", "SL.glm"),
                          k_split = 3,
                          verbose = FALSE)
})
```
**tmle/tmle3 fitted object as input (AIPW_tmle class)**

The AIPW_tmle class is designed to take a fitted tmle or tmle3 object as input.
**tmle**

```r
require(tmle)
require(SuperLearner)

tmle_fit <- tmle(Y = as.vector(outcome),
                 A = as.vector(exposure),
                 W = covariates,
                 Q.SL.library = c("SL.mean", "SL.glm"),
                 g.SL.library = c("SL.mean", "SL.glm"),
                 family = "binomial")
tmle_fit

# Extract the fitted tmle object into AIPW
AIPW_tmle$
  new(A = exposure, Y = outcome, tmle_fit = tmle_fit, verbose = TRUE)$
  summary(g.bound = 0.025)
```
**tmle3**

New GitHub versions (after v0.6.3.1) no longer support sl3 and tmle3. If you are still interested in using the version with sl3 and tmle3 support, please install it with `remotes::install_github("yqzhong7/AIPW@aje_version")`:
```r
remotes::install_github("yqzhong7/AIPW@aje_version")

library(sl3)
library(tmle3)

node_list <- list(A = "sim_A",
                  Y = "sim_Y",
                  W = colnames(eager_sim_obs)[-1:-2])
or_spec <- tmle_OR(baseline_level = "0", contrast_level = "1")
tmle_task <- or_spec$make_tmle_task(eager_sim_obs, node_list)

lrnr_glm  <- make_learner(Lrnr_glm)
lrnr_mean <- make_learner(Lrnr_mean)
sl <- Lrnr_sl$new(learners = list(lrnr_glm, lrnr_mean))
learner_list <- list(A = sl, Y = sl)

tmle3_fit <- tmle3(or_spec, data = eager_sim_obs, node_list, learner_list)

# Parse tmle3_fit into the AIPW_tmle class
AIPW_tmle$
  new(A = eager_sim_obs$sim_A, Y = eager_sim_obs$sim_Y, tmle_fit = tmle3_fit, verbose = TRUE)$
  summary()
```
Robins JM, Rotnitzky A (1995). Semiparametric efficiency in multivariate regression models with missing data. Journal of the American Statistical Association.
Chernozhukov V, Chetverikov D, Demirer M, et al. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal.
Kennedy EH, Sjolander A, Small DS (2015). Semiparametric causal inference in matched cohort studies. Biometrika.
Pearl J (2009). Causality. Cambridge University Press.