\newcommand{\EE}{\mathbb E}
knitr::opts_chunk$set(echo = TRUE)
This package implements doubly robust confidence sequences which provide safe, anytime-valid inference for the average treatment effect (ATE) from sequential data. These allow you to
Furthermore, these confidence sequences can be used in both randomized sequential experiments and observational studies with no unmeasured confounding.
The main methodological reference is
You can install the package using devtools
:
# Install devtools if you do not have it: install.packages("devtools") # Install the package using devtools::install_github devtools::install_github("wannabesmith/drconfseq")
Let's jump into a simple example taken from "Doubly-robust confidence sequences for sequential causal inference". Suppose that we wish to estimate the ATE in an observational setting with no unmeasured confounding.
library(drconfseq) # Optional, can be used to speed up computations library(parallel)
First, we will generate $n = 10^4$ observations each with 3 real-valued covariates from a trivariate Gaussian.
n <- 10000 d <- 3 X <- cbind(1, matrix(rnorm(n*d), nrow = n))
Randomly assign subjects to treatment or control groups with equal probability.
treatment <- rbinom(n = n, size = 1, prob = 1 / 2)
Define the regression function, \begin{equation} \label{eq:simulationRegressionFunction} f^\star(x_1, x_2, x_3) := 1 - x_1^2 - 2\sin(x_2) + 3|x_3|, \end{equation} and the target parameter (which we will ensure is the average treatment effect by design),
[ \mathrm{ATE} := 1. ]
beta_mu <- c(1, -1, -2, 3) mu <- function(x) { beta_mu %*% c(1, x[1]^2, sin(x[2]), abs(x[3])) } ATE <- 1
Finally, generate outcomes $y_1, \dots, y_n$ as
[ y_i := f^\star(x_{i, 1}, x_{i, 2}, x_{i,3}) + \mathrm{ATE}\cdot \mathrm{treatment}_i + \epsilon_i, ]
where $\epsilon_i \sim t_{5}$ are drawn from a $t$-distribution with 5 degrees of freedom.
y <- apply(X, MARGIN=1, FUN=mu) + treatment*ATE + rt(n, df=5)
We will build three estimators for the ATE with increasing degrees of complexity.
Since we are in a randomized experiments, all three estimators will consistently estimate and provide valid sequential inference for the ATE. However, since the underlying regression function is nonlinear, we expect the Super Learner to outperform the Parametric to outperform the Unadjusted.
# Get SuperLearner prediction function for $\mu^1$. sl_reg_1 <- get_SL_fn(SL.library = c("SL.earth", "SL.gam", "SL.glm", "SL.ranger")) # Do the same for $\mu^0$. sl_reg_0 <- sl_reg_1 # Get Parametric regression function for $\mu^1$. parametric_reg_1 = get_SL_fn(SL.library = "SL.glm") # Do the same for $\mu^0$. parametric_reg_0 = get_SL_fn(SL.library = "SL.glm") # Compute the confidence sequence at logarithmically-spaced time points times <- unique(round(logseq(500, 10000, n = 10))) alpha <- 0.1 # Set n_cores to 1 if you are not using the parallel package n_cores <- detectCores() confseq_SL <- confseq_ate(y, X, treatment, regression_fn_1 = sl_reg_1, regression_fn_0 = sl_reg_0, propensity_score_fn = function(y, X, newX){1/2}, t_opt = 1000, alpha=alpha, times=times, n_cores = n_cores, cross_fit = TRUE) confseq_glm <- confseq_ate(y, X, treatment, regression_fn_1 = parametric_reg_1, regression_fn_0 = parametric_reg_0, propensity_score_fn = function(y, X, newX){1/2}, t_opt = 1000, alpha=alpha, times=times, n_cores = n_cores, cross_fit = TRUE) confseq_unadj <- confseq_ate_unadjusted(y = y, treatment = treatment, propensity_score = 1/2, t_opt = 1000, alpha = alpha, times = times)
Let us now plot the confidence sequence.
library(ggplot2) plot_df <- data.frame(t = rep(times, 6), bound = c(confseq_SL$l, confseq_SL$u, confseq_glm$l, confseq_glm$u, confseq_unadj$l, confseq_unadj$u), upper_lower = c(rep('l', length(confseq_SL$l)), rep('u', length(confseq_SL$u)), rep('l', length(confseq_glm$l)), rep('u', length(confseq_glm$u)), rep('l', length(confseq_unadj$u)), rep('u', length(confseq_unadj$u))), Model = c(rep('Super Learner', 2*length(confseq_SL$l)), rep('Parametric', 2*length(confseq_glm$l)), rep('Unadjusted', 2*length(confseq_unadj$l)))) plt_cs <- ggplot() + geom_line(aes(x=t, y=bound, color=Model, linetype=Model), data=plot_df[plot_df$upper_lower=='l',]) + geom_line(aes(x=t, y=bound, color=Model, linetype=Model), data=plot_df[plot_df$upper_lower=='u',]) + geom_hline(yintercept=ATE, linetype='dashed', color='gray') + ylab('Confidence sequences for the\naverage treatment effect') + xlab('Time') + scale_x_log10(breaks = scales::trans_breaks("log10", function(x) 10^x), labels = scales::trans_format("log10", scales::math_format(10^.x))) + annotation_logticks(colour = "grey", side = "b") + theme_minimal() + theme(text = element_text(family="serif")) + scale_color_brewer(palette="Set2") plt_cs
First, clone the repository:
```{zsh, eval=FALSE} git clone git@github.com:WannabeSmith/drconfseq.git
Then in an R console, run ```r devtools::check("path/to/drconfseq")
See the README in this folder.
The methods implemented in this package were developed by Ian Waudby-Smith, David Arbour, Ritwik Sinha, Edward H. Kennedy, and Aaditya Ramdas.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.