The `apm` package implements Averaged Prediction Models (APM), a Bayesian model averaging approach for controlled pre-post designs. These designs compare differences over time between a group that becomes exposed (the treated group) and one that remains unexposed (the comparison group). With appropriate causal assumptions, they can identify the causal effect of the exposure/treatment.
In APM, we specify a collection of models that predict untreated outcomes. Our causal identifying assumption is that the model's prediction errors would be equal (in expectation) in the treated and comparison groups in the absence of the exposure. This is a generalization of familiar methods like Difference-in-Differences (DiD) and Comparative Interrupted Time Series (CITS).
Because many models may be plausible for this prediction task, we combine them using Bayesian model averaging. We weight each model by its robustness to violations of the causal assumption.
Our identification framework begins with prediction and correction steps. First, we train a model on pre-intervention data to predict untreated outcomes in the post-intervention period. Then, we correct the treated group’s predictions using the comparison group’s post-intervention prediction errors, which adjusts for shared time-varying shocks. The identifying assumption is that, without the policy change, prediction errors would be equal (in expectation) across treated and comparison groups.
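As a conceptual sketch (hypothetical inputs, not the package's API), the correction step can be written as:

```r
# Conceptual sketch (hypothetical inputs, not the package's API): the
# treated group's predicted untreated outcome is shifted by the comparison
# group's average prediction error in the same period.
correct_prediction <- function(pred_treated, obs_comparison, pred_comparison) {
  pred_treated + (obs_comparison - pred_comparison)
}

# The ATT estimate is then the treated group's observed post-period outcome
# minus this corrected prediction.
```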
We specify a collection of plausible models for this prediction task and compare them on their robustness to violations of the causal assumption. Specifically, we quantify each model's differential prediction errors (i.e., differences between the treated and comparison groups' prediction errors) during a series of pre-intervention validation periods. Taking the maximum as a summary across these periods, we consider models with smaller maximum differential prediction errors more robust.
Then we apply Bayesian model averaging (BMA), weighting each model by its posterior probability of being the most robust. Taking an "informal Bayesian approach", we sample model parameters from a quasi-posterior [@gelmanhill2006, p. 140]. This is a multivariate Normal distribution with mean equal to the estimated parameters of the fitted models and a variance-covariance matrix that incorporates across-model correlations. For each parameter draw, we compute the models' differential prediction errors and take the max over the validation periods. Each model's weight is the proportion of draws in which it minimizes the maximum differential prediction error (i.e., is most robust). Finally, using the corrected predictions from our averaged model, we estimate the average treatment effect on the treated (ATT).
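The weighting step can be sketched with simulated inputs (an illustration of the logic, not the package internals):

```r
# Illustration of the BMA weighting logic (simulated inputs, not the
# package internals). `err` holds a differential prediction error for each
# quasi-posterior draw, candidate model, and validation period.
set.seed(42)
nsim <- 1000; n_models <- 4; n_periods <- 9
err <- array(rnorm(nsim * n_models * n_periods),
             dim = c(nsim, n_models, n_periods))

# Summarize each draw/model pair by its worst validation period...
max_err <- apply(abs(err), c(1, 2), max)

# ...and weight each model by how often it is the most robust across draws.
winners <- apply(max_err, 1, which.min)
bma_weights <- tabulate(winners, nbins = n_models) / nsim
```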
For inference, we apply a fractional weighted bootstrap [@xuetal2020]. It accounts for uncertainty about the models' performance but not for uncertainty in the BMA weights themselves, which would be computationally infeasible to propagate. Following @antonellietal2022, we estimate the overall variance as the sum of two components: (1) the sampling variance with model uncertainty held fixed and (2) the model uncertainty variance with sampling uncertainty held fixed.
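In symbols (our notation, not the package's), the reported standard error combines these two components:

$$\widehat{\text{SE}}\big(\widehat{\tau}\big) = \sqrt{\widehat{V}_{\text{sampling}} + \widehat{V}_{\text{model}}},$$

where $\widehat{\tau}$ is the BMA-weighted ATT estimate.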
Finally, we can also perform causal sensitivity analyses by scaling the models' maximum differential prediction errors in the validation periods by a factor $M$. This enables sensitivity measures such as the changepoint value of $M$ at which the estimated bounds on the ATT begin to include zero (see `robustness_bound()` below).
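For intuition, here is a minimal sketch of such bounds under the assumption that the post-period bias is at most $M$ times the largest validation-period differential error (hypothetical names; `apm_est()` computes these internally):

```r
# Conceptual sketch of set identification (hypothetical names; apm_est()
# computes these internally). If the post-period bias is at most M times
# the largest absolute validation-period differential error, the ATT lies
# within:
att_bounds_sketch <- function(att_hat, max_abs_err, M) {
  c(lower = att_hat - M * max_abs_err,
    upper = att_hat + M * max_abs_err)
}

att_bounds_sketch(att_hat = 1.2, max_abs_err = 0.5, M = 1)
#> lower upper
#>   0.7   1.7
```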
The package implements the APM methods via three key functions:

`apm_mod()`
: constructs candidate models that can predict untreated outcomes in both treated and comparison groups.

`apm_pre()`
: fits these candidate models to pre-treatment validation data and generates the BMA weights for each model.

`apm_est()`
: estimates the ATT, given the BMA weights, and constructs both statistical and causal bounds around it.
In this example, we apply APM to estimate the effect of Missouri’s 2007 repeal of its permit-to-purchase law on gun homicide rates [@websteretal2014; @hasegawaetal2019].
```r
library(apm)
```
The package provides an example dataset with pre- and post-policy homicide rates:
```r
data("ptpdata", package = "apm")

# Inspect the dataset
head(ptpdata)
```
The dataset includes:

`state`
: State name

`year`
: Year of observation

`deaths`
: Number of gun homicide deaths

`crude_rate`
: Gun homicide rate per 100,000

`age_adj_rate`
: Age-adjusted gun homicide rate per 100,000

`group`
: Indicator for Missouri (1 = Missouri, 0 = comparison group)

`treat`
: Indicator for Missouri in post-treatment years (2008+) (1 = treated, 0 = untreated)
Note that observations with `year == 2008` are the average of a state's observations over all post-treatment periods (2008–2016).
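For example, we can inspect Missouri's final pre-treatment year alongside its pooled post-treatment row (using only the documented columns above):

```r
# Missouri's last pre-treatment year (2007) and the pooled post-treatment
# row (labeled 2008, averaging 2008-2016)
subset(ptpdata, state == "Missouri" & year >= 2007)
```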
`apm_mod()` supports a range of model options:
`formula_list`
: list of model formulas with the outcome on the left-hand side and predictors on the right-hand side, e.g., `formula_list = crude_rate ~ 1`.

`family`
: list of family specifications passed to `stats::glm()` when fitting models in `apm_pre()`; `"negbin"` can also be supplied to request a negative binomial model with a log link fit using `MASS::glm.nb()`. To see a list of family specifications, run `?family`.

`lag`
: vector of integer outcome lags to be used as predictors. For example, `lag = 3` means the lag-1, lag-2, and lag-3 outcomes are included as predictors. Default is 0 (no lags).

`diff_k`
: vector of integer outcome lags to be used as offsets. For example, `diff_k = 1` means the prior time point's outcome is included as an offset, which is equivalent to using the outcome minus its corresponding lag as the model's outcome. Default is 0 (no offset). Any model whose `diff_k` value is less than its `lag` value is removed automatically. When used with a family with a log link, the lags are automatically log-transformed; `apm_pre()` will return an error if non-positive values are present in the outcome.

`log`
: logical vector indicating whether the outcome should be log-transformed. Default is `FALSE` (use the outcome on its original scale). When `lag` or `diff_k` are greater than 0, the outcome lags will also be log-transformed if `TRUE`. When the family has a log link and `diff_k` is greater than 0, the lag in the offset will be log-transformed.

`time_trend`
: vector of integers indicating the powers to be included in a time trend. For example, `time_trend = 2` includes the time variable and its square as predictors. A value of 0 (the default) means continuous time is not included as a predictor.

`fixef`
: logical vector indicating whether to include unit fixed effects as predictors. Default is `FALSE`.
These lists of model options are combined factorially to create a collection of candidate models.
In this example, we specify two options for lags (no lag and lag-1), two options for outcome differences (no offset and immediate prior outcome), two options for the outcome scale (original and log-transformed), and two options for time trends (no time trend and linear time trend), with unit fixed effects included in every model, to get our set of candidate models:
```r
models <- apm_mod(formula_list = crude_rate ~ 1,
                  family = "gaussian",
                  lag = 0:1,
                  diff_k = 0:1,
                  log = c(TRUE, FALSE),
                  time_trend = 0:1,
                  fixef = TRUE)
```
This produces `r length(models)` candidate models, all of which use the `"gaussian"` family (with an identity link function). This is fewer than the full factorial combination because of the embedded logic for combining outcome differences and lags (see above).
We now fit all `r length(models)` models to the pre-treatment data. For each model and each pre-treatment validation period, we compute the observed difference in average prediction errors between the treated and comparison groups. From these differences, we compute the Bayesian model averaging (BMA) weights that are eventually passed to `apm_est()` for estimation of the average effect of treatment on the treated (the ATT).
The function `apm_pre()` does the model fitting. It requires a data frame that contains a group indicator variable and a time variable. We specify `1999:2007` as the validation years in which each model will be evaluated. Each validation year's fit is based on all data prior to that year, so we set the first validation period to 1999 to ensure that even in the first validation year the models can be trained on five years of data (i.e., `1994:1998`). We specify the number of quasi-posterior draws using the `nsim = 1000` argument; more draws give a better approximation to the model posterior but slow down the computation.
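Conceptually, the validation scheme is an expanding window (a sketch of the logic, not the package internals):

```r
# Sketch of the expanding-window validation scheme (not the package
# internals): each validation year is predicted using only earlier years.
for (t in 1999:2007) {
  train_years <- 1994:(t - 1)
  # 1. Fit each candidate model to data from `train_years`.
  # 2. Predict untreated outcomes in year `t` for both groups.
  # 3. Record the difference in average prediction errors between the
  #    treated and comparison groups for year `t`.
}
```

The models are actually fit with a single call to `apm_pre()`: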
```r
# Set seed for reproducibility: ensures random sampling from the
# multivariate Normal produces the same results each time the code is run
set.seed(098556947)

fits <- apm_pre(models = models,
                data = ptpdata,
                group_var = "group",
                time_var = "year",
                unit_var = "state",
                val_times = 1999:2007,
                nsim = 1000)
```
We can view the largest average differential prediction error for each model and the BMA weights given to each model using `summary()` on the returned object:
```r
summary(fits)
```
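The matrix of differential prediction errors is also stored in the returned object as `fits$pred_error_diffs` (used again at the end of this vignette); for example, each model's worst absolute error across the validation years:

```r
# Worst absolute differential prediction error per model, taken across
# the validation years (columns index the candidate models)
apply(abs(fits$pred_error_diffs), 2, max)
```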
We can plot the simulation-based posterior distribution of which model is most robust. The probabilities are the proportions of simulations in which each model is the winner.
```r
plot(fits, type = "weights")
```
We can also plot the differential prediction errors for each model and year. Here, the winning model is highlighted: it is the model that includes a lag-1 outcome as a predictor and log-transforms the outcome. Its maximum differential prediction error was observed in 2005.
```r
plot(fits, type = "errors")
```
The plot below shows the predictions from this model in each validation period. The observed outcomes are displayed as points and the predicted outcomes as lines.
```r
plot(fits, type = "predict")
```
Finally, we can also show the winning model's corrected predictions, that is, the treated group's predictions after incorporating the comparison group's prediction error. To display them, we set `type = "corrected"`:

```r
plot(fits, type = "corrected")
```
To estimate the ATT and conduct inference, we feed the output of `apm_pre()` into `apm_est()`. The `M` argument is the sensitivity parameter for set identification; by default, `M = 0`. When `M` is set to a value greater than 0, `apm_est()` will return estimates of the lower and upper bounds of the ATT. These bounds can incorporate both the uncertainty due to possible causal violations and sampling uncertainty. The `R` argument is the number of bootstrap iterations used to estimate the sampling variance, holding model uncertainty fixed.
```r
est <- apm_est(fits = fits,
               post_time = 2008,
               M = 1,
               R = 1000,
               all_models = TRUE)
```
To examine the estimates and uncertainty bounds, we run the following. The `level = 0.95` argument specifies the statistical confidence level; to ignore sampling uncertainty, set `level = 0`.
```r
summary(est, level = 0.95)
```
The standard error is the square root of the sum of two variances: (1) the estimated sampling variance holding model uncertainty fixed and (2) the estimated model uncertainty variance holding sampling uncertainty fixed. For the `ATT` row, the `CI low` and `CI high` outputs are the lower and upper confidence bounds for the ATT. The `CI low` output for the `M = 1` row is the lower confidence bound of the ATT's lower sensitivity bound, and the `CI high` output for the `M = 1` row is the upper confidence bound of the ATT's upper sensitivity bound.
The figure below shows the estimated ATT under each model plotted against the maximum absolute difference in average prediction errors for that model. The model with the smallest maximum absolute difference in average prediction errors is displayed in red, and the sizes of the points correspond to the BMA weights. Small variation in the ATT estimates (y-axis) across values of the maximum absolute difference in prediction errors (x-axis) suggests that we do not face a stark trade-off between model plausibility and robustness.
```r
plot(est)
```
We can also apply a sensitivity analysis over increasing values of `M`. For example, below we estimate the ATT's bounds for values of `M` from 1 to 2 in increments of 0.25.
```r
summary(est, M = seq(from = 1, to = 2, by = 0.25))
```
This output shows that our 95% confidence interval for the ATT's lower bound excludes 0 when `M = 1.25`, but not when `M = 1.5`. To find the exact changepoint value of `M`, we can run the following.
```r
robustness_bound(est, level = 0.95)
```
We can also run `robustness_bound()` with `level = 0`, which gives the value of `M` at which the sensitivity bounds themselves (not the statistical confidence bounds) begin to bracket 0.
```r
robustness_bound(est, level = 0)
```
As we would expect, the changepoint value of `M` is greater when `level = 0`.
The BMA point estimate (`M = 0`) is `r round(x = est$BMA_att[1], digits = 2)`, with a standard error of `r round(x = sqrt(est$BMA_var[1]), digits = 2)`, yielding a 95% confidence interval of [`r round(x = summary(est)[1,3], digits = 2)`, `r round(x = summary(est)[1,4], digits = 2)`]. This suggests that Missouri's repeal of its permit-to-purchase law increased the state's gun homicide rate by `r round(x = summary(est)[1,3], digits = 2)` to `r round(x = summary(est)[1,4], digits = 2)` per 100,000 people. Given Missouri's 2007 homicide rate of `r round(x = ptpdata$crude_rate[ptpdata$state == "Missouri" & ptpdata$year == 2007], digits = 2)` per 100,000 people, the estimated increase of `r round(x = est$BMA_att[1], digits = 2)` represents a `r round(x = (est$BMA_att[1]/ptpdata$crude_rate[ptpdata$state == "Missouri" & ptpdata$year == 2007]) * 100, digits = 0)`% rise. The changepoint value of $M$ for the BMA estimator is `r round(x = robustness_bound(est, level = 0), digits = 2)`. That is, if differential prediction errors were nearly twice what was seen in the validation periods, the point estimate would no longer indicate an increase in the gun homicide rate. Additionally, the lower bound estimator's 95% confidence interval includes zero when $M$ reaches `r round(x = robustness_bound(est, level = 0.95), digits = 2)`. That is, at this multiplier of the differential prediction errors seen in the validation periods, our statistical uncertainty bounds around the effect estimate would begin to include 0.
The `apm` R package implements Averaged Prediction Models (APM), a unified framework for causal inference in controlled pre-post settings. APM generalizes a broad class of prediction-based methods by combining outcome prediction with error correction using a comparison group. The package also incorporates Bayesian model averaging to weight candidate models by their robustness in pre-period data.
```r
# For comparison, refit using only the single least robust model (the one
# with the largest maximum absolute differential prediction error)
worst_idx <- which.max(apply(abs(fits$pred_error_diffs), 2, max))

worst_mod_fit <- apm_pre(models = models[worst_idx],
                         data = ptpdata,
                         group_var = "group",
                         time_var = "year",
                         unit_var = "state",
                         val_times = 1999:2007,
                         nsim = 10)

worst_mod_est <- apm_est(fits = worst_mod_fit,
                         post_time = 2008,
                         M = 1,
                         R = 10,
                         all_models = TRUE,
                         level = 0)
```
Through an application to Missouri's 2007 permit-to-purchase law repeal, our results suggest that a lagged dependent variable model with unit fixed effects on the log scale was the most robust choice, leading to an estimated increase of `r round(x = est$BMA_att[1], digits = 2)` homicides per 100,000 people. Sensitivity analysis indicates that, for the estimated effect to be indistinguishable from zero, assumption violations would need to exceed `r round(x = robustness_bound(est, level = 0), digits = 2)` times the worst pre-period discrepancies, compared to as low as `r round(x = robustness_bound(worst_mod_est, level = 0), digits = 2)` under single-model approaches.
Built on a unified identification framework, APM offers a flexible, data-driven approach to causal inference in controlled pre-post settings. The `apm` package prioritizes model averaging and robustness over reliance on a single "correct" model while efficiently accounting for both sampling and model uncertainty, giving researchers greater flexibility in model specification while maintaining rigorous, principled inference.
For more details, see:

::: {#refs}
:::
```r
# Hide session info in vignette
sessionInfo()
```