Read along with this tutorial, and run the R code in RStudio as you go (copy and paste, or type directly into R).
Start by loading packages:
```r
library(ECICourse)
library(tidyverse)
```
The data for this tutorial describe the same fictional observational study as in the tutorial on classical matching. In this tutorial, we will use a weighted regression estimator to obtain unbiased and consistent estimates of treatment effects under the assumption of selection on the observables.
To obtain the data for this study, run the following code:
```r
coupon <- get_ECI_data('module-6-tutorial')
coupon
```
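If you would like a quick look at the variables before proceeding, `glimpse()` (loaded with the tidyverse) prints the structure of the data:

```r
# Optional: inspect the columns and their types
glimpse(coupon)
```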
As in the tutorial on classical matching, the company believes that the stratifying variables `last_order`, `num_orders`, and `avg_value` completely account for all systematic variation in who used the coupon. In other words, the company is willing to assume that all selection is on the observables. Hence, among groups of customers in the same stratum (i.e., among customers who share the same values of `last_order`, `num_orders`, and `avg_value`), we are assuming that coupon use was essentially random. As noted in the tutorial on classical matching, this is a very strong assumption, but it is what we need in order to use weighted regression to estimate average treatment effects.
There are two main parts to estimating treatment effects using weighted regression. The first part involves obtaining estimates of customers' propensities to use the coupon. For this tutorial, we will use logistic regression to estimate these propensity scores. The second part involves performing the weighted regression, using weights that are based on the propensity scores estimated in part 1.
There are three variables that we can use to estimate the propensity to use the coupon: `last_order`, `num_orders`, and `avg_value`. These variables collectively satisfy the back-door criterion for blocking non-causal associations between coupon use and order size. That is, we can condition on these three variables to block all back-door paths between `used_coupon` and `order_size`. This is easier to see with a DAG:
```r
dagitty::dagitty(x = 'dag {
  used_coupon [pos = "0, .5"];
  order_size  [pos = "1, 0"];
  last_order  [pos = "-1, -.5"];
  num_orders  [pos = "-1, -.25"];
  avg_value   [pos = "-1, 0"];
  last_order -> used_coupon -> order_size;
  num_orders -> used_coupon <- avg_value -> order_size;
  num_orders -> order_size;
  last_order -> order_size;
}') |> plot()
```
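As an optional check on this claim, dagitty can compute adjustment sets directly. Here is a minimal sketch that re-declares the same DAG with exposure and outcome tags (the position annotations are omitted because they only matter for plotting):

```r
# Ask dagitty for the minimal back-door adjustment set(s)
# between used_coupon and order_size
dag <- dagitty::dagitty('dag {
  used_coupon [exposure]
  order_size  [outcome]
  last_order -> used_coupon -> order_size;
  num_orders -> used_coupon <- avg_value -> order_size;
  num_orders -> order_size;
  last_order -> order_size;
}')
dagitty::adjustmentSets(dag)
```

This should return the set { avg_value, last_order, num_orders }, matching the reasoning above.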
To estimate the propensities, we perform logistic regression. Because our objective is to estimate customers' propensities to use the coupon, `used_coupon` is the dependent variable in this regression:
```r
propensity_fit <- glm(used_coupon ~ last_order + num_orders + avg_value,
                      data = coupon,
                      family = binomial)
summary(propensity_fit)
```
The propensity scores are the expected values of `used_coupon` from the logistic regression:
```r
p <- predict(propensity_fit, type = 'response')
ggplot(NULL) + stat_count(aes(x = p))
```
There are 30 unique values of the propensity score, all between 0 and 1 (because they represent probabilities of coupon use). There are exactly 30 unique values because there are 30 unique combinations of the three regressors:
```r
coupon |>
  distinct(last_order, num_orders, avg_value) |>
  arrange(last_order, num_orders, avg_value)
```
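As a quick check, the number of distinct propensity scores should match the number of distinct covariate combinations (this uses dplyr's `n_distinct()`):

```r
# Both of these counts should be 30
n_distinct(p)
coupon |> distinct(last_order, num_orders, avg_value) |> nrow()
```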
To estimate the average treatment effect (ATE), we will perform a linear regression of `order_size` on `used_coupon`. Importantly, we will include regression weights `w`, with `w = 1/p` if `used_coupon == 1`, and `w = 1/(1 - p)` if `used_coupon == 0`.
```r
ate_data <- coupon |>
  mutate(w = case_when(
    used_coupon == 1 ~ 1/p,
    used_coupon == 0 ~ 1/(1 - p)))
ate_data

ate_fit <- lm(order_size ~ used_coupon, data = ate_data, weights = w)
summary(ate_fit)
```
The estimated ATE is `r round(ate_fit$coef['used_coupon'], 3)`. You might recall from the classical matching tutorial that the estimated ATE using that method was 24.164. The discrepancy arises because we have not estimated the propensity scores correctly! Because I simulated these data, I know for a fact that the logit model we are using to estimate the propensities is not correct. Here I want to demonstrate how the model we use to estimate the propensities can affect estimates of treatment effects. For example, compare the estimated ATE with one obtained using a different propensity model:
```r
propensity_fit2 <- glm(used_coupon ~ factor(last_order) * num_orders * avg_value,
                       data = coupon,
                       family = binomial)
summary(propensity_fit2)
```
Due to the interactions in the regression model above, we end up estimating `r sum(!is.na(coef(propensity_fit2)))` parameters in this propensity model (some combinations of terms are collinear, hence not all combinations can be estimated), instead of 6 in the previous propensity model.
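You can see the collinearity directly: some coefficients come back as `NA`. A quick check:

```r
# Total terms in the model versus terms that could actually be estimated
length(coef(propensity_fit2))
sum(!is.na(coef(propensity_fit2)))
```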
```r
p2 <- predict(propensity_fit2, type = 'response')

ate_data2 <- coupon |>
  mutate(w = case_when(
    used_coupon == 1 ~ 1/p2,
    used_coupon == 0 ~ 1/(1 - p2)))
ate_data2

ate_fit2 <- lm(order_size ~ used_coupon, data = ate_data2, weights = w)
summary(ate_fit2)
```
The estimated ATE of `r round(ate_fit2$coef['used_coupon'], 3)` is now closer to what we obtained using classical matching (and because I simulated the data, I know this is also close to the true ATE).
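To see the two weighted-regression estimates side by side, you can pull the coefficient from each fit (the names below are just illustrative labels):

```r
# Compare the ATE estimates from the two propensity models
c(simple_model   = unname(coef(ate_fit)['used_coupon']),
  flexible_model = unname(coef(ate_fit2)['used_coupon']))
```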
If the model we use to estimate the propensity scores is badly misspecified (relative to the "true" model), then the estimated treatment effect will be biased, and probably inconsistent as well. The extent of the bias depends on how misspecified the propensity score model is. Unfortunately, because we don't know the "true" propensity model, our only recourse is to hope that our propensity model has captured the most essential determinants of coupon use. Or, if we don't believe that is likely to be the case, we can try to obtain whatever missing data we think would make the propensity model more "correct".
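Although we can never check a propensity model against the unknown truth, we can at least see how much two candidate models disagree with each other. Here is a small, purely descriptive diagnostic sketch (it cannot tell us which model is better):

```r
# Compare the propensity scores from the two models; large
# disagreements flag observations whose weights are model-sensitive
summary(p2 - p)
ggplot(NULL) +
  geom_point(aes(x = p, y = p2), alpha = 0.3) +
  geom_abline(slope = 1, intercept = 0, linetype = 'dashed')
```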
This subsection is about doubly-robust weighted regression estimators, which are described in MW 7.3. That part of chapter 7 is not required reading, so feel free to skip over this subsection.
Recall that the first weighted regression we ran (above) led to an incorrect (biased, possibly inconsistent) estimate of the ATE, and that this happened because the propensity score model was misspecified. The main issue is that, because the estimated propensity weights are misspecified, they are not able to fully block the non-causal associations between `used_coupon` and `order_size` that are leaking through the open back doors.
But what if we also adjusted for these variables in the regression of `order_size` on `used_coupon`? Would that eliminate some (or even all) of the remaining bias? The answer is "probably, yes." Such an estimator is called "doubly-robust" because it accounts for confounding bias twice: once through the regression weights, and again through the regression covariates.
```r
ate_fit_dr1 <- lm(order_size ~ used_coupon + last_order + num_orders + avg_value,
                  data = ate_data,
                  weights = w)
summary(ate_fit_dr1)
```
The estimated ATE is still off, but it is closer to what we obtained from classical matching. Keep in mind that in a real-world setting, we would not know the true propensity model. Thus, doubly-robust estimators offer a bit of insurance against misspecification of propensity scores. If we use the better propensity weights generated from the second logistic regression in a doubly-robust regression, we recover almost exactly the same ATE as we do using standard weighted regression.
```r
ate_fit_dr2 <- lm(order_size ~ used_coupon + last_order + num_orders + avg_value,
                  data = ate_data2,
                  weights = w)
coef(ate_fit_dr2)['used_coupon']
coef(ate_fit2)['used_coupon']
```
To estimate average treatment effects for coupon users (ATT) and non-users (ATC), we use a similar procedure. Below, we will create a data frame that contains the regression weights for the ATE, ATT, and ATC. Notice that these weights are different for each effect we want to estimate.
```r
data_with_weights <- coupon |>
  mutate(
    w_ate = case_when(
      used_coupon == 1 ~ 1/p2,
      used_coupon == 0 ~ 1/(1 - p2)),
    w_att = case_when(
      used_coupon == 1 ~ 1,
      used_coupon == 0 ~ p2 / (1 - p2)),
    w_atc = case_when(
      used_coupon == 1 ~ (1 - p2) / p2,
      used_coupon == 0 ~ 1))
data_with_weights
```
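Before using these weights, it is worth scanning them for extreme values, which would signal poor overlap between the treatment and control groups. A quick, optional diagnostic:

```r
# Extremely large weights would mean a few observations dominate
# the weighted regressions
data_with_weights |>
  summarise(across(c(w_ate, w_att, w_atc), list(min = min, max = max)))
```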
First, consider the weights for estimating the ATT among these two customers:
```r
data_with_weights |>
  filter(customer %in% c('00038', '00062')) |>
  select(customer, last_order, num_orders, avg_value,
         used_coupon, placed_order, order_size, w_att)
```
The observation for customer 00062, like those for all customers in the treatment group (i.e., with `used_coupon == 1`), has a weight of 1. However, customer 00038, who is in the control group, has a much smaller weight. This weight reflects the fact that customers like 00062 and 00038 have a very low propensity score. As such, customer 00038 is not very representative of what a typical customer in the treatment group looks like.
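Because the ATT weight for a control observation is the odds `p / (1 - p)`, you can back out the propensity score implied by the weight (`p_implied` below is just a name introduced for this illustration):

```r
# For a control observation, w_att = p/(1 - p), so p = w_att/(1 + w_att)
data_with_weights |>
  filter(customer == '00038') |>
  mutate(p_implied = w_att / (1 + w_att)) |>
  select(customer, used_coupon, w_att, p_implied)
```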
Now consider the observations for customers 02240 and 02018:
```r
data_with_weights |>
  filter(customer %in% c('02240', '02018')) |>
  select(customer, last_order, num_orders, avg_value,
         used_coupon, placed_order, order_size, w_att)
```
Customer 02018 has a higher weight than customer 00038, reflecting the relatively high likelihood of using the coupon among customers like 02018 and 02240. Indeed, although customer 02018 did not use the coupon, they did place an order. Overall, customer 02018 looks more like a member of the treated group than customer 00038 does, and this is reflected in a higher weight for calculating the ATT.
We can now perform weighted regression to estimate the ATT:
```r
att_fit <- lm(order_size ~ used_coupon, data = data_with_weights, weights = w_att)
summary(att_fit)
```
And finally, the ATC:
```r
atc_fit <- lm(order_size ~ used_coupon, data = data_with_weights, weights = w_atc)
summary(atc_fit)
```
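Finally, it can be handy to collect the three estimates in one place (all three use the weights built from the second propensity model; the labels are illustrative):

```r
# Compare the estimated ATE, ATT, and ATC
c(ATE = unname(coef(ate_fit2)['used_coupon']),
  ATT = unname(coef(att_fit)['used_coupon']),
  ATC = unname(coef(atc_fit)['used_coupon']))
```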