ECICourse::upload_images_to_canvas()
knitr::opts_chunk$set(
  echo = TRUE,
  message = TRUE,
  collapse = TRUE
)

Read along with this tutorial, and run the R code in RStudio as you go (copy and paste, or type directly into R).

Start by loading packages:

library(ECICourse)
library(tidyverse)

The data for this tutorial describe a fictional observational study conducted by an online seller of accessories for smart phones and tablet computers. The company emailed a single-use coupon code to all 16,576 customers in its database whose most recent order was between 1 and 6 months ago. The coupons could be used to receive a 30% discount the next order, as long as they were redeemed within the next month.

The company wants to learn how much revenue increased as a result of customers using the coupons. However, because the customers chose whether to use the coupon or not, there is a problem of selection (non-random treatment). For example, high frequency customers might be more likely to place an order during the experiment than low frequency customers. The company therefore realizes that the naive estimator of the average treatment effect will be biased and inconsistent.

In this tutorial, we will use a classical matching strategy to calculate the average treatment effects among the treated (ATT; those using the coupons on their orders), controls (ATC; those who did not use the coupon), and all customers (ATE).

Data

To obtain the data describing this experiment, run the following code:

coupon <- get_ECI_data('module-6-tutorial')

The data contain an id for each customer (customer), background information describing each customer (explained below), indicators for whether the customer placed an order (placed_order), whether they redeemed the coupon with that order (used_coupon), and the size of the first order the customer placed during the promotion, not including the discount (order_size). If no order was placed (placed_order == 0), then order_size == 0. To simplify this tutorial, we will assume that none of the customers placed more than one order during the month of the promotion.

coupon

The columns last_order, num_orders, and avg_value are variables describing the past purchase behavior of each customer:

The rate of coupon use varied across customers with different values of last_order, num_orders, and avg_value.

coupon |>
  group_by(last_order, num_orders, avg_value) |>
  summarise(used_coupon = mean(used_coupon), .groups = 'drop') |>
  ggplot(aes(x = last_order, y = used_coupon, 
             colour = factor(num_orders))) + 
  geom_line() +
  scale_y_continuous(labels = scales::percent) +
  scale_x_continuous(breaks = seq(0, 12, by = 3)) +
  scale_colour_viridis_d(option = 'C', end = .85) +
  facet_wrap(~avg_value) +
  labs(x = 'Months since last order', y = 'Percent using coupon',
       colour = 'Previous orders') +
  theme_bw()

Some customers placed orders without using their coupon. These customers might have deleted the email without reading it.

coupon |>
  group_by(used_coupon) |> 
  summarise(num_placing_order = sum(placed_order))

This raises an interesting point about the effect of the coupon. For some customers who used the coupon, receiving the coupon might have prompted them to place an order they otherwise would not have made. For these customers, we might expect the treatment effect to be rather large, because their values of Y(0) (how much they would spend without the coupon) would be equal to zero. At the same time, we can imagine that a different subset of customers who used the coupon might have placed an order even if they hadn't received the coupon. For these customers, we might expect their values of Y(0) to be greater than 0, and thus we might find a relatively smaller treatment effect.

Which of these two processes is actually at play is unobserved to us. By assuming that selection is on the observables (and only on the observables), we are assuming that among any group of customers with the same observed variables, the unobserved process determining whether a customer uses the coupon is unrelated to the size of their treatment effect. This is a very strong assumption!

Conditional average treatment effects

The company believes that the stratifying variables last_order, num_orders, and avg_value completely account for all systematic variation in whether customers used the coupon. In other words, the company is willing to assume that all selection is on the observables. This means that among groups of customers in the same stratum---i.e., customers who have all three values of last_order, num_orders, and avg_value in common---we are assuming that coupon use was essentially random with respect to their potential outcomes. To repeat, this is a very strong assumption to make, but it is what we need to assume in order to use a classical matching procedure to estimate average treatment effects. And in any case, the company believes this is close enough to the truth to make this assumption.

To help build intuition for how matching works, we will first consider just a single stratum of customers with last_order == 3, num_orders == '2', and avg_value == 'Medium value':

coupon |>
  filter(last_order == 3 & num_orders == '2' & avg_value == 'Medium value')
high_S <- coupon |>
  filter(last_order == 3 & num_orders == '2' & avg_value == 'Medium value')

The assumption of selection on the observables means that within this group, coupon use is independent of potential outcomes---how large orders would be when using or not using a coupon. Under this assumption, any difference in average order size among coupon users and non-users (in this stratum) is due to the coupon.

coupon |>
  filter(last_order == 3 & num_orders == '2' & avg_value == 'Medium value') |>
  group_by(used_coupon)  |>
  summarise(avg_order_size = mean(order_size), n_customers = n())

The conditional average treatment effect (CATE) for this stratum of customers is difference between these two average order sizes.

coupon |>
  filter(last_order == 3 & num_orders == '2' & avg_value == 'Medium value') |> 
  group_by(used_coupon)  |>
  summarise(avg_order_size = mean(order_size)) |>
  pivot_wider(names_from = 'used_coupon', values_from =  'avg_order_size',
              names_prefix = 'coupon_') |>
  mutate(CATE = coupon_1 - coupon_0) 

Following this logic, we can calculate CATEs for all strata---i.e., for all unique combinations of the three stratifying variables (last_order, num_orders, and avg_value). First, we calculate the average order size among coupon users and non-users in each stratum:

avg_order_sizes_by_group <- 
  coupon |>
  group_by(last_order, num_orders, avg_value, used_coupon) |>
  summarise(avg_order_size = mean(order_size), .groups = 'drop')
avg_order_sizes_by_group

We can now calculate the CATE for each group:

avg_order_sizes_by_group |>
  pivot_wider(names_from  = 'used_coupon', values_from = 'avg_order_size', 
              names_prefix = 'coupon_') |>
  mutate(CATE = coupon_1 - coupon_0) |>
  # renaming columns to make it easier to read:
  rename(with_coupon = coupon_1, without_coupon = coupon_0) 

The estimated CATEs shown above vary widely across groups of customers. As we will see below, one way to think of classical matching is that it works by taking weighted averages of these CATEs in order to generate unbiased and consistent estimates for the average treatment effect (ATE), average treatment effect on the treated (ATT), and the average treatment effect on the controls (ATC). The weights in the weighted averages depend on the target effect to be estimated.

Estimating the ATT with classical matching

First, we will use a matching procedure to estimate the ATT. For these data, the ATT is the average effect of coupon use on customers who chose to use the coupon. If that's confusing, think instead of the ATT as an answer to the following question: "Among customers who chose to use the coupon, how much smaller would their orders have been if they hadn't used the coupon?"

In the material below, we will first apply the matching procedure in a way that goes slowly and uses a lot of code, but is conceptually more instructive. At the end I will show you an equivalent procedure, using the Matching R package, that requires less code.

Step by step

First, isolate the treatment and control groups into different data frames:

treatment <-
  coupon |>
  filter(used_coupon == 1)
control <-
  coupon |>
  filter(used_coupon == 0)

The two data frames are not balanced, by which we mean that within each stratum of the variables last_order, num_orders, and avg_value, there are a different number of coupon users and non-users.

treatment |>
  count(last_order, num_orders, avg_value)
control |>
  count(last_order, num_orders, avg_value)

To calculate the ATT, we want to construct an artificial control group that is balanced with respect to the treatment group. Consider this example: There are r sum(high_S$used_coupon) treated customers with last_order == 3, num_orders == '2', and avg_value == 'Medium value'.

n_treated_in_this_stratum <- 
  treatment |>
  filter(last_order == 3 & num_orders == '2' & avg_value == 'Medium value') |>
  nrow()
n_treated_in_this_stratum

To calculate the ATT, the artificial control group for this stratum needs to contain exactly r sum(high_S$used_coupon) coupon non-users. After sampling, the artificial control group for this stratum might look something like this:

set.seed(1)
artificial_control_for_this_stratum <- 
  control |>
  filter(last_order == 3 & num_orders == '2' & avg_value == 'Medium value') |>
  sample_n(n_treated_in_this_stratum, replace = TRUE) |>
  arrange(customer)
artificial_control_for_this_stratum

To calculate the ATT for the entire customer base, we need to repeat this sampling procedure for each stratum (i.e., unique combination of values for the three variables). To do that, we need to know how many treated customers there are in each stratum. We will calculate that first:

n_treated_per_stratum <-
  treatment |>
  count(last_order, num_orders, avg_value, name = 'n_treated')
n_treated_per_stratum

Next, within each stratum, we need to sample n_treated customers from the control group. The approach here will be to join the tables control (containing control group customers) and n_treated_per_stratum (indicating how many control customers to sample from each stratum). We will then use the values of n_treated and the sample_n function (from dplyr) to construct the artificial control group.

control_with_n_treated <- 
    inner_join(control, n_treated_per_stratum, 
               by = c("last_order", "num_orders", "avg_value"))
control_with_n_treated

artificial_control <-
  control_with_n_treated |>
  group_by(last_order, num_orders, avg_value, n_treated) |>
  #  within each stratum, sample n_treated
  #    controls with replacement:
  sample_n(size = n_treated[1], replace = TRUE) |>       
  ungroup() |>                                           
  select(-n_treated)

The treatment and artificial_control data frames are now balanced within each stratum:

treatment |>
  count(last_order, num_orders, avg_value)
artificial_control |>
  count(last_order, num_orders, avg_value)

Now that we have an artificial control group that is balanced with respect to the treatment group, the next step is to combine the treatment and artificial control groups in order to calculate the average treatment effect on the treated (ATT):

treatment_and_artificial_control <-
  bind_rows(treatment, artificial_control)
treatment_and_artificial_control |> 
    group_by(used_coupon) |> 
    summarise(avg_order_size = mean(order_size)) |> 
    pivot_wider(names_from = used_coupon, values_from = avg_order_size) |> 
    mutate(att = `1` - `0`)

A more concise approach using dplyr

The code above is spread out with a lot of discussion. Here is a version that is more compact, and makes use of the group_map function in dplyr. Feel free to skip over this material and jump to the next subsection (showing how to use the Matching package).

coupon |>
  group_by(last_order, num_orders, avg_value) |>
  # number of customers in the treated condition within each stratum:
  mutate(n_treated = sum(used_coupon)) |>   
  group_by(last_order, num_orders, avg_value, 
           used_coupon, n_treated) |>
  group_map(                                 
                                             # apply the function below to each group_by subset of rows:
    ~sample_n(.x,                            # sample rows from this group_by subset (.x)
              size = .y$n_treated,           # <- the number of rows to sample = n_treated (.y refers to the grouping variables)
              replace = !.y$used_coupon),    # sample with replacement from the control group (sample all from the treated group)
    .keep = TRUE) |>
  bind_rows() |>
  # calculate average order size for treatment/control
  group_by(used_coupon) |>
  summarise(avg_order_size = mean(order_size)) |>
  # calculate ATT
  pivot_wider(names_from = 'used_coupon',    
              values_from = 'avg_order_size',
              names_prefix = 'coupon_') |>  
  summarise(ATT = mean(coupon_1 - coupon_0))

The estimated ATT is not precisely the same in the two preceding examples because the randomly sampled artificial control groups are not the same.

An even more concise approach using the Matching package

First, load the Matching package. The Matching package also loads the MASS package, which replaces dplyr's version of select. We will need dplyr's select, however, so we call select <- dplyr::select after loading packages.

library(Matching)
select <- dplyr::select     # because MASS replaced dplyr::select and we want the dplyr version

To calculate the ATT using the Matching package, we need to create distinct variables for 1) the outcome variable (order_size), 2) the matching variables (last_order, num_orders, and avg_value), and 3) the treatment variable (used_coupon):

Y <- coupon$order_size
X <- 
  coupon |>
  select(last_order, num_orders, avg_value) |>
  # Matching expects a matrix of numbers, not a data frame with factors:
  mutate(across(everything(), as.numeric)) |>
  as.matrix()
D <- coupon$used_coupon
att <- Match(Y = Y, Tr = D, X = X, 
           estimand = 'ATT', M = 1, 
           replace = TRUE, ties = TRUE)
summary(att)

Notice that the Match procedure calculates a standard error for the estimate, which we didn't do in our version based on dplyr.

ATC and ATE using dplyr

To calculate the ATC, we follow a very similar procedure, except we sample treatment cases in order to create an artificial treatment group that matches the number of control units in each stratum. Hence, code below is similar to what we did above.

control_and_artificial_treatment <- 
  coupon |>
  group_by(last_order, num_orders, avg_value) |>
  mutate(n_control = sum(used_coupon == 0)) |>
  group_by(last_order, num_orders, avg_value, used_coupon, n_control) |>
  group_map(
    ~sample_n(.x, size = .y$n_control, 
              replace = .y$used_coupon), .keep = TRUE) |>
  bind_rows()

To calculate the ATC we continue in a similar manner as before:

control_and_artificial_treatment |>
  group_by(used_coupon) |>
  summarise(avg_order_size = mean(order_size), .groups = 'drop') |>
  pivot_wider(names_from = 'used_coupon', values_from = 'avg_order_size',
              names_prefix = 'coupon_') |> 
    summarise(ATC = mean(coupon_1 - coupon_0))

The ATC for this study represents the increase in order size we would expect among customers who didn't use the coupon, if we could somehow force them to place an order using the coupon.

ATC and ATE using the Matching package

We can also use the Matching package to calculate the ATC and ATE, just as we did when calculating the ATT.

atc <- Match(Y = Y, Tr = D, X = X, 
           estimand = 'ATC', M = 1, 
           replace = TRUE, ties = TRUE)
summary(atc)
ate <- Match(Y = Y, Tr = D, X = X, 
           estimand = 'ATE', M = 1, 
           replace = TRUE, ties = TRUE)
summary(ate)


jasonmtroos/ECICourse documentation built on Sept. 8, 2023, 9:18 p.m.