knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
  )
library(kableExtra)

augsynth: Estimating treatment effects with staggered adoption

The data

To show the features of the multisynth function we will use data on the effects of states implementing mandatory collective bargaining agreements for public sector unions (Paglayan, 2018)

library(magrittr)
library(dplyr)
library(augsynth)
data <- read.csv("https://dataverse.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.7910/DVN/WGWMAV/3UHTLP", sep="\t")

The dataset contains several important variables that we'll use:

data %>% 
    filter(year == 1960) %>% 
    select(year, State, YearCBrequired, lnppexpend) %>%
    head() %>%
    kable() %>%
    kable_styling(bootstrap_options =c("hover", "responsive"))

To run multisynth, we need to include a treatment status column that indicates which state is treated in a given year, we call this cbr below. We also restrict to the years 1959-1997 where we have yearly measurements of expenditures and drop Washington D.C. and Wisconsin from the analysis.

data %>%
    filter(!State %in% c("DC", "WI"),
           year >= 1959, year <= 1997) %>%
    mutate(YearCBrequired = ifelse(is.na(YearCBrequired), 
                                   Inf, YearCBrequired),
           cbr = 1 * (year >= YearCBrequired)) -> analysis_df

Partially pooled SCM with an intercept

To fit partially pooled synthetic controls, we need to give multisynth a formula of the form outcome ~ treatment, point it to the unit and time variables, and choose the level of partial pooling nu. Setting nu = 0 fits a separate synthetic control for each treated unit and setting nu = 1 fits fully pooled synthetic controls. If we don't set nu, multisynth will choose a heuristic value based on how well separate synthetic controls balance the overall average. By default, multisynth includes an intercept shift along with the weights; we can exclude the intercept shift by setting fixedeff = F. We can also set the number of pre-treatment time periods (lags) that we want to balance with the n_lags argument and the number of post-treatment time periods (leads) that we want to estimate with the n_leads argument. By default multisynth sets n_lags and n_leads to the number of pre-treatment and post-treatment periods for the last treated unit, respectively.

# with a choice of nu
ppool_syn <- multisynth(lnppexpend ~ cbr, State, year, 
                        nu = 0.5, analysis_df)
# with default nu
ppool_syn <- multisynth(lnppexpend ~ cbr, State, year, 
                        analysis_df)

print(ppool_syn$nu)

ppool_syn

Using the summary function, we'll compute the treatment effects and standard errors and confidence intervals for all treated units as well as the average via the wild bootstrap. (This takes a bit of time so we'll store the output) We can also change the significant level associated with the confidence intervals by setting the alpha argument, by default alpha = 0.05.

ppool_syn_summ <- summary(ppool_syn)

We can then report the level of global and individual balance as well as estimates for the average.

ppool_syn_summ

nopool_syn_summ$att is a dataframe that contains all of the point estimates, standard errors, and lower/upper confidence limits. Time = NA denotes the effect averaged across the post treatment periods.

ppool_syn_summ$att %>%
  filter(Time >= 0) %>%
  head() %>%
  kable() %>%
  kable_styling(bootstrap_options =c("hover", "responsive"))

We can also visually display both the pre-treatment balance and the estimated treatment effects.

plot(ppool_syn_summ)

And again we can hone in on the average effects.

plot(ppool_syn_summ, levels = "Average")

Collapsing into time cohorts

We can also collapse treated units with the same treatment time into time cohorts, and find one synthetic control per time cohort by setting time_cohort = TRUE. When the number of distinct treatment times is much smaller than the number of treated units, this will run significantly faster.

# with default nu
ppool_syn_time <- multisynth(lnppexpend ~ cbr, State, year,
                        analysis_df, time_cohort = TRUE)

print(ppool_syn_time$nu)

ppool_syn_time

We can then compute effects for the overall average as well as for each treatment time cohort, rather than individual units.

ppool_syn_time_summ <- summary(ppool_syn_time)
ppool_syn_time_summ
ppool_syn_time_summ$att %>%
  filter(Time >= 0) %>%
  head() %>%
  kable() %>%
  kable_styling(bootstrap_options =c("hover", "responsive"))

Again we can plot the effects.

plot(ppool_syn_time_summ)

Including auxiliary covariates

We can also include an additional set of covariates to balance along with the pre-treatment outcomes. First, let's create a data frame with the values of some covariates in a few different years:

data %>%
  select(State, year, agr, pnwht, purban, perinc, studteachratio) %>%
  group_by(State) %>%
  summarise(perinc_1959 = perinc[year == 1959],
            studteachratio_1959 = studteachratio[year == 1959]) %>% 
  # filter to lower 48 where we have data
  filter(!State %in% c("AK", "HI"))  -> cov_data

analysis_df %>%
  inner_join(cov_data, by = "State") -> analysis_df_covs

To include auxiliary covariates, we can add them in to the formula after |. This will balance the auxiliary covariates along with the pre-treatment outcomes simultanouesly. If the covariates vary during the pre-treatment periods, multisynth will use the average pre-treatment value. We can change this behavior by including our own custom aggregation function via the cov_agg argument.

# with default nu
ppool_syn_cov <- multisynth(lnppexpend ~ cbr | perinc_1959 + studteachratio_1959,
                            State, year, analysis_df_covs)

print(ppool_syn_cov$nu)

ppool_syn_cov

Again we can compute effects, along with their standard errors and confidence intervals, and plot.

ppool_syn_cov_summ <- summary(ppool_syn_cov)
ppool_syn_cov_summ
ppool_syn_cov_summ$att %>%
  filter(Time >= 0) %>%
  head() %>%
  kable() %>%
  kable_styling(bootstrap_options =c("hover", "responsive"))

Again we can plot the effects.

plot(ppool_syn_cov_summ, levels = "Average")


ebenmichael/augsynth documentation built on March 20, 2024, 5:20 a.m.