knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(ggplot2)

This package implements the imputation method developed by Borusyak, Jaravel, and Spiess (2021) for staggered difference-in-differences. It allows for constant or dynamic treatment effects, among others. The following presentation highlights the main features available using a simple example. Note that this package is a wrapper around the following packages: collapse, data.table and fixest. It allows us a highly efficient estimation procedure, even with large and complex datasets.

A simple example

Generating simulated data

To show how to use the package, we will generate a simple dataset similar to the one used in the original paper. Let's first load the package:

library(didImputation)

The custom function generateDidData is directly available to generate simulated data. The setup is as follows:

data <- generateDidData(i = 2000,
                        t = 6,
                        hdfe = FALSE,
                        control = FALSE,
                        treatment = ~ d * (k + 1))

Two implications are worth noting. They have been emphasized by the recent literature on staggered designs and are not specific to our setup.

Let's plot the true treatment effects over time for each treatment group:

agg <- aggregate(data[, "trueffect"],
                 by = data[, c("t", "g")],
                 mean)

ggplot(agg, aes(x = t, y = x, color = as.factor(g))) +
  geom_line() +
  geom_point() +
  labs(x = "Period", y = "True ATT", color = "Treatment group") +
  scale_x_continuous(breaks = seq(1, 6)) +
  theme_classic() +
  theme(legend.position = "bottom")

Estimation procedure

The didImputation function estimates treatment effects. It allows for static and dynamic treatment effects (by horizon). Let's review some main arguments and options of this function.

est <- didImputation(y0 = y ~ 0 | i + t,
                     data = data,
                     cohort = "g",
                     nevertreated.value = Inf,
                     unit = "i",
                     time = "t",
                     coef = -Inf:Inf)

The summary method reports the results from the estimation:

summary(est)

You can extract the coefficients table if you want to build your own plot or table.

est$coeftable

Finally, we can plot the coefficients from the fully dynamic model using the didplot function (note that you can choose the confidence interval level to report). The results are very close to the true effects:

didplot(est) +
  geom_point(aes(y = c(0, 0, 0, 1, 2, 3, 4, 5), shape = "True ATT"),
             size = 3, color = "firebrick" ) +
  scale_shape_manual(name = "", values = c(9))

Overall treatment effect

The didImputation function allows for overall treatment effects through the OATT argument. When set equal to TRUE, the estimation reports the coefficient for the total average treatment to the treated and the pre-trends coefficients from the coef argument:

static <- didImputation(y0 = y ~ 0 | i + t,
                     data = data,
                     cohort = "g")

summary(static)

Including other fixed-effects

Additional fixed-effects can be directly included in the formula after the |. Lets generate a dataset with a high-dimensional fixed-effects variable and estimate the model:

# generate data with hdfe
data <- generateDidData(i = 2000,
                        t = 6,
                        hdfe = TRUE,
                        control = FALSE,
                        treatment = ~ d * (k + 1))

# fully dynamic estimation
est <- didImputation(y0 = y ~ 0 | i + t + hdfefactor,
                     data = data,
                     cohort = "g")

summary(est)

Including continuous controls

Additional continuous controls can be directly included in the formula before the |. Lets generate a dataset with such control and estimate the model:

# generate data with hdfe
data <- generateDidData(i = 2000,
                        t = 10,
                        hdfe = FALSE,
                        control = TRUE,
                        treatment = ~ d * (k + 1))

# fully dynamic estimation
est <- didImputation(y0 = y ~ x1 | i + t,
                     data = data,
                     cohort = "g")

summary(est)

Heterogeneity

We may want to run heterogeneity tests, this is particularly useful if we want to run an event-study with multiple contrasts (e.g. sex, age class, region) while keeping the whole sample (such that we maintain statistical power). We can do that simply with option het. The estimation will produce a result for each contrast.

# Generate data with unit invariant heterogeneous effect
data_het <- generateDidData(i = 5000,
                        t = 10,
                        treatment = ~ d * (ig + 1) * (k + 1)
)

# Compute true ATT for later plot
trueATT <- data.frame(k = rep(-4:8, 2),
                     ig = c(rep(1, 13), rep(0,13)),
                     d = as.numeric(rep(-4:8, 2) >= 0))

trueATT$y <- with(trueATT,{
      d * (ig + 1) * (k + 1)
})

# Estimate the heterogeneous effect
est_het <- didImputation(y ~ 0 | i + t,
              cohort = "g",
              het = "ig",
              data = data_het,
              coef = -4:Inf)

didplot(est_het) +
  geom_point(data = trueATT,
             aes(x = k, y = y, color = factor(ig), shape = "True ATT"),
             size = 3) +
  scale_shape_manual(name = "", values = c(9))

Additional features

Sample weights

Sample weights can be used by passing the variable in the w argument.

Standard errors

The standard errors are computed using an alternating projection method. You can tweek its meta-parameter by changing tol and maxit. To speed up the estimation procedure, you can set with.se = FALSE. Standard errors will not be computed in this case but estimation will be significqntly quicker.

Effective sample size

Borusyak, K., Jaravel, X., & Spiess, J. give a condition on weights such that consistency of estimators holds. This condition states that the Herfindahl index of weights must converge to zero. Another interpretation is that the inverse of the index is a measure of 'effective sample size'. Authors recommand an effective sample size of at least 30. If the effective sample size is lower, a warning will be thrown. You can tweek this parameter by changing effective.sample option in didImputation.



CdfInnovLab/didImputation documentation built on Dec. 17, 2021, 1:57 p.m.