```r
knitr::opts_chunk$set(
  comment = "#",
  echo = FALSE,
  error = FALSE,
  tidy = FALSE,
  cache = FALSE,
  collapse = TRUE,
  eval = FALSE, # don't rerun vignette when building package
  out.width = '100%',
  dpi = 144
)
```
```r
library(pracma)
library(parallel)
library(stats)
library(maps)
library(data.table)
library(CVXR)
library(DiSCos)
library(ggplot2)
```
This vignette demonstrates how to use the DiSCo package by way of the
empirical application in @gunsilius2023distributional, which is based on
@dube2019minimum. We illustrate the use of the two main functions: 1) `DiSCo`,
which estimates the raw distributional counterfactuals, computes confidence
intervals, and optionally performs a permutation test; and 2) `DiSCoTEA`, the
"Treatment Effect Aggregator", which takes in the distributional
counterfactuals and computes aggregate treatment effects using a
user-specified aggregation statistic.
We briefly review the main idea behind Distributional Synthetic
Controls. Denote by $Y_{jt,N}$ the outcome of group $j$ in time period $t$
in the absence of an intervention, and by $Y_{jt,I}$ the outcome in
the presence of an intervention at time $t > T_0$. Denote the quantile
function, $$
F^{-1}(q):=\inf \{y \in \mathbb{R}: F(y) \geq q\}, \quad q \in(0,1),
$$ where $F(y)$ is the corresponding cumulative distribution function.
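As a quick, hands-on illustration of this definition (a toy example, unrelated to the package itself), the left-continuous empirical quantile function can be evaluated in R with `stats::quantile` and `type = 1`:

```r
# Toy illustration of the generalized inverse F^{-1}(q) = inf{y : F(y) >= q}
# (not part of the DiSCos package). type = 1 gives the left-continuous
# inverse of the empirical CDF.
y <- c(1, 2, 2, 5)
stats::ecdf(y)(2)                          # F(2) = 0.75
stats::quantile(y, probs = 0.5, type = 1)  # smallest y with F(y) >= 0.5, i.e. 2
```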
One unit $j=1$ has received treatment, while the other units
$j=2,\ldots,J+1$ have not. The goal is then to estimate the counterfactual
quantile function $F_{Y_{1t},N}^{-1}$ of the treated unit had it not
received treatment by an optimally weighted average of the control
units' quantile functions, $$
F_{Y_{1t}, N}^{-1}(q)=\sum_{j=2}^{J+1} \lambda_j^* F_{Y_{jt}}^{-1}(q) \quad \text{for all } q \in(0,1).
$$
In practice, we do this by solving the following optimization problem:
$$
\vec{\lambda}_t^*=\underset{\vec{\lambda} \in \Delta^J}{\operatorname{argmin}} \int_0^1\left|\sum_{j=2}^{J+1} \lambda_j F_{Y_{jt}}^{-1}(q)-F_{Y_{1t}}^{-1}(q)\right|^2 dq,
$$ which gives an optimal weight for each unit-period combination. This
problem can be solved by a simple weighted regression, which is
implemented in the `DiSCo_weights_reg` function. To obtain the overall
optimal weights $\vec{\lambda}^*$, we take the average of the optimal
weights across all periods.
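To make the per-period problem concrete, here is a minimal sketch that solves it directly with CVXR (loaded above) on made-up data: a hypothetical treated unit and four controls, with quantile functions evaluated on a grid. This is only an illustration of the optimization; the package itself uses the dedicated `DiSCo_weights_reg` routine.

```r
library(CVXR)

# Made-up example: J = 4 control units, quantile functions on a grid of G points
set.seed(1)
G <- 100
J <- 4
qgrid <- seq(1 / G, 1, length.out = G)
q_target   <- quantile(rnorm(500, mean = 1), probs = qgrid)  # treated unit's F^{-1}
Q_controls <- sapply(1:J, function(j) {
  quantile(rnorm(500, mean = j / 2), probs = qgrid)          # control units' F^{-1}
})

# min over the simplex of the squared L2 distance between the weighted average
# of the control quantile functions and the treated unit's quantile function
lambda      <- Variable(J)
objective   <- Minimize(sum_squares(Q_controls %*% lambda - q_target) / G)
constraints <- list(lambda >= 0, sum(lambda) == 1)
result      <- solve(Problem(objective, constraints))
round(result$getValue(lambda), 3)  # per-period weights; DiSCo averages these over periods
```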
To get started, we load the data from @dube2019minimum, which is available in the package.
data("dube") head(dube)
To learn more about the data, just type `?dube` in the console. We have
already renamed the outcome, id, and time variables to `y_col`, `id_col`,
and `time_col`, respectively, which is required before passing the
dataframe to the `DiSCo` command.
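If your own dataset uses different column names, they can be renamed with `data.table::setnames` before calling `DiSCo`. The column names `wage`, `state_fips`, and `year` below are purely hypothetical placeholders, not columns of the `dube` data:

```r
# Hypothetical raw data with arbitrary column names (placeholders for your own data)
mydata <- data.table(wage = c(10.5, 11.2), state_fips = c(1, 2), year = c(2002, 2002))

# Rename to the column names that the DiSCo command expects
setnames(mydata,
         old = c("wage", "state_fips", "year"),
         new = c("y_col", "id_col", "time_col"))
```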
We also need to set the following two parameters:
```r
id_col.target <- 2
t0 <- 2003
```
which indicate the id of the treated unit and the time period of the intervention, respectively. We can now run the `DiSCo` command:
```r
df <- copy(dube)
disco <- DiSCo(df, id_col.target, t0, G = 1000, num.cores = 1, permutation = TRUE,
               CI = TRUE, boots = 1000, graph = TRUE, simplex = TRUE, seed = 1, q_max = 0.9)
```
where we have chosen a grid (`G`) of 1000 quantiles and requested a
permutation test and confidence intervals (`CI` is set to `TRUE`), the
latter calculated using 1000 resamples (`boots`). The computations run on a
single core here (`num.cores = 1`); increasing `num.cores` parallelizes the
permutation test and confidence intervals and speeds them up considerably.
We also set the `seed` explicitly in the function call, which ensures
reproducibility across the parallel cores. We followed the "classical"
approach of @abadie2010synthetic by restricting the weights to be between 0
and 1 (`simplex=TRUE`). Finally, we restricted the estimation to the 0-0.9
quantile range (`q_max=0.9`) to account for the fact that the CPS data used
in @dube2019minimum is imputed for the top income quantiles (more detail in
the section below).
The returned `disco` object contains a host of information produced by
the command. Typing `?DiSCoT` in the console pulls up the help files,
which lay out the precise structure of the returned object. For now, we
will just have a look at the top 10 estimated weights,
```r
# retrieve the weights
weights <- disco$weights
# retrieve the control unit IDs
controls <- disco$control_ids
# store in a dataframe
weights_df <- data.frame(weights = weights, fips = controls)
# merge with state fips codes (built into the maps package)
state.fips <- as.data.table(maps::state.fips)
state.fips <- state.fips[!duplicated(state.fips$abb), c("fips", "abb")]
weights_df <- merge(weights_df, state.fips, by = "fips")
setorder(weights_df, -weights)
print(weights_df[1:10, ])
```
When we ran the `DiSCo` command, we set `permutation` to `TRUE`, which
runs the permutation test described in the paper (see `?DiSCo_per` for
more details). This will allow us to inspect the permutation inference
results below. Already, by setting `graph` to `TRUE`, the function
displayed a plot of the full distribution of permutation tests. The
black solid line shows the fit of the "true" synthetic control. The fact
that it does not diverge more strongly after treatment than the gray lines
for the placebo units suggests that the treatment had no effect.
For context, the y-axis is the squared Wasserstein distance between the counterfactual and observed quantile functions, $$
d_{jt}^2:=\int_0^1\left|F_{Y_{jt}, N}^{-1}(q)-F_{Y_{jt}}^{-1}(q)\right|^2 dq,
$$ and, as in @abadie2010synthetic, we take the ratio of post- to pre-intervention Wasserstein distance to account for variation in the pre-treatment fit across placebo tests, $$
r_j=\frac{R_j\left(T_0+1, T\right)}{R_j\left(1, T_0\right)},
$$ where $R_j(t_1, t_2)$ aggregates unit $j$'s distances $d_{jt}^2$ over the periods $t_1$ through $t_2$. The p-value for the permutation test is then calculated as $$
p=\frac{1}{J+1} \sum_{j=1}^{J+1} H\left(r_j-r_1\right),
$$ where $H(x)$ is the Heaviside function, which is 1 if $x\geq 0$ and 0 otherwise. This p-value gives the probability of observing a placebo ratio at least as large as that of the actually treated unit. It can be retrieved by calling
```r
summary(disco$perm)
```
The p-value is larger than 0.05, which confirms the visual result from the plot above, while accounting for potential differences in pre-treatment fit across placebo units.
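As a small numerical illustration of how this p-value relates to the ratios (this is not the package's internal code, and the ratio values below are made up), it is simply the share of all ratios, including the treated unit's own, that are at least as large as $r_1$:

```r
# Hypothetical post/pre Wasserstein distance ratios r_j, with the treated unit first
ratios <- c(1.8, 2.3, 0.9, 1.4, 2.7)

# p = (1 / (J + 1)) * sum_j H(r_j - r_1): share of ratios at least as large as r_1
p_value <- mean(ratios >= ratios[1])
p_value
```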
Finally, we can use the DiSCo Treatment Effect Aggregator (`DiSCoTEA`)
function to aggregate the resulting counterfactual quantile functions
into various treatment effect measures. For example, we can calculate
the difference between the counterfactual and observed quantile
functions and cumulative distribution functions (CDFs) as follows.
discot <- DiSCoTEA(disco, agg="quantileDiff", graph=TRUE) summary(discot)
discot <- DiSCoTEA(disco, agg="cdfDiff", graph=TRUE, ylim=c(-0.05, 0.05)) summary(discot)
Calling `summary` on the returned object prints a table summarizing the
effects across the distribution. You can choose the intervals of
quantiles over which it aggregates using the `samples` parameter. If you
calculated the permutation test and confidence intervals in the `DiSCo`
function, these are reported as well.
Setting `graph` to `TRUE` also prints a plot of the distribution
differences over time. You can focus on specific years using the `t_plot`
parameter (see the sketch below). You can use the other parameters of the
`DiSCoTEA` function to adjust the basic appearance of the plots, or
directly alter the ggplot object that is stored in the returned `DiSCoT`
object.
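For instance, one might group the effects into coarser quantile bins and plot only a couple of post-treatment years; the cut points and years below are arbitrary illustrative choices rather than values used elsewhere in this vignette:

```r
# Illustrative choices only: aggregate effects within the 0-25% and 25-50%
# quantile bins and restrict the plots to the years 2003 and 2004
discot_custom <- DiSCoTEA(disco, agg = "quantileDiff",
                          samples = c(0.25, 0.5),
                          graph = TRUE, t_plot = c(2003, 2004))
summary(discot_custom)
```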
Looking at the plots of the quantile and CDF differences above, we can see two things: 1) the pre-treatment fit for the years 1998-2002 is decent but not perfect; 2) there do not appear to be any notable effects of the minimum wage increase, aside from fluctuations in the function differences that are similar to those observed pre-treatment.
The package offers various ways to further test suspected effects. For example, we can focus on a specific part of the distribution, and construct a separate synthetic control for it. Mathematically, this comes down to,
$$
\vec{\lambda}_t^*=\underset{\vec{\lambda} \in \Delta^J}{\operatorname{argmin}} \int_{\text{q\_min}}^{\text{q\_max}}\left|\sum_{j=2}^{J+1} \lambda_j F_{Y_{jt}}^{-1}(q)-F_{Y_{1t}}^{-1}(q)\right|^2 dq,
$$
where $\text{q\_min} < \text{q\_max}$ are the bounds of the quantile range we're interested in, corresponding to the `q_min` and `q_max` arguments of `DiSCo`. For example, following @dube2019minimum, we can focus on the lower end of the distribution, up to observations that earn 3.5 times the poverty income threshold. This corresponds to roughly the 0.65 quantile in our data:
```r
stats::ecdf(disco$results.periods$`2000`$target$quantiles)(3.5)
```
We can simply run the following code,
```r
disco <- DiSCo(dube, id_col.target = id_col.target, t0 = t0, G = 1000, num.cores = 1,
               permutation = TRUE, CI = TRUE, boots = 1000, graph = FALSE,
               q_min = 0, q_max = 0.65, seed = 1, simplex = TRUE)
```
One could also try to deal with irregularity in the estimated quantile
functions by using the `qmethod` option in the `DiSCo` command, which
allows for the use of alternative quantile estimation methods that can
account for non-smoothness and extreme values.
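A call using such an alternative estimator might look like the sketch below; note that `"qkden"` (a kernel-based quantile estimator) is only assumed here for illustration, so check `?DiSCo` for the values of `qmethod` accepted by your installed version.

```r
# Sketch only: "qkden" is assumed to be an accepted qmethod value; see ?DiSCo
# for the quantile estimation methods available in your version of the package.
disco_alt <- DiSCo(dube, id_col.target = id_col.target, t0 = t0, G = 1000,
                   num.cores = 1, permutation = TRUE, CI = TRUE, boots = 1000,
                   graph = FALSE, q_min = 0, q_max = 0.65, seed = 1,
                   simplex = TRUE, qmethod = "qkden")
```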
As above, we plot the results using the `DiSCoTEA` function:
discot <- DiSCoTEA(disco, agg="quantileDiff", graph=TRUE) summary(discot)
discot <- DiSCoTEA(disco, agg="cdfDiff", graph=TRUE, ylim=c(-0.05,0.05)) summary(discot)
Again, we do not observe any substantial changes in the quantile functions or CDFs. Also, the p-value of the permutation test is now even larger, moving further away from conventional significance thresholds. Overall, we cannot reject the null hypothesis of no treatment effect.
In this vignette, we have demonstrated the DiSCo package using the data
from the empirical application in @gunsilius2023distributional. We
estimated the difference between the quantile functions of the
distributional synthetic control constructed with the `DiSCo` command
and the observed quantile function. To test for the presence of
treatment effects, we inspected the pre-treatment distributional fit,
carried out a permutation test, and re-estimated the synthetic control
for a restricted quantile range; all these robustness tests are
available in the DiSCo package, together with handy graphing and
aggregation tools.