```r
knitr::opts_chunk$set(
  comment = "#",
  echo = FALSE,
  error = FALSE,
  tidy = FALSE,
  cache = FALSE,
  collapse = TRUE,
  eval = FALSE, # don't rerun vignette when building package
  out.width = '100%',
  dpi = 144
)
```
```r
library(pracma)
library(parallel)
library(stats)
library(maps)
library(data.table)
library(CVXR)
library(DiSCos)
library(ggplot2)
```
This vignette demonstrates how to use the DiSCo package by way of the
empirical application in @gunsilius2023distributional, which is based on
@dube2019minimum. We illustrate the use of the two main functions: 1) `DiSCo`,
which estimates the raw distributional counterfactuals, computes confidence
intervals, and optionally performs a permutation test; and 2) `DiSCoTEA`, the
"Treatment Effect Aggregator", which takes in the distributional
counterfactuals and computes aggregate treatment effects using a
user-specified aggregation statistic.
We briefly review the main idea behind Distributional Synthetic
Controls. Denote by $Y_{jt,N}$ the outcome of group $j$ in time period $t$
in the absence of an intervention, and by $Y_{jt,I}$ the outcome in
the presence of an intervention at time $t > T_0$. Denote the quantile
function, $$
F^{-1}(q):=\inf \{y \in \mathbb{R}: F(y) \geq q\}, \quad q \in(0,1),
$$ where $F(y)$ is the corresponding cumulative distribution function.
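As a quick, hands-on illustration of this definition (a toy example, unrelated to the package itself), the left-continuous empirical quantile function can be evaluated in R with `stats::quantile` and `type = 1`:

```r
# Toy illustration of the generalized inverse F^{-1}(q) = inf{y : F(y) >= q}
# (not part of the DiSCos package). type = 1 gives the left-continuous
# inverse of the empirical CDF.
y <- c(1, 2, 2, 5)
stats::ecdf(y)(2)                          # F(2) = 0.75
stats::quantile(y, probs = 0.5, type = 1)  # smallest y with F(y) >= 0.5, i.e. 2
```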
One unit $j=1$ has received treatment, while the other units
$j=2,\ldots,J+1$ have not. The goal is then to estimate the counterfactual
quantile function $F_{Y_{1t},N}^{-1}$ of the treated unit had it not
received treatment by an optimally weighted average of the control
units' quantile functions, $$
F_{Y_{1t}, N}^{-1}(q)=\sum_{j=2}^{J+1} \lambda_j^* F_{Y_{jt}}^{-1}(q) \quad \text{for all } q \in(0,1).
$$
In practice, we do this by solving the following optimization problem:
$$
\vec{\lambda}_t^*=\underset{\vec{\lambda} \in \Delta^J}{\operatorname{argmin}} \int_0^1\left|\sum_{j=2}^{J+1} \lambda_j F_{Y_{jt}}^{-1}(q)-F_{Y_{1t}}^{-1}(q)\right|^2 dq,
$$ which gives an optimal weight for each unit-period combination. This
problem can be solved by a simple weighted regression, which is
implemented in the `DiSCo_weights_reg` function. To obtain the overall
optimal weights $\vec{\lambda}^*$, we take the average of the optimal
weights across all periods.
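To make the per-period problem concrete, here is a minimal sketch that solves it directly with CVXR (loaded above) on made-up data: a hypothetical treated unit and four controls, with quantile functions evaluated on a grid. This is only an illustration of the optimization; the package itself uses the dedicated `DiSCo_weights_reg` routine.

```r
library(CVXR)

# Made-up example: J = 4 control units, quantile functions on a grid of G points
set.seed(1)
G <- 100
J <- 4
qgrid <- seq(1 / G, 1, length.out = G)
q_target   <- quantile(rnorm(500, mean = 1), probs = qgrid)  # treated unit's F^{-1}
Q_controls <- sapply(1:J, function(j) {
  quantile(rnorm(500, mean = j / 2), probs = qgrid)          # control units' F^{-1}
})

# min over the simplex of the squared L2 distance between the weighted average
# of the control quantile functions and the treated unit's quantile function
lambda      <- Variable(J)
objective   <- Minimize(sum_squares(Q_controls %*% lambda - q_target) / G)
constraints <- list(lambda >= 0, sum(lambda) == 1)
result      <- solve(Problem(objective, constraints))
round(result$getValue(lambda), 3)  # per-period weights; DiSCo averages these over periods
```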
To get started, we load the data from @dube2019minimum, which is available in the package.
data("dube") head(dube)
To learn more about the data, just type `?dube` in the console. We have
already renamed the outcome, id, and time variables to `y_col`, `id_col`,
and `time_col`, respectively, which is required before passing the
dataframe to the `DiSCo` command.
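If your own dataset uses different column names, they can be renamed with `data.table::setnames` before calling `DiSCo`. The column names `wage`, `state_fips`, and `year` below are purely hypothetical placeholders, not columns of the `dube` data:

```r
# Hypothetical raw data with arbitrary column names (placeholders for your own data)
mydata <- data.table(wage = c(10.5, 11.2), state_fips = c(1, 2), year = c(2002, 2002))

# Rename to the column names that the DiSCo command expects
setnames(mydata,
         old = c("wage", "state_fips", "year"),
         new = c("y_col", "id_col", "time_col"))
```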
We also need to set the following two parameters:
```r
id_col.target <- 2
t0 <- 2003
```
which indicate the id of the treated unit and the time period of the intervention, respectively. We can now run the `DiSCo` command:
```r
df <- copy(dube)
disco <- DiSCo(df, id_col.target, t0, G = 1000, num.cores = 1, permutation = TRUE,
               CI = TRUE, boots = 1000, graph = TRUE, simplex = TRUE, seed = 1, q_max = 0.9)
```
where we have chosen a grid (`G`) of 1000 quantiles and requested a
permutation test and confidence intervals (`CI` is set to `TRUE`), the
latter calculated using 1000 resamples (`boots`). The computations run on a
single core here (`num.cores = 1`); increasing `num.cores` parallelizes the
permutation test and confidence intervals and speeds them up considerably.
We also set the `seed` explicitly in the function call, which ensures
reproducibility across the parallel cores. We followed the "classical"
approach of @abadie2010synthetic by restricting the weights to be between 0
and 1 (`simplex=TRUE`). Finally, we restricted the estimation to the 0-0.9
quantile range (`q_max=0.9`) to account for the fact that the CPS data used
in @dube2019minimum is imputed for the top income quantiles (more detail in
the section below).
The returned `disco` object contains a host of information produced by
the command. Typing `?DiSCoT` in the console pulls up the help files,
which lay out the precise structure of the returned object. For now, we
will just have a look at the top 10 estimated weights,
```r
# retrieve the weights
weights <- disco$weights
# retrieve the control unit IDs
controls <- disco$control_ids
# store in a dataframe
weights_df <- data.frame(weights = weights, fips = controls)
# merge with state fips codes (built into the maps package)
state.fips <- as.data.table(maps::state.fips)
state.fips <- state.fips[!duplicated(state.fips$abb), c("fips", "abb")]
weights_df <- merge(weights_df, state.fips, by = "fips")
setorder(weights_df, -weights)
print(weights_df[1:10, ])
```
When we ran the `DiSCo` command, we set `permutation` to `TRUE`, which
runs the permutation test described in the paper (see `?DiSCo_per` for
more details). This will allow us to inspect the permutation inference
results below. Already, by setting `graph` to `TRUE`, the function
displayed a plot of the full distribution of permutation tests. The
black solid line shows the fit of the "true" synthetic control. The fact
that it does not diverge more strongly after treatment than the gray lines
for the placebo units suggests that the treatment had no effect.
For context, the y-axis is the squared Wasserstein distance between the counterfactual and observed quantile functions, $$
d_{jt}^2:=\int_0^1\left|F_{Y_{jt}, N}^{-1}(q)-F_{Y_{jt}}^{-1}(q)\right|^2 dq,
$$ and, as in @abadie2010synthetic, we take the ratio of post- to pre-intervention Wasserstein distance to account for variation in the pre-treatment fit across placebo tests, $$
r_j=\frac{R_j\left(T_0+1, T\right)}{R_j\left(1, T_0\right)},
$$ where $R_j(t_1, t_2)$ aggregates unit $j$'s distances $d_{jt}^2$ over the periods $t_1$ through $t_2$. The p-value for the permutation test is then calculated as $$
p=\frac{1}{J+1} \sum_{j=1}^{J+1} H\left(r_j-r_1\right),
$$ where $H(x)$ is the Heaviside function, which is 1 if $x\geq 0$ and 0 otherwise. This p-value gives the probability of observing a placebo ratio at least as large as that of the actually treated unit. It can be retrieved by calling
```r
summary(disco$perm)
```
The p-value is larger than 0.05, which confirms the visual result from the plot above, while accounting for potential differences in pre-treatment fit across placebo units.
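As a small numerical illustration of how this p-value relates to the ratios (this is not the package's internal code, and the ratio values below are made up), it is simply the share of all ratios, including the treated unit's own, that are at least as large as $r_1$:

```r
# Hypothetical post/pre Wasserstein distance ratios r_j, with the treated unit first
ratios <- c(1.8, 2.3, 0.9, 1.4, 2.7)

# p = (1 / (J + 1)) * sum_j H(r_j - r_1): share of ratios at least as large as r_1
p_value <- mean(ratios >= ratios[1])
p_value
```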
Finally, we can use the DiSCo Treatment Effect Aggregator (`DiSCoTEA`)
function to aggregate the resulting counterfactual quantile functions
into various treatment effect measures. For example, we can calculate
the difference between the counterfactual and observed quantile
functions and cumulative distribution functions (CDFs) as follows.
discot <- DiSCoTEA(disco, agg="quantileDiff", graph=TRUE) summary(discot)
discot <- DiSCoTEA(disco, agg="cdfDiff", graph=TRUE, ylim=c(-0.05, 0.05)) summary(discot)
Calling `summary` on the returned object prints a table summarizing the
effects across the distribution. You can choose the intervals of
quantiles over which it aggregates using the `samples` parameter. If you
calculated the permutation test and confidence intervals in the `DiSCo`
function, these are reported as well.
Setting `graph` to `TRUE` also prints a plot of the distribution
differences over time. You can focus on specific years using the `t_plot`
parameter (see the sketch below). You can use the other parameters of the
`DiSCoTEA` function to adjust the basic appearance of the plots, or
directly alter the ggplot object that is stored in the returned `DiSCoT`
object.
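For instance, one might group the effects into coarser quantile bins and plot only a couple of post-treatment years; the cut points and years below are arbitrary illustrative choices rather than values used elsewhere in this vignette:

```r
# Illustrative choices only: aggregate effects within the 0-25% and 25-50%
# quantile bins and restrict the plots to the years 2003 and 2004
discot_custom <- DiSCoTEA(disco, agg = "quantileDiff",
                          samples = c(0.25, 0.5),
                          graph = TRUE, t_plot = c(2003, 2004))
summary(discot_custom)
```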
Looking at the plots of the quantile and CDF differences above, we can see two things: 1) the pre-treatment fit for the years 1998-2002 is decent but not perfect; 2) there do not appear to be any notable effects of the minimum wage increase, aside from fluctuations in the function differences that are similar to those observed pre-treatment.
The package offers various ways to further test suspected effects. For example, we can focus on a specific part of the distribution, and construct a separate synthetic control for it. Mathematically, this comes down to,
$$
\vec{\lambda}_t^*=\underset{\vec{\lambda} \in \Delta^J}{\operatorname{argmin}} \int_{\text{q\_min}}^{\text{q\_max}}\left|\sum_{j=2}^{J+1} \lambda_j F_{Y_{jt}}^{-1}(q)-F_{Y_{1t}}^{-1}(q)\right|^2 dq,
$$
where $\text{q\_min} < \text{q\_max}$ are the bounds of the quantile range we're interested in, corresponding to the `q_min` and `q_max` arguments of `DiSCo`. For example, following @dube2019minimum, we can focus on the lower end of the distribution, up to observations that earn 3.5 times the poverty income threshold. This corresponds to roughly the 0.65 quantile in our data:
```r
stats::ecdf(disco$results.periods$`2000`$target$quantiles)(3.5)
```
We can simply run the following code,
```r
disco <- DiSCo(dube, id_col.target = id_col.target, t0 = t0, G = 1000, num.cores = 1,
               permutation = TRUE, CI = TRUE, boots = 1000, graph = FALSE,
               q_min = 0, q_max = 0.65, seed = 1, simplex = TRUE)
```
One could also try to deal with irregularity in the estimated quantile
functions by using the `qmethod` option in the `DiSCo` command, which
allows for the use of alternative quantile estimation methods that can
account for non-smoothness and extreme values.
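A call using such an alternative estimator might look like the sketch below; note that `"qkden"` (a kernel-based quantile estimator) is only assumed here for illustration, so check `?DiSCo` for the values of `qmethod` accepted by your installed version.

```r
# Sketch only: "qkden" is assumed to be an accepted qmethod value; see ?DiSCo
# for the quantile estimation methods available in your version of the package.
disco_alt <- DiSCo(dube, id_col.target = id_col.target, t0 = t0, G = 1000,
                   num.cores = 1, permutation = TRUE, CI = TRUE, boots = 1000,
                   graph = FALSE, q_min = 0, q_max = 0.65, seed = 1,
                   simplex = TRUE, qmethod = "qkden")
```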
As above, we plot the results using the `DiSCoTEA` function:
discot <- DiSCoTEA(disco, agg="quantileDiff", graph=TRUE) summary(discot)
discot <- DiSCoTEA(disco, agg="cdfDiff", graph=TRUE, ylim=c(-0.05,0.05)) summary(discot)
Again, we do not observe any substantial changes in the quantile functions or CDFs. Also, the p-value of the permutation test is now even larger, moving further away from conventional significance thresholds. Overall, we cannot reject the null hypothesis of no treatment effect.
In this vignette, we have demonstrated the DiSCo package using the data
from the empirical application in @gunsilius2023distributional. We
estimated the difference between the quantile functions of the
distributional synthetic control constructed with the `DiSCo` command
and the observed quantile function. To test for the presence of
treatment effects, we inspected the pre-treatment distributional fit,
carried out a permutation test, and re-estimated the synthetic control
for a restricted quantile range; all these robustness tests are
available in the DiSCo package, together with handy graphing and
aggregation tools.