| att_gt | R Documentation |
att_gt computes group-time average treatment effects in difference-in-differences (DID)
setups with more than two periods of data, allowing for
treatment to occur at different points in time and for
treatment effect heterogeneity and dynamics.
See Callaway and Sant'Anna (2021) for a detailed description.
att_gt(
yname,
tname,
idname = NULL,
gname,
xformla = NULL,
data,
panel = TRUE,
allow_unbalanced_panel = FALSE,
control_group = c("nevertreated", "notyettreated"),
anticipation = 0,
weightsname = NULL,
fix_weights = NULL,
alp = 0.05,
bstrap = TRUE,
cband = TRUE,
biters = 1000,
clustervars = NULL,
est_method = "dr",
base_period = "varying",
faster_mode = TRUE,
print_details = FALSE,
pl = FALSE,
cores = 1,
compute_inffunc = TRUE,
...
)
yname |
The name of the outcome variable |
tname |
The name of the column containing the time periods |
idname |
The individual (cross-sectional unit) id name |
gname |
The name of the variable in |
xformla |
A formula for the covariates to include in the
model. It should be of the form For time-varying covariates: (1) With balanced panel data,
in each 2x2 comparison, the covariates
are taken to be the value of the covariates in the earlier time
period, and all of the underlying computations involve changes in Y
as a function of those values of covariates. (2) With repeated cross
sections data and unbalanced panel data, the covariates are taken
from each time period and computations involve Y_post conditional
on X_post minus Y_pre conditional on X_pre. A byproduct of this
is that, with balanced panel data and in the presence of
time-varying covariates, it is possible to get different numerical
results according to whether or not |
data |
The name of the data.frame that contains the data |
panel |
Whether or not the data is a panel dataset.
The panel dataset should be provided in long format – that
is, where each row corresponds to a unit observed at a
particular point in time. The default is TRUE. When
using a panel dataset, the variable |
allow_unbalanced_panel |
Whether or not the function should
"balance" the panel with respect to time and id. The default
value is |
control_group |
Which units to use as the control group.
The default is "nevertreated" which sets the control group
to be the group of units that never participate in the
treatment. This group does not change across groups or
time periods. The other option is to set
|
anticipation |
The number of time periods before treatment in which units can anticipate participating in the treatment, which can therefore affect their untreated potential outcomes. The default is 0. |
weightsname |
The name of the column containing the sampling weights.
If not set, all observations have the same weight. When weights are
time-invariant (constant within each unit across periods), all
When weights vary across time (e.g., time-varying population sizes), the default behavior differs by panel type:
Use the |
fix_weights |
Controls how time-varying sampling weights are resolved. Only relevant when weights vary across time; with time-invariant weights, all options produce identical results. Options:
|
alp |
The significance level. The default is 0.05. |
bstrap |
Boolean for whether or not to compute standard errors using
the multiplier bootstrap. Default is |
cband |
Boolean for whether or not to compute a uniform confidence
band that covers all of the group-time average treatment effects
with fixed probability |
biters |
The number of bootstrap iterations to use. The default is 1000,
and this is only applicable if |
clustervars |
A vector of variable names to cluster on. At most two
variables can be given (otherwise an error is thrown), and one of these
must be the same as idname, which allows for clustering at the individual
level. Clustered standard errors are available with the multiplier bootstrap
( |
est_method |
The method used to compute group-time average treatment effects. The default is "dr", which uses the doubly robust
approach in the Panel data ( Repeated cross sections / unbalanced panel ( In both cases the function should return a list that includes
|
base_period |
Whether to use a "varying" base period or a "universal" base period. Either choice results in the same post-treatment estimates of ATT(g,t).
In pre-treatment periods, using a varying base period amounts to computing a pseudo-ATT in each such period by comparing the change in outcomes for a particular group relative to its comparison group (i.e., in pre-treatment periods this setting computes changes from period t-1 to period t, moving t across the pre-treatment periods).
A universal base period fixes the base period to always be (g-anticipation-1). This does not compute pseudo-ATT(g,t)'s in pre-treatment periods, but rather reports average changes in outcomes from period t to (g-anticipation-1) for a particular group relative to its comparison group. This is analogous to what is often reported in event study regressions.
Using a varying base period results in an estimate of ATT(g,t) being reported in the period immediately before treatment. Using a universal base period normalizes the estimate in the period right before treatment (or earlier when the user allows for anticipation) to be equal to 0, but reports one extra estimate in an earlier period. |
faster_mode |
This option enables a faster version of |
print_details |
Whether or not to show details/progress of computations.
Default is |
pl |
Whether or not to use parallel processing |
cores |
The number of cores to use for parallel processing |
compute_inffunc |
Whether or not to compute the influence functions. The
default is |
... |
Additional arguments to be passed to a custom |
An MP object containing all the results for group-time average
treatment effects.
The returned inffunc matrix collects the estimated influence functions,
with one column per ATT(g,t) and one row per cross-sectional unit (one row
per observation with repeated cross sections). Its rownames hold the unit
ids (idname; an internal observation index for repeated cross sections)
and are the authoritative link between rows and units: the row ORDER is
mode-specific (faster_mode = FALSE sorts units by id, while
faster_mode = TRUE uses an internal (period, cohort, id) ordering), so
external consumers of the influence functions must align rows by rowname,
never by position.
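The alignment rule above can be sketched in base R. The matrix below is a small hypothetical stand-in for an inffunc matrix, not output from att_gt, and the unit_info data frame is likewise made up for illustration. The sketch assumes that the rownames are the unit ids converted to character.

```r
# Hypothetical stand-in for an inffunc matrix: 3 units (rows), 2 ATT(g,t) cells (columns).
# The row order is deliberately not sorted by id, to mimic a mode-specific ordering.
inffunc <- matrix(
  c(0.10, -0.20, 0.05,   # first column, filled top to bottom
    0.30,  0.00, -0.15), # second column
  nrow = 3,
  dimnames = list(c("30", "10", "20"), c("att_1", "att_2"))
)

# Hypothetical external unit-level data, sorted by id
unit_info <- data.frame(id = c(10, 20, 30), weight = c(1, 2, 3))

# Align by rowname, never by position
idx <- match(as.character(unit_info$id), rownames(inffunc))
stopifnot(!anyNA(idx))  # every unit must be found in the matrix

aligned <- inffunc[idx, , drop = FALSE]
print(aligned)
```

After this step, row i of aligned corresponds to row i of unit_info regardless of how the original matrix was ordered.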
When faster_mode = FALSE, setting options(did.disable_precompute = TRUE)
disables the one-time positional precompute and assembles every 2x2
comparison from the long data per cell, as in earlier versions. The
covariate design matrix is still built once over the full sample under
both settings, so results are identical either way; the option exists
only as a debugging escape hatch.
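A minimal sketch of toggling this option, assuming an installed version of did that matches this page (one that accepts faster_mode and honors did.disable_precompute). The helper fit() is a hypothetical wrapper introduced only for this illustration, and the bootstrap is switched off so that the two runs do not depend on random draws.

```r
if (requireNamespace("did", quietly = TRUE)) {
  library(did)
  data(mpdta)

  # Hypothetical helper: the same call is run under both settings
  fit <- function() {
    att_gt(yname = "lemp",
           tname = "year",
           idname = "countyreal",
           gname = "first.treat",
           data = mpdta,
           faster_mode = FALSE,
           bstrap = FALSE,
           cband = FALSE)
  }

  with_precompute <- fit()

  # Temporarily disable the precompute, then restore the previous setting
  old <- options(did.disable_precompute = TRUE)
  without_precompute <- fit()
  options(old)

  # The point estimates are expected to agree under both settings
  print(all.equal(with_precompute$att, without_precompute$att))
}
```

Saving the return value of options() and passing it back restores whatever value the option had before, which keeps the debugging switch from leaking into later calls.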
Basic att_gt() call:
# Load the package and the example data
library(did)
data(mpdta)
set.seed(09152024)
out1 <- att_gt(yname = "lemp",
               tname = "year",
               idname = "countyreal",
               gname = "first.treat",
               xformla = NULL,
               data = mpdta)
summary(out1)
#>
#> Call:
#> att_gt(yname = "lemp", tname = "year", idname = "countyreal",
#> gname = "first.treat", xformla = NULL, data = mpdta)
#>
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
#>
#> Group-Time Average Treatment Effects:
#> Group Time ATT(g,t) Std. Error [95% Simult. Conf. Band]
#> 2004 2004 -0.0105 0.0258 -0.0809 0.0599
#> 2004 2005 -0.0704 0.0341 -0.1635 0.0227
#> 2004 2006 -0.1373 0.0384 -0.2423 -0.0322 *
#> 2004 2007 -0.1008 0.0354 -0.1976 -0.0040 *
#> 2006 2004 0.0065 0.0235 -0.0578 0.0708
#> 2006 2005 -0.0028 0.0192 -0.0554 0.0499
#> 2006 2006 -0.0046 0.0184 -0.0548 0.0456
#> 2006 2007 -0.0412 0.0207 -0.0977 0.0153
#> 2007 2004 0.0305 0.0161 -0.0135 0.0746
#> 2007 2005 -0.0027 0.0157 -0.0456 0.0401
#> 2007 2006 -0.0311 0.0184 -0.0815 0.0193
#> 2007 2007 -0.0261 0.0176 -0.0741 0.0220
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#>
#> P-value for pre-test of parallel trends assumption: 0.16812
#> Control Group: Never Treated, Anticipation Periods: 0
#> Estimation Method: Doubly Robust
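The numbers printed by summary() can also be pulled out of the returned object for further processing. The sketch below assumes that the MP object stores its results in elements named group, t, att, se and c (the critical value); these names are not stated on this page, so check them with str() on your own result. The bootstrap is switched off here, so the intervals are pointwise rather than the simultaneous bands shown above.

```r
if (requireNamespace("did", quietly = TRUE)) {
  library(did)
  data(mpdta)

  out <- att_gt(yname = "lemp",
                tname = "year",
                idname = "countyreal",
                gname = "first.treat",
                data = mpdta,
                bstrap = FALSE,
                cband = FALSE)

  # Assumed element names of the MP object: group, t, att, se, c
  res <- data.frame(group = out$group,
                    time  = out$t,
                    att   = out$att,
                    se    = out$se)

  # Interval: estimate plus or minus critical value times standard error
  res$lower <- res$att - out$c * res$se
  res$upper <- res$att + out$c * res$se

  print(res)
}
```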
Using covariates:
out2 <- att_gt(yname = "lemp",
               tname = "year",
               idname = "countyreal",
               gname = "first.treat",
               xformla = ~lpop,
               data = mpdta)
summary(out2)
#>
#> Call:
#> att_gt(yname = "lemp", tname = "year", idname = "countyreal",
#> gname = "first.treat", xformla = ~lpop, data = mpdta)
#>
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
#>
#> Group-Time Average Treatment Effects:
#> Group Time ATT(g,t) Std. Error [95% Simult. Conf. Band]
#> 2004 2004 -0.0145 0.0222 -0.0737 0.0446
#> 2004 2005 -0.0764 0.0303 -0.1570 0.0041
#> 2004 2006 -0.1404 0.0382 -0.2421 -0.0388 *
#> 2004 2007 -0.1069 0.0358 -0.2021 -0.0117 *
#> 2006 2004 -0.0005 0.0231 -0.0618 0.0609
#> 2006 2005 -0.0062 0.0188 -0.0561 0.0437
#> 2006 2006 0.0010 0.0204 -0.0534 0.0553
#> 2006 2007 -0.0413 0.0210 -0.0971 0.0145
#> 2007 2004 0.0267 0.0140 -0.0104 0.0639
#> 2007 2005 -0.0046 0.0170 -0.0498 0.0407
#> 2007 2006 -0.0284 0.0187 -0.0782 0.0213
#> 2007 2007 -0.0288 0.0161 -0.0715 0.0140
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#>
#> P-value for pre-test of parallel trends assumption: 0.23267
#> Control Group: Never Treated, Anticipation Periods: 0
#> Estimation Method: Doubly Robust
Specify comparison units:
out3 <- att_gt(yname = "lemp",
               tname = "year",
               idname = "countyreal",
               gname = "first.treat",
               xformla = ~lpop,
               control_group = "notyettreated",
               data = mpdta)
summary(out3)
#>
#> Call:
#> att_gt(yname = "lemp", tname = "year", idname = "countyreal",
#> gname = "first.treat", xformla = ~lpop, data = mpdta, control_group = "notyettreated")
#>
#> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>
#>
#> Group-Time Average Treatment Effects:
#> Group Time ATT(g,t) Std. Error [95% Simult. Conf. Band]
#> 2004 2004 -0.0212 0.0217 -0.0788 0.0365
#> 2004 2005 -0.0816 0.0324 -0.1676 0.0044
#> 2004 2006 -0.1382 0.0368 -0.2361 -0.0403 *
#> 2004 2007 -0.1069 0.0344 -0.1984 -0.0154 *
#> 2006 2004 -0.0075 0.0233 -0.0693 0.0544
#> 2006 2005 -0.0046 0.0184 -0.0533 0.0442
#> 2006 2006 0.0087 0.0182 -0.0397 0.0570
#> 2006 2007 -0.0413 0.0205 -0.0956 0.0130
#> 2007 2004 0.0269 0.0136 -0.0091 0.0630
#> 2007 2005 -0.0042 0.0153 -0.0448 0.0364
#> 2007 2006 -0.0284 0.0191 -0.0792 0.0223
#> 2007 2007 -0.0288 0.0167 -0.0732 0.0157
#> ---
#> Signif. codes: `*' confidence band does not cover 0
#>
#> P-value for pre-test of parallel trends assumption: 0.23326
#> Control Group: Not Yet Treated, Anticipation Periods: 0
#> Estimation Method: Doubly Robust
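Varying versus universal base period, as toy arithmetic:

The difference between the two base_period settings can be seen without running the estimator. The sketch below uses made-up numbers for a single group: gap holds the mean outcome of the group minus the mean outcome of its comparison group in each period. It assumes consecutive yearly periods and plain differences in means, which is a simplification of what att_gt actually computes.

```r
periods <- 2003:2007
g <- 2006                      # first treated period for this group
anticipation <- 0
base <- g - anticipation - 1   # universal base period (2005)

# Hypothetical gap between group and comparison-group mean outcomes
gap <- c(0.50, 0.52, 0.51, 0.45, 0.40)
names(gap) <- periods

# Change in the gap from period s to period t
did_change <- function(t, s) {
  unname(gap[as.character(t)] - gap[as.character(s)])
}

# Universal: every period is compared with the fixed base period
universal <- sapply(periods, function(t) did_change(t, base))
names(universal) <- periods

# Varying: pre-treatment periods use t - 1 as the base,
# post-treatment periods use g - anticipation - 1
varying_periods <- periods[-1]  # the first period has no earlier period
varying <- sapply(varying_periods, function(t) {
  if (t < g - anticipation) did_change(t, t - 1) else did_change(t, base)
})
names(varying) <- varying_periods

print(universal)
print(varying)
```

The post-treatment entries (2006 and 2007) coincide under both settings. The universal version is zero in the base period and has one extra entry in an earlier period, while the varying version reports a non-trivial value in the period immediately before treatment.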
Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015>