#' @title Group-Time Average Treatment Effects
#'
#' @description `att_gt` computes average treatment effects in DID
#' setups where there are more than two periods of data and allowing for
#' treatment to occur at different points in time and allowing for
#' treatment effect heterogeneity and dynamics.
#' See Callaway and Sant'Anna (2021) for a detailed description.
#'
#' @param yname The name of the outcome variable
#' @param data The name of the data.frame that contains the data
#' @param tname The name of the column containing the time periods
#' @param idname The individual (cross-sectional unit) id name
#' @param gname The name of the variable in `data` that
#' contains the first period when a particular observation is treated.
#' This should be a positive number for all observations in treated groups.
#' It defines which "group" a unit belongs to. It should be 0 for units
#' in the untreated group.
#' @param weightsname The name of the column containing the sampling weights.
#' If not set, all observations have same weight.
#' @param alp the significance level, default is 0.05
#' @param bstrap Boolean for whether or not to compute standard errors using
#' the multiplier bootstrap. If standard errors are clustered, then one
#' must set `bstrap=TRUE`. Default is `TRUE` (in addition, cband
#' is also by default `TRUE` indicating that uniform confidence bands
#' will be returned. If bstrap is `FALSE`, then analytical
#' standard errors are reported.
#' @param biters The number of bootstrap iterations to use. The default is 1000,
#' and this is only applicable if `bstrap=TRUE`.
#' @param clustervars A vector of variables names to cluster on. At most, there
#' can be two variables (otherwise will throw an error) and one of these
#' must be the same as idname which allows for clustering at the individual
#' level. By default, we cluster at individual level (when `bstrap=TRUE`).
#' @param cband Boolean for whether or not to compute a uniform confidence
#' band that covers all of the group-time average treatment effects
#' with fixed probability `1-alp`. In order to compute uniform confidence
#' bands, `bstrap` must also be set to `TRUE`. The default is
#' `TRUE`.
#' @param print_details Whether or not to show details/progress of computations.
#' Default is `FALSE`.
#' @param pl Whether or not to use parallel processing
#' @param cores The number of cores to use for parallel processing
#' @param est_method the method to compute group-time average treatment effects. The default is "dr" which uses the doubly robust
#' approach in the `DRDID` package. Other built-in methods
#' include "ipw" for inverse probability weighting and "reg" for
#' first step regression estimators. The user can also pass their
#' own function for estimating group time average treatment
#' effects. This should be a function
#' `f(Y1,Y0,treat,covariates)` where `Y1` is an
#' `n` x `1` vector of outcomes in the post-treatment
#' outcomes, `Y0` is an `n` x `1` vector of
#' pre-treatment outcomes, `treat` is a vector indicating
#' whether or not an individual participates in the treatment,
#' and `covariates` is an `n` x `k` matrix of
#' covariates. The function should return a list that includes
#' `ATT` (an estimated average treatment effect), and
#' `inf.func` (an `n` x `1` influence function).
#' The function can return other things as well, but these are
#' the only two that are required. `est_method` is only used
#' if covariates are included.
#' @param xformla A formula for the covariates to include in the
#' model. It should be of the form `~ X1 + X2`. Default
#' is NULL which is equivalent to `xformla=~1`. This is
#' used to create a matrix of covariates which is then passed
#' to the 2x2 DID estimator chosen in `est_method`.
#' @param panel Whether or not the data is a panel dataset.
#' The panel dataset should be provided in long format -- that
#' is, where each row corresponds to a unit observed at a
#' particular point in time. The default is TRUE. When
#' is using a panel dataset, the variable `idname` must
#' be set. When `panel=FALSE`, the data is treated
#' as repeated cross sections.
#' @param allow_unbalanced_panel Whether or not function should
#' "balance" the panel with respect to time and id. The default
#' values if `FALSE` which means that [att_gt()] will drop
#' all units where data is not observed in all periods.
#' The advantage of this is that the computations are faster
#' (sometimes substantially).
#' @param control_group Which units to use the control group.
#' The default is "nevertreated" which sets the control group
#' to be the group of units that never participate in the
#' treatment. This group does not change across groups or
#' time periods. The other option is to set
#' `group="notyettreated"`. In this case, the control group
#' is set to the group of units that have not yet participated
#' in the treatment in that time period. This includes all
#' never treated units, but it includes additional units that
#' eventually participate in the treatment, but have not
#' participated yet.
#' @param anticipation The number of time periods before participating
#' in the treatment where units can anticipate participating in the
#' treatment and therefore it can affect their untreated potential outcomes
#' @param faster_mode This option enables a faster version of `did`, optimizing
#' computation time for large datasets by improving data management within the package.
#' The default is set to `FALSE`. While the difference is minimal for small datasets,
#' it is recommended for use with large datasets.
#' @param base_period Whether to use a "varying" base period or a
#' "universal" base period. Either choice results in the same
#' post-treatment estimates of ATT(g,t)'s. In pre-treatment
#' periods, using a varying base period amounts to computing a
#' pseudo-ATT in each treatment period by comparing the change
#' in outcomes for a particular group relative to its comparison
#' group in the pre-treatment periods (i.e., in pre-treatment
#' periods this setting computes changes from period t-1 to period
#' t, but repeatedly changes the value of t)
#'
#' A universal base period fixes the base period to always be
#' (g-anticipation-1). This does not compute
#' pseudo-ATT(g,t)'s in pre-treatment periods, but rather
#' reports average changes in outcomes from period t to
#' (g-anticipation-1) for a particular group relative to its comparison
#' group. This is analogous to what is often reported in event
#' study regressions.
#'
#' Using a varying base period results in an estimate of
#' ATT(g,t) being reported in the period immediately before
#' treatment. Using a universal base period normalizes the
#' estimate in the period right before treatment (or earlier when
#' the user allows for anticipation) to be equal to 0, but one
#' extra estimate in an earlier period.
#'
#' @references Callaway, Brantly and Pedro H.C. Sant'Anna. \"Difference-in-Differences with Multiple Time Periods.\" Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. \doi{10.1016/j.jeconom.2020.12.001}, <https://arxiv.org/abs/1803.09015>
#'
#' @return an [`MP`] object containing all the results for group-time average
#' treatment effects
#'
#' @details # Examples:
#'
#' **Basic [att_gt()] call:**
#' ```{r, comment = "#>", collapse = TRUE}
#' # Example data
#' data(mpdta)
#' set.seed(09152024)
#' out1 <- att_gt(yname="lemp",
#' tname="year",
#' idname="countyreal",
#' gname="first.treat",
#' xformla=NULL,
#' data=mpdta)
#' summary(out1)
#' ```
#'
#' **Using covariates:**
#'
#' ```{r, comment = "#>", collapse = TRUE}
#' out2 <- att_gt(yname="lemp",
#' tname="year",
#' idname="countyreal",
#' gname="first.treat",
#' xformla=~lpop,
#' data=mpdta)
#' summary(out2)
#' ```
#'
#' **Specify comparison units:**
#'
#' ```{r, comment = "#>", collapse = TRUE}
#' out3 <- att_gt(yname="lemp",
#' tname="year",
#' idname="countyreal",
#' gname="first.treat",
#' xformla=~lpop,
#' control_group = "notyettreated",
#' data=mpdta)
#' summary(out3)
#' ```
#'
#' @export
att_gt <- function(yname,
tname,
idname=NULL,
gname,
xformla=NULL,
data,
panel=TRUE,
allow_unbalanced_panel=FALSE,
control_group=c("nevertreated","notyettreated"),
anticipation=0,
weightsname=NULL,
alp=0.05,
bstrap=TRUE,
cband=TRUE,
biters=1000,
clustervars=NULL,
est_method="dr",
base_period="varying",
faster_mode=FALSE,
print_details=FALSE,
pl=FALSE,
cores=1) {
# Check if user wants to run faster mode:
if (faster_mode) {
# this is a DIDparams2 object
dp <- pre_process_did2(yname=yname,
tname=tname,
idname=idname,
gname=gname,
xformla=xformla,
data=data,
panel=panel,
allow_unbalanced_panel=allow_unbalanced_panel,
control_group=control_group,
anticipation=anticipation,
weightsname=weightsname,
alp=alp,
bstrap=bstrap,
cband=cband,
biters=biters,
clustervars=clustervars,
est_method=est_method,
base_period=base_period,
print_details=print_details,
faster_mode=faster_mode,
pl=pl,
cores=cores,
call=match.call()
)
#-----------------------------------------------------------------------------
# Compute all ATT(g,t)
#-----------------------------------------------------------------------------
results <- compute.att_gt2(dp)
} else {
# this is a DIDparams object
dp <- pre_process_did(yname=yname,
tname=tname,
idname=idname,
gname=gname,
xformla=xformla,
data=data,
panel=panel,
allow_unbalanced_panel=allow_unbalanced_panel,
control_group=control_group,
anticipation=anticipation,
weightsname=weightsname,
alp=alp,
bstrap=bstrap,
cband=cband,
biters=biters,
clustervars=clustervars,
est_method=est_method,
base_period=base_period,
print_details=print_details,
pl=pl,
cores=cores,
call=match.call()
)
#-----------------------------------------------------------------------------
# Compute all ATT(g,t)
#-----------------------------------------------------------------------------
results <- compute.att_gt(dp)
}
# extract ATT(g,t) and influence functions
attgt.list <- results$attgt.list
inffunc <- results$inffunc
# process results
# attgt.results <- process_attgt(attgt.list)
tryCatch(
{
# Attempt to run this line for process results
attgt.results <- process_attgt(attgt.list)
},
error = function(e) {
# Handle the error
if (faster_mode) {
# If faster_mode is TRUE, send this stop message
stop("An unexpected error occurred, normally associated with a singular matrix due to not enough control units. Try changing faster_mode=FALSE.")
} else {
# If faster_mode is FALSE, send this stop message
stop("An unexpected error occurred, normally associated with a singular matrix due to not enough control units.")
}
}
)
group <- attgt.results$group
att <- attgt.results$att
tt <- attgt.results$tt
# analytical standard errors
# estimate variance
# this is analogous to cluster robust standard errors that
# are clustered at the unit level
# note to self: this def. won't work with unbalanced panel,
# same with clustered standard errors
# but it is always ignored b/c bstrap has to be true in that case
n <- ifelse(faster_mode, dp$id_count, dp$n)
V <- Matrix::t(inffunc)%*%inffunc/n
se <- sqrt(Matrix::diag(V)/n)
# Zero standard error replaced by NA
se[se <= sqrt(.Machine$double.eps)*10] <- NA
# if clustering along another dimension...we require using the
# bootstrap (in principle, could come up with analytical standard
# errors here though)
if ( (length(clustervars) > 0) & !bstrap) {
warning("clustering the standard errors requires using the bootstrap, resulting standard errors are NOT accounting for clustering")
}
# Identify entries of main diagonal V that are zero or NA
zero_na_sd_entry <- unique(which(is.na(se)))
# bootstrap variance matrix
if (bstrap) {
bout <- mboot(inffunc, DIDparams=dp, pl=pl, cores=cores)
bres <- bout$bres
if(length(zero_na_sd_entry)>0) {
se[-zero_na_sd_entry] <- bout$se[-zero_na_sd_entry]
} else {
se <- bout$se
}
}
# Zero standard error replaced by NA
se[se <= sqrt(.Machine$double.eps)*10] <- NA
#-----------------------------------------------------------------------------
# compute Wald pre-test
#-----------------------------------------------------------------------------
# select which periods are pre-treatment
pre <- which(group > tt)
# Drop group-periods that have variance equal to zero (singularity problems)
if(length(zero_na_sd_entry)>0){
pre <- pre[!(pre %in% zero_na_sd_entry)]
}
# pseudo-atts in pre-treatment periods
preatt <- as.matrix(att[pre])
# covariance matrix of pre-treatment atts
preV <- as.matrix(V[pre,pre])
# check if there are actually any pre-treatment periods
if (length(preV) == 0) {
message("No pre-treatment periods to test")
W <- NULL
Wpval <- NULL
} else if(sum(is.na(preV))) {
warning("Not returning pre-test Wald statistic due to NA pre-treatment values")
W <- NULL
Wpval <- NULL
} else if (rcond(preV) <= .Machine$double.eps) {
# singluar covariance matrix for pre-treatment periods
warning("Not returning pre-test Wald statistic due to singular covariance matrix")
W <- NULL
Wpval <- NULL
} else {
# everything is working...
W <- n*t(preatt)%*%solve(preV)%*%preatt
q <- length(pre) # number of restrictions
Wpval <- round(1-pchisq(W,q),5)
}
#-----------------------------------------------------------------------------
# compute confidence intervals / bands
#-----------------------------------------------------------------------------
# critical value from N(0,1), for pointwise
cval <- qnorm(1-alp/2)
# in order to get uniform confidence bands
# HAVE to use the bootstrap
if (bstrap){
if (cband) {
# for uniform confidence band
# compute new critical value
# see paper for details
bSigma <- apply(bres, 2,
function(b) (quantile(b, .75, type=1, na.rm = T) -
quantile(b, .25, type=1, na.rm = T))/(qnorm(.75) - qnorm(.25)))
bSigma[bSigma <= sqrt(.Machine$double.eps)*10] <- NA
# sup-t confidence band
bT <- apply(bres, 1, function(b) max( abs(b/bSigma), na.rm = TRUE))
cval <- quantile(bT, 1-alp, type=1, na.rm = T)
if(cval >= 7){
warning("Simultaneous critical value is arguably `too large' to be realible. This usually happens when number of observations per group is small and/or there is no much variation in outcomes.")
}
}
}
# Return this list
return(MP(group=group, t=tt, att=att, V_analytical=V, se=se, c=cval, inffunc=inffunc, n=n, W=W, Wpval=Wpval, alp = alp, DIDparams=dp))
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.