betwfe | R Documentation |
Implementation of extended two-way fixed effects with a bridge penalty. Estimates overall ATT as well as CATT (cohort average treatment effects on the treated units).
betwfe(
pdata,
time_var,
unit_var,
treatment,
response,
covs = c(),
indep_counts = NA,
sig_eps_sq = NA,
sig_eps_c_sq = NA,
lambda.max = NA,
lambda.min = NA,
nlambda = 100,
q = 0.5,
verbose = FALSE,
alpha = 0.05,
add_ridge = FALSE
)
pdata |
Dataframe; the panel data set. Each row should represent an observation of a unit at a time. Should contain columns as described below. |
time_var |
Character; the name of a single column containing a variable for the time period. This column is expected to contain integer values (for example, years). Recommended encodings for dates include format YYYY, YYYYMM, or YYYYMMDD, whichever is appropriate for your data. |
unit_var |
Character; the name of a single column containing a variable for each unit. This column is expected to contain character values (i.e. the "name" of each unit). |
treatment |
Character; the name of a single column containing a variable
for the treatment dummy indicator. This column is expected to contain integer
values, and in particular, should equal 0 if the unit was untreated at that
time and 1 otherwise. Treatment should be an absorbing state; that is, if
unit |
response |
Character; the name of a single column containing the response for each unit at each time. The response must be an integer or numeric value. |
covs |
(Optional.) Character; a vector containing the names of the columns for covariates. All of these columns are expected to contain integer, numeric, or factor values, and any categorical values will be automatically encoded as binary indicators. If no covariates are provided, the treatment effect estimation will proceed, but it will only be valid under unconditional versions of the parallel trends and no anticipation assumptions. Default is c(). |
indep_counts |
(Optional.) Integer; a vector. If you have a sufficiently
large number of units, you can optionally randomly split your data set in
half (with |
sig_eps_sq |
(Optional.) Numeric; the variance of the row-level IID noise assumed to apply to each observation. See Section 2 of Faletto (2025) for details. It is best to provide this variance if it is known (for example, if you are using simulated data). If this variance is unknown, this argument can be omitted, and the variance will be estimated using the estimator from Pesaran (2015, Section 26.5.1) with ridge regression. Default is NA. |
sig_eps_c_sq |
(Optional.) Numeric; the variance of the unit-level IID noise (random effects) assumed to apply to each observation. See Section 2 of Faletto (2025) for details. It is best to provide this variance if it is known (for example, if you are using simulated data). If this variance is unknown, this argument can be omitted, and the variance will be estimated using the estimator from Pesaran (2015, Section 26.5.1) with ridge regression. Default is NA. |
lambda.max |
(Optional.) Numeric. A penalty parameter |
lambda.min |
(Optional.) Numeric. The smallest |
nlambda |
(Optional.) Integer. The total number of |
q |
(Optional.) Numeric; determines what |
verbose |
Logical; if TRUE, more details on the progress of the function will be printed as the function executes. Default is FALSE. |
alpha |
Numeric; function will calculate (1 - |
add_ridge |
(Optional.) Logical; if TRUE, adds a small amount of ridge regularization to the (untransformed) coefficients to stabilize estimation. Default is FALSE. |
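The column requirements above can be checked before calling the function. The following is a minimal sketch, not part of the package: the toy data frame, its column names (`year`, `unit`, `treated`, `y`), and the helper `check_absorbing()` are all hypothetical. It assumes the usual meaning of an absorbing treatment, namely that once a unit's indicator becomes 1 it never returns to 0.

```r
# Hypothetical toy panel: 3 units observed over 4 years.
pdata <- data.frame(
  year = rep(2001:2004, times = 3),
  unit = rep(c("A", "B", "C"), each = 4),
  treated = c(0, 0, 1, 1,   # unit A adopts treatment in 2003
              0, 0, 0, 0,   # unit B is never treated
              0, 1, 1, 1),  # unit C adopts treatment in 2002
  y = c(1.2, 1.4, 2.1, 2.3, 0.9, 1.0, 1.1, 1.0, 1.5, 2.2, 2.4, 2.6),
  stringsAsFactors = FALSE
)

# Coerce columns to the types the argument descriptions ask for.
pdata$year <- as.integer(pdata$year)
pdata$unit <- as.character(pdata$unit)
pdata$treated <- as.integer(pdata$treated)

# check_absorbing() is a hypothetical helper, not a betwfe function.
# It returns TRUE if, within every unit, the treatment indicator never
# decreases once the rows are ordered by time.
check_absorbing <- function(df, time_var, unit_var, treatment) {
  df <- df[order(df[[unit_var]], df[[time_var]]), ]
  by_unit <- split(df[[treatment]], df[[unit_var]])
  all(vapply(by_unit, function(d) all(diff(d) >= 0), logical(1)))
}

is_absorbing <- check_absorbing(pdata, "year", "unit", "treated")
is_absorbing
```

A data frame that passes this check could then be supplied as `pdata`, with `time_var = "year"`, `unit_var = "unit"`, `treatment = "treated"`, and `response = "y"`.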
A named list with the following elements:
att_hat |
The estimated overall average treatment effect for a randomly selected treated unit. |
att_se |
If |
catt_hats |
A named vector containing the estimated average treatment effects for each cohort. |
catt_ses |
If |
cohort_probs |
A vector of the estimated probabilities of being in each
cohort conditional on being treated, which was used in calculating |
catt_df |
A dataframe displaying the cohort names,
average treatment effects, standard errors, and |
beta_hat |
The full vector of estimated coefficients. |
treat_inds |
The indices of |
treat_int_inds |
The indices of |
sig_eps_sq |
Either the provided |
sig_eps_c_sq |
Either
the provided |
lambda.max |
Either the provided |
lambda.max_model_size |
The size of the selected model corresponding
|
lambda.min |
Either the provided |
lambda.min_model_size |
The
size of the selected model corresponding to |
lambda_star |
The value of |
lambda_star_model_size |
The size of the model that was selected. If
this value is close to |
X_ints |
The design matrix containing all interactions, time and cohort dummies, etc. |
y |
The vector of
responses, containing |
X_final |
The design matrix after applying the change in coordinates to fit the model and also multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
y_final |
The final response after multiplying on the left by the square root inverse of the estimated covariance matrix for each unit. |
N |
The number of units in the final data set used for estimation (after removing any units that were treated in the first time period). |
T |
The number of time periods in the final data set. |
R |
The number of treated cohorts in the final data set. |
d |
The number of covariates in the final data set (after removing any covariates that contained missing values or took the same value for every unit). |
p |
The final number of columns in the full set of covariates used to estimate the model. |
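To illustrate how the cohort-level outputs might relate to the overall estimate, here is a small sketch using made-up numbers rather than output from `betwfe()`. It assumes that the overall ATT is the average of the cohort effects weighted by `cohort_probs`; that aggregation rule is an assumption for illustration, not something confirmed by the descriptions above. The helper `normal_ci()` is also hypothetical and builds a plain two-sided normal interval at level (1 - `alpha`).

```r
# Hypothetical values for illustration only; not output from betwfe().
catt_hats <- c("1970" = -0.02, "1973" = 0.01, "1975" = -0.04)
cohort_probs <- c(0.5, 0.3, 0.2)  # assumed to sum to 1

# Assumed aggregation: ATT as the cohort-probability-weighted
# average of the cohort average treatment effects.
att_manual <- sum(cohort_probs * catt_hats)

# normal_ci() is a hypothetical helper: a two-sided (1 - alpha)
# normal-approximation interval around a point estimate.
normal_ci <- function(est, se, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)
  c(lower = est - z * se, upper = est + z * se)
}

# The standard error of 0.01 is made up for the example.
ci <- normal_ci(att_manual, se = 0.01, alpha = 0.05)
att_manual
ci
```

With a fitted object, the same arithmetic could be applied to `res$catt_hats` and `res$cohort_probs` and compared against `res$att_hat` as a sanity check.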
Gregory Faletto
Faletto, G. (2025). Fused Extended Two-Way Fixed Effects for Difference-in-Differences with Staggered Adoptions. arXiv preprint arXiv:2312.05985. https://arxiv.org/abs/2312.05985.
Pesaran, M. H. (2015). Time Series and Panel Data Econometrics. Number 9780198759980 in OUP Catalogue. Oxford University Press. URL https://ideas.repec.org/b/oxp/obooks/9780198759980.html.
set.seed(23451)
library(bacondecomp)
data(divorce)
# sig_eps_sq and sig_eps_c_sq, calculated in a separate run of `fetwfe()`,
# are provided to speed up the computation of the example
res <- betwfe(
pdata = divorce[divorce$sex == 2, ],
time_var = "year",
unit_var = "st",
treatment = "changed",
covs = c("murderrate", "lnpersinc", "afdcrolls"),
response = "suiciderate_elast_jag",
sig_eps_sq = 0.1025361,
sig_eps_c_sq = 4.227651e-35,
verbose = TRUE)
# Average treatment effect on the treated units (in percentage point
# units)
100 * res$att_hat
# Conservative 95% confidence interval for ATT (in percentage point units)
low_att <- 100 * (res$att_hat - qnorm(1 - 0.05 / 2) * res$att_se)
high_att <- 100 * (res$att_hat + qnorm(1 - 0.05 / 2) * res$att_se)
c(low_att, high_att)
# Cohort average treatment effects and confidence intervals (in percentage
# point units)
catt_df_pct <- res$catt_df
catt_df_pct[["Estimated TE"]] <- 100 * catt_df_pct[["Estimated TE"]]
catt_df_pct[["SE"]] <- 100 * catt_df_pct[["SE"]]
catt_df_pct[["ConfIntLow"]] <- 100 * catt_df_pct[["ConfIntLow"]]
catt_df_pct[["ConfIntHigh"]] <- 100 * catt_df_pct[["ConfIntHigh"]]
catt_df_pct