Note: By default, the `fixes` package assumes time is a regularly spaced numeric variable (e.g., year = 1995, 1996, …). If your time variable is irregular or non-numeric (e.g., `Date` type), set `time_transform = TRUE` to convert it automatically to a sequential index within each unit. To specify unit-specific treatment timing, set `staggered = TRUE`.
The `fixes` package is designed for running event-study analyses and plotting their results. Event studies are commonly used to check the parallel trends assumption in two-way fixed effects (TWFE) difference-in-differences (DID) analysis.
The package includes two main functions:

- `run_es()`: Accepts a data frame, generates lead and lag variables, and performs the event study analysis. The function returns the results as a tidy data frame. It supports options for fixed effects, covariates, clustered standard errors, and staggered treatment timing.
- `plot_es()`: Creates plots using `ggplot2` based on the data frame generated by `run_es()`. Users can choose between a plot with `geom_ribbon()` or `geom_errorbar()` to visualize the results.

You can install the package like so:
```r
# install.packages("pak")
pak::pak("fixes")
```

or

```r
install.packages("fixes")
```

To install the development version, install from the GitHub repository:

```r
pak::pak("yo5uke/fixes")
```
First, load the library.

```r
library(fixes)
```
The `run_es()` function is designed to work with panel data. The data frame must include an outcome variable, a treatment indicator, and a time variable (numeric or `Date`).

In addition, if you use `staggered = TRUE`, you must provide a variable that indicates unit-specific treatment timing (e.g., the year treatment started for each unit).
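As a rough illustration of that layout, a staggered panel might look like the toy data below. This is a sketch, not taken from the package documentation: all column names (`id`, `year`, `treated`, `year_treated`, `y`) are hypothetical, and coding never-treated units as `NA` is an assumption made only for this example.

```r
# Toy panel: 3 units observed over 2001-2005 (all names are illustrative).
panel <- expand.grid(year = 2001:2005, id = 1:3)

# Unit-specific treatment start; unit 3 is never treated.
# NA is an arbitrary placeholder here, not a documented convention.
start_year <- c(`1` = 2003, `2` = 2004, `3` = NA)
panel$year_treated <- unname(start_year[as.character(panel$id)])

# Treatment-group dummy: 1 for units that are ever treated.
panel$treated <- as.integer(!is.na(panel$year_treated))

# Placeholder outcome so the frame has every required column.
set.seed(1)
panel$y <- rnorm(nrow(panel))

# Sanity check: the timing column must not vary within a unit.
timing_is_constant <- tapply(
  panel$year_treated, panel$id,
  function(v) length(unique(v)) == 1
)
stopifnot(all(timing_is_constant))

head(panel)
```

The key property is that the timing column is constant within each unit and varies across units.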
To get started, you can use example data from widely used packages:

- `did::sim_dt()`: A simulated panel dataset commonly used in difference-in-differences tutorials.
- `fixest::base_stagg`: A built-in dataset designed for analyzing staggered adoption of treatment.

These datasets already contain the necessary structure and can be used directly with `run_es()`.
```r
# Load example data
df1 <- fixest::base_did    # Basic DID example
df2 <- fixest::base_stagg  # Staggered treatment example
```
First rows of `df1`:

| y | x1 | id | period | post | treat |
|-----------:|-----------:|----:|-------:|-----:|------:|
| 2.8753063 | 0.5365377 | 1 | 1 | 0 | 1 |
| 1.8606527 | -3.0431894 | 1 | 2 | 0 | 1 |
| 0.0941652 | 5.5768439 | 1 | 3 | 0 | 1 |
| 3.7814749 | -2.8300587 | 1 | 4 | 0 | 1 |
| -2.5581996 | -5.0443544 | 1 | 5 | 0 | 1 |
| 1.7287324 | -0.6363849 | 1 | 6 | 1 | 1 |

First rows of `df2`:

| | id | year | year_treated | time_to_treatment | treated | treatment_effect_true | x1 | y |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| 2 | 90 | 1 | 2 | -1 | 1 | 0 | -1.0947021 | 0.0172297 |
| 3 | 89 | 1 | 3 | -2 | 1 | 0 | -3.7100676 | -4.5808453 |
| 4 | 88 | 1 | 4 | -3 | 1 | 0 | 2.5274402 | 2.7381717 |
| 5 | 87 | 1 | 5 | -4 | 1 | 0 | -0.7204263 | -0.6510307 |
| 6 | 86 | 1 | 6 | -5 | 1 | 0 | -3.6711678 | -5.3338166 |
| 7 | 85 | 1 | 7 | -6 | 1 | 0 | -0.3152137 | 0.4956263 |
run_es()
`run_es()` takes 16 arguments, including required variables and optional specifications such as fixed effects, clustering, covariates, staggered treatment timing, and weights.
| Argument | Description |
|----|----|
| `data` | Data frame to be used. |
| `outcome` | Outcome variable. Can be specified as a raw variable or a transformation (e.g., `log(y)`). Provide it unquoted. |
| `treatment` | Dummy variable indicating the treated units. Provide it unquoted. Accepts both `0/1` and `TRUE/FALSE`. |
| `time` | Time variable. Provide it unquoted. |
| `staggered` | Logical. If `TRUE`, allows for unit-specific treatment timing (staggered adoption). Default is `FALSE`. |
| `timing` | The time at which the treatment occurs. If `staggered = FALSE`, this should be a scalar (e.g., `2005`). If `staggered = TRUE`, provide a variable (column) indicating the treatment time for each unit. |
| `lead_range` | Number of pre-treatment periods to include (e.g., 3 = `lead3`, `lead2`, `lead1`). Default is `NULL`, which automatically uses the maximum available lead range. Set to a number to restrict the range manually. |
| `lag_range` | Number of post-treatment periods to include (e.g., 2 = `lag0` (the treatment period), `lag1`, `lag2`). Default is `NULL`, which automatically uses the maximum available lag range. Set to a number to restrict the range manually. |
| `covariates` | Additional covariates to include in the regression. Must be a one-sided formula (e.g., `~ x1 + x2`). |
| `fe` | Fixed effects to control for unobserved heterogeneity. Must be a one-sided formula (e.g., `~ id + year`). |
| `cluster` | Specifies clustering for standard errors. Can be a character vector (e.g., `c("id", "year")`) or a formula (e.g., `~ id + year`, `~ id^year`). |
| `weights` | Optional weights to be used in the regression. Provide as a one-sided formula (e.g., `~ weight`). |
| `baseline` | Relative time value to be used as the reference category. The corresponding dummy is excluded from the regression. Must be within the specified lead/lag range. |
| `interval` | Time interval between observations (e.g., `1` for yearly data, `5` for 5-year intervals). |
| `time_transform` | Logical. If `TRUE`, converts the `time` variable into a sequential index (1, 2, 3, …) within each unit. Useful when time is irregular, such as with `Date` values or unbalanced panels (e.g., missing years or monthly observations). Default is `FALSE`. |
| `unit` | Required if `time_transform = TRUE`. Specifies the panel unit identifier (e.g., `firm_id`). |
```r
event_study <- run_es(
  data       = df1,
  outcome    = y,
  treatment  = treat,
  time       = period,
  timing     = 6,
  lead_range = 5,
  lag_range  = 4,
  fe         = ~ id + period,
  cluster    = ~ id,
  baseline   = -1,
  interval   = 1
)
```
Note: The `fe` argument must be specified as a one-sided formula (e.g., `~ firm_id + year`). The `cluster` argument can be specified either as a one-sided formula (e.g., `~ state_id`) or as a character vector (e.g., `c("firm_id", "year")`).
The `run_es()` function returns a tidy data frame that includes estimated event-study coefficients, confidence intervals, relative timing values, and an indicator for the omitted baseline period. Estimation is performed using fast and flexible fixed effects regression.
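To make the lead/lag terminology concrete, here is a minimal base-R sketch of how relative time and event-time dummies can be built by hand for a common treatment date. It is a conceptual illustration only, not the package's internal code; the names `toy`, `rel_time`, and `make_term` are hypothetical.

```r
# Toy data: 2 units, periods 1-6; unit 1 is treated, unit 2 is the control.
toy <- expand.grid(period = 1:6, id = 1:2)
toy$treat <- as.integer(toy$id == 1)

timing   <- 4   # period in which treatment begins
interval <- 1   # spacing between consecutive observations
baseline <- -1  # relative period used as the reference category

# Relative time: negative values are leads, 0 and above are lags.
toy$rel_time <- (toy$period - timing) / interval

# Label each relative period as lead3, lead2, lead1, lag0, lag1, ...
make_term <- function(k) ifelse(k < 0, paste0("lead", -k), paste0("lag", k))

# One dummy per relative period, interacted with the treatment indicator.
# The baseline period is skipped so that it serves as the reference.
for (k in setdiff(sort(unique(toy$rel_time)), baseline)) {
  toy[[make_term(k)]] <- as.integer(toy$treat == 1 & toy$rel_time == k)
}

names(toy)
```

Because the baseline dummy (`lead1` here) is left out, each remaining coefficient in an event-study regression is read relative to that period.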
If your dataset includes additional covariates, you can include them in the regression by specifying a one-sided formula using the `covariates` argument, as shown below.
```r
event_study <- run_es(
  data       = df1,
  outcome    = y,
  treatment  = treat,
  time       = period,
  timing     = 6,
  lead_range = 5,
  lag_range  = 4,
  covariates = ~ x1,  # x1 is the covariate available in fixest::base_did
  fe         = ~ id + period,
  cluster    = ~ id,
  baseline   = -1,
  interval   = 1
)
```
```r
# Example using a Date-type time variable with time_transform
df_alt <- df1 |>
  dplyr::mutate(
    year = 2000 + period,  # periods 1-10 become 2001-2010 (108 units × 10 periods)
    date = as.Date(paste0(year, "-01-01"))
  )

event_study_alt <- run_es(
  data           = df_alt,
  outcome        = y,
  treatment      = treat,
  time           = date,
  timing         = 6,  # 6th time point in each unit (2006-01-01), when treatment starts
  lead_range     = 3,
  lag_range      = 3,
  fe             = ~ id + period,
  cluster        = ~ id,
  baseline       = -1,
  time_transform = TRUE,
  unit           = id
)
```
Note: When `time_transform = TRUE`, the `timing` argument must be specified using the transformed index (e.g., `timing = 19` for the 19th time point within each unit). Support for specifying original time values (e.g., a specific `Date`) directly as `timing` is planned for a future update. Currently, `time_transform = TRUE` cannot be combined with `staggered = TRUE`. This combination is not yet supported, but may be implemented in a future release.
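Until original time values are accepted, one way to find the index to pass as `timing` is to reproduce the within-unit ordering yourself. The sketch below assumes the transformed index is simply the rank of each observation's date within its unit; that matches the description above but has not been checked against the package source. The names `dates_df`, `time_index`, and `timing_index` are hypothetical.

```r
# Toy panel with a Date-type time variable: 3 units, 10 yearly dates each.
dates <- as.Date(paste0(2001:2010, "-01-01"))
dates_df <- data.frame(
  id   = rep(1:3, each = length(dates)),
  date = rep(dates, times = 3)
)

# Sequential index (1, 2, 3, ...) of each date within its unit.
dates_df$time_index <- ave(
  as.numeric(dates_df$date), dates_df$id,
  FUN = function(d) rank(d, ties.method = "first")
)

# Look up the index that corresponds to the treatment date.
treatment_date <- as.Date("2006-01-01")
timing_index <- unique(dates_df$time_index[dates_df$date == treatment_date])
timing_index
```

If the lookup returns more than one value, the panel is unbalanced and a single scalar index does not identify the same calendar date in every unit.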
You can use this result to create custom plots, or take advantage of the built-in `plot_es()` function to visualize the estimates and confidence intervals with minimal code.
plot_es()
The `plot_es()` function creates a plot based on `ggplot2`. It has 12 arguments.
| Argument | Description |
|----|----|
| `data` | Data frame created by `run_es()` |
| `type` | The type of confidence interval visualization: `"ribbon"` (default) or `"errorbar"` |
| `vline_val` | The x-intercept for the vertical reference line (default: `0`) |
| `vline_color` | Color for the vertical reference line (default: `"#000"`) |
| `hline_val` | The y-intercept for the horizontal reference line (default: `0`) |
| `hline_color` | Color for the horizontal reference line (default: `"#000"`) |
| `linewidth` | The width of the lines for the plot (default: `1`) |
| `pointsize` | The size of the points for the estimates (default: `2`) |
| `alpha` | The transparency level for ribbons (default: `0.2`) |
| `barwidth` | The width of the error bars (default: `0.2`) |
| `color` | The color for the lines and points (default: `"#B25D91FF"`) |
| `fill` | The fill color for ribbons (default: `"#B25D91FF"`) |
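If the defaults are fine, pass the data frame returned by `run_es()` and the plot is complete. The other calls below switch to error bars and move the vertical reference line.

```r
# Default: ribbon-style confidence intervals
plot_es(event_study)

# Error bars instead of a ribbon
plot_es(event_study, type = "errorbar")

# Error bars with the vertical reference line moved to -0.5
plot_es(event_study, type = "errorbar", vline_val = -0.5)
```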
Because the plot is built with `ggplot2`, you can adjust the details by adding further `ggplot2` layers.

```r
plot_es(event_study, type = "errorbar") +
  ggplot2::scale_x_continuous(breaks = seq(-5, 5, by = 1)) +
  ggplot2::ggtitle("Result of Event Study")
```
Planned features:

- Support for `staggered = TRUE` with `time_transform = TRUE`, so that irregular time variables (e.g., `Date`) can be used in staggered adoption settings.
- Allowing `timing` to accept original time values (e.g., specific `Date`s). Instead of a transformed index (e.g., `timing = 19`), users will be able to specify a `Date` or other original time value directly. This will simplify the workflow when `time_transform = TRUE`.

If you find an issue, please report it on the GitHub Issues page.