run_es: Run Event Study with Fixed Effects

View source: R/event_study.R

Description

This function fits an event-study regression with fixed effects on a panel dataset. It generates a dummy variable for each lead and lag period relative to the treatment timing, adds optional covariates, fixed effects, and weights, and estimates the model with 'fixest::feols'.

Usage

run_es(
  data,
  outcome,
  treatment,
  time,
  staggered = FALSE,
  timing,
  lead_range = NULL,
  lag_range = NULL,
  covariates = NULL,
  fe,
  cluster = NULL,
  weights = NULL,
  baseline = -1,
  interval = 1,
  time_transform = FALSE,
  unit = NULL
)

Arguments

data

A data frame containing the panel dataset.

outcome

The outcome variable, specified unquoted. You may use a raw variable name (e.g., 'y') or a transformation (e.g., 'log(y)').

treatment

The treatment indicator (unquoted), either binary numeric ('0/1') or logical ('TRUE/FALSE'). It typically equals 1 (or 'TRUE') in and after the treatment period and 0 otherwise.

time

The time variable (unquoted). Used to calculate the relative timing.

staggered

Logical. If 'TRUE', allows treatment timing to vary across units. Requires 'timing' to be a column name. Default is 'FALSE'.

timing

The time period when the treatment occurs. If 'staggered = FALSE', must be a single numeric value (e.g., '2005'). If 'staggered = TRUE', must be an unquoted variable name representing the treatment timing for each unit. If 'time_transform = TRUE', specify 'timing' as an integer corresponding to the transformed time index within each unit (e.g., 5 for the fifth time point).

lead_range

Number of pre-treatment periods to include as leads (e.g., '5' creates 'lead5', 'lead4', ..., 'lead1'). If 'NULL', the maximum lead available across all units is used.

lag_range

Number of post-treatment periods to include as lags (e.g., '3' creates 'lag0', 'lag1', 'lag2', 'lag3', where 'lag0' is the treatment period itself). If 'NULL', the maximum lag available across all units is used.
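To make the lead/lag naming concrete, here is a minimal base-R sketch of how indicator columns could be built from relative time. The helper name 'make_es_dummies' is hypothetical, and both the interaction with the treatment indicator and the rule for filling in 'NULL' ranges are assumptions for illustration, not the package's internal code.

```r
# Hypothetical helper: build lead/lag indicator columns from relative time.
# Assumptions: lead k marks relative time -k, lag k marks relative time k
# (lag0 = treatment period), and indicators are multiplied by a 0/1 treatment.
make_es_dummies <- function(rel_time, treat, lead_range = NULL, lag_range = NULL) {
  treat <- as.integer(treat)  # accept logical or 0/1 input
  # NULL ranges: use the widest window observed in the data (assumed rule)
  if (is.null(lead_range)) lead_range <- max(0, -min(rel_time, na.rm = TRUE))
  if (is.null(lag_range))  lag_range  <- max(0,  max(rel_time, na.rm = TRUE))
  out <- list()
  for (k in seq_len(lead_range)) {  # lead1 ... leadK (pre-treatment)
    out[[paste0("lead", k)]] <- as.integer(rel_time == -k) * treat
  }
  for (k in 0:lag_range) {          # lag0 ... lagK (treatment period onward)
    out[[paste0("lag", k)]] <- as.integer(rel_time == k) * treat
  }
  as.data.frame(out)
}

d <- make_es_dummies(c(-2, -1, 0, 1, 2), treat = rep(TRUE, 5))
names(d)  # returns c("lead1", "lead2", "lag0", "lag1", "lag2")
```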

covariates

Optional covariates to include in the regression. Must be supplied as a one-sided formula (e.g., '~ x1 + x2').

fe

Fixed effects to control for unobserved heterogeneity. Must be a one-sided formula (e.g., '~ id + year').
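The one-sided 'covariates' and 'fe' formulas map naturally onto fixest's two-part formula syntax 'y ~ x | fe'. The sketch below shows one way such a formula could be assembled; 'build_es_formula' is a hypothetical helper, it takes the outcome as a string for simplicity, and it is not necessarily how run_es builds its model internally.

```r
# Hypothetical sketch: combine an outcome, dummy names, and one-sided
# covariate / fixed-effect formulas into fixest's "y ~ x | fe" form.
build_es_formula <- function(outcome, dummies, covariates = NULL, fe) {
  # Text to the right of "~" in a one-sided formula such as ~ x1 + x2
  rhs_of <- function(f) paste(deparse(f[[2]]), collapse = "")
  rhs <- paste(dummies, collapse = " + ")
  if (!is.null(covariates)) rhs <- paste(rhs, rhs_of(covariates), sep = " + ")
  as.formula(paste(outcome, "~", rhs, "|", rhs_of(fe)))
}

fml <- build_es_formula("y", c("lead2", "lag0", "lag1"),
                        covariates = ~ x1 + x2, fe = ~ id + year)
fml  # y ~ lead2 + lag0 + lag1 + x1 + x2 | id + year

# The formula could then be passed to fixest, e.g.
# fixest::feols(fml, data = df, cluster = ~ id, weights = ~ popwt)
```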

cluster

Clustering specification for robust standard errors. Accepts either:
- a character vector of column names (e.g., 'c("id", "year")'), or
- a one-sided formula (e.g., '~ id + year' or '~ id^year').
Cluster variables are re-evaluated internally after the data are filtered to the estimation window.
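Because 'cluster' accepts two input types, a natural approach is to normalize both to a one-sided formula, which fixest accepts. This is a sketch under that assumption; 'normalize_cluster' is a hypothetical name, and base R's 'reformulate()' does the character-to-formula conversion.

```r
# Hypothetical sketch: normalise a character vector or a one-sided formula
# to a one-sided formula suitable for a fixest 'cluster' argument.
normalize_cluster <- function(cluster) {
  if (is.null(cluster)) return(NULL)
  if (is.character(cluster)) {
    return(reformulate(cluster))  # c("id", "year") -> ~id + year
  }
  if (inherits(cluster, "formula") && length(cluster) == 2) {
    return(cluster)               # already one-sided, e.g. ~ id^year
  }
  stop("'cluster' must be a character vector or a one-sided formula")
}

normalize_cluster(c("id", "year"))  # ~id + year
normalize_cluster(~ id^year)        # ~id^year
```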

weights

Optional observation weights. Must be supplied as a one-sided formula (e.g., '~ popwt'). If 'NULL', unweighted regression is performed.

baseline

The relative time (e.g., '-1') used as the reference period. The corresponding dummy variable is excluded from the regression and appended to the results with an estimate of 0. It must lie within the window defined by 'lead_range' and 'lag_range'; otherwise an error is thrown.
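The two baseline rules above (range check, then re-appending the omitted term with estimate 0) can be sketched as follows. Both helpers are hypothetical, and the assumed term naming ('lead1' for '-1', 'lag0' for '0') follows the lead/lag conventions described under 'lead_range' and 'lag_range'.

```r
# Hypothetical helpers illustrating the documented baseline rules.
# 1) The baseline must fall inside the window [-lead_range, lag_range].
check_baseline <- function(baseline, lead_range, lag_range) {
  if (baseline < -lead_range || baseline > lag_range) {
    stop("'baseline' must lie within the lead/lag window")
  }
  invisible(TRUE)
}

# 2) After estimation, append the omitted term with an estimate of 0.
#    Assumed naming: relative time -k -> "leadk", k >= 0 -> "lagk".
add_baseline_row <- function(res, baseline = -1) {
  term <- if (baseline < 0) paste0("lead", -baseline) else paste0("lag", baseline)
  res$is_baseline <- FALSE
  base <- data.frame(term = term, estimate = 0,
                     relative_time = baseline, is_baseline = TRUE)
  # Restrict 'res' to the columns the baseline row has, then stack
  out <- rbind(res[, names(base)], base)
  out <- out[order(out$relative_time), ]
  rownames(out) <- NULL
  out
}

check_baseline(-1, lead_range = 2, lag_range = 2)  # passes silently
```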

interval

The spacing between consecutive time periods. Use '1' for annual data (the default), '5' for data observed every five years, and so on.

time_transform

Logical. If 'TRUE', the time variable is converted to a unit-level sequence (1, 2, 3, ...) based on its order within each unit. Useful for panel data whose time variable is irregular or non-consecutive. Default is 'FALSE'.

unit

The unit (individual) identifier for panel data. Required when 'time_transform = TRUE'. Must be an unquoted variable name (e.g., 'id').

Details

This function is intended for difference-in-differences and event study designs with panel data. It automatically:
- Computes relative time as (time - timing) / interval
- Generates dummy variables for the specified leads and lags
- Removes the baseline term from estimation and appends it back after estimation
- Uses fixest::feols() for fast and flexible estimation
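The relative-time rule in the first step can be written directly in R. In this sketch (the function name 'relative_time' is hypothetical), 'timing' is either a single value, as with 'staggered = FALSE', or a vector with one entry per row, as with 'staggered = TRUE'; vector recycling handles both. The warning for non-integer results is an added assumption, not documented package behavior.

```r
# Sketch of the documented rule: relative time = (time - timing) / interval.
# 'timing' may be a scalar (common adoption) or a per-row vector (staggered).
relative_time <- function(time, timing, interval = 1) {
  rt <- (time - timing) / interval
  # Assumption: a non-integer result suggests 'interval' does not match the data
  if (any(rt != round(rt), na.rm = TRUE)) {
    warning("some relative times are not whole multiples of 'interval'")
  }
  rt
}

# Common timing with five-year data
relative_time(c(1995, 2000, 2005, 2010), timing = 2005, interval = 5)  # returns c(-2, -1, 0, 1)

# Staggered timing: each row carries its own unit's treatment year
relative_time(c(2001, 2003, 2003), timing = c(2002, 2002, 2004))       # returns c(-1, 1, -1)
```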

Fixed effects and clustered standard errors are both supported, and observation weights can be specified using the 'weights' argument.

If 'time_transform = TRUE', the time variable is internally replaced with a unit-level sequence (e.g., 1, 2, 3, ...) based on its order within each unit (as specified by the 'unit' argument). This is useful when the time variable is irregular (e.g., Date-type data or monthly data with gaps). Note that in this case, the 'timing' argument must be specified based on the transformed index (e.g., 5 corresponds to the fifth time point in the sorted order within each unit).
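The unit-level index that 'time_transform = TRUE' produces can be approximated in base R with 'ave()' and 'rank()'. This is a sketch under the assumption that ties are broken by row order; the helper name 'unit_time_index' and the toy 'panel' data frame are hypothetical.

```r
# Sketch of a unit-level time index: within each unit, observations are
# ordered by time and numbered 1, 2, 3, ... (ties broken by row order).
unit_time_index <- function(time, unit) {
  ave(as.numeric(time), unit, FUN = function(t) rank(t, ties.method = "first"))
}

# Irregular monthly dates, deliberately unsorted within unit "a"
panel <- data.frame(
  id   = c("a", "a", "a", "b", "b"),
  date = as.Date(c("2020-03-01", "2020-01-01", "2020-07-01",
                   "2020-02-01", "2020-05-01"))
)
panel$t_index <- unit_time_index(panel$date, panel$id)
panel$t_index      # returns c(2, 1, 3, 1, 2)

# With timing = 2 on this index, relative time is simply t_index - 2
panel$t_index - 2
```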

Value

A tibble with the event study regression results, including:
- 'term': Name of the lead or lag dummy variable
- 'estimate': Coefficient estimate
- 'std.error': Standard error
- 'statistic': t-statistic
- 'p.value': p-value
- 'conf_high': Upper bound of the 95% confidence interval
- 'conf_low': Lower bound of the 95% confidence interval
- 'relative_time': Time relative to treatment, in units of 'interval'
- 'is_baseline': Logical indicator for the baseline term ('TRUE' only for the excluded dummy)
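A common next step is an event-study plot. The base-R sketch below relies only on the documented columns 'relative_time', 'estimate', 'conf_low', and 'conf_high'; 'plot_es' is a hypothetical helper, not part of the package, and rows without a proper interval (such as a baseline row whose bounds are missing or equal) are drawn as points only.

```r
# Hypothetical sketch: basic event-study plot from a run_es() results table.
plot_es <- function(res) {
  res <- res[order(res$relative_time), ]
  plot(res$relative_time, res$estimate, pch = 19,
       ylim = range(c(res$estimate, res$conf_low, res$conf_high), na.rm = TRUE),
       xlab = "Relative time", ylab = "Estimate")
  # Draw interval bars only where both bounds exist and differ
  has_ci <- !is.na(res$conf_low) & !is.na(res$conf_high) &
    res$conf_low < res$conf_high
  arrows(res$relative_time[has_ci], res$conf_low[has_ci],
         res$relative_time[has_ci], res$conf_high[has_ci],
         angle = 90, code = 3, length = 0.05)
  abline(h = 0, lty = 2)  # reference line at zero effect
  invisible(res)
}
```

For example, 'plot_es(res)' where 'res' is the tibble returned by run_es().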

Examples

## Not run: 
# Assume df is a panel dataset with variables: id, year, y, treat, is_treated, treat_time, x1, x2, popwt

# Minimal example without covariates
run_es(
  data       = df,
  outcome    = y,
  treatment  = treat,
  time       = year,
  timing     = 2005,
  lead_range = 2,
  lag_range  = 2,
  fe         = ~ id + year,
  cluster    = ~ id,
  baseline   = -1
)

# Weighted regression with covariates
run_es(
  data       = df,
  outcome    = y,
  treatment  = treat,
  time       = year,
  timing     = 2005,
  lead_range = 2,
  lag_range  = 2,
  covariates = ~ x1 + x2,
  fe         = ~ id + year,
  cluster    = ~ id,
  weights    = ~ popwt,
  baseline   = -1
)

# Example with staggered treatment timing
# Suppose `treat_time` indicates the year each unit was treated
run_es(
  data       = df,
  outcome    = y,
  treatment  = is_treated,
  time       = year,
  staggered  = TRUE,
  timing     = treat_time,  # a variable with treatment years per unit
  lead_range = 3,
  lag_range  = 4,
  fe         = ~ id + year,
  cluster    = ~ id,
  baseline   = -1
)

## End(Not run)

fixes documentation built on June 8, 2025, 12:10 p.m.