run_es | R Documentation |
This function performs an event study using fixed effects regression. It first generates lead and lag dummy variables relative to the treatment timing, scales the time intervals if specified, and then estimates the regression model.
run_es(
data,
outcome,
treatment,
time,
timing,
lead_range,
lag_range,
fe,
cluster = NULL,
baseline = -1,
interval = 1
)
data | A dataframe containing the dataset.
outcome | The name of the outcome variable, given unquoted (e.g., y).
treatment | The name of the treatment variable, given unquoted (e.g., treated).
time | The name of the time variable, given unquoted (e.g., year).
timing | The time period when the treatment occurred. For example, if the treatment was implemented in 1995, set 'timing = 1995'.
lead_range | Number of time periods to include before the treatment (negative leads). For example, 'lead_range = 3' includes 3 periods before the treatment.
lag_range | Number of time periods to include after the treatment (positive lags). For example, 'lag_range = 2' includes 2 periods after the treatment.
fe | A vector of fixed effects variables or an additive expression (e.g., firm_id + year). These variables account for unobserved heterogeneity.
cluster | An optional variable for clustering standard errors. For example, 'cluster = "state"'.
baseline | The relative time period to use as the baseline (default: -1). The corresponding dummy variable is excluded from the regression and serves as the reference period. For example, if 'baseline = 0', the treatment year is the baseline.
interval | The time interval between observations (default: 1). For example, use 'interval = 5' for datasets where time steps are in 5-year intervals.
This function is designed for panel data and supports time intervals other than 1 (e.g., 5-year intervals). It automatically scales the relative time variable using the 'interval' parameter.
Steps:
1. Compute the relative time for each observation as '(time - timing) / interval'.
2. Generate lead and lag dummy variables within the specified ranges ('lead_range', 'lag_range').
3. Construct and estimate the fixed effects regression model using 'fixest::feols'.
4. Format the regression results into a tidy dataframe.
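Steps 1 and 2 can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's actual implementation: the toy data, the column names ('lead3', 'lag0', ...), and the choice to interact each dummy with the treatment indicator are all hypothetical.

```r
# Hypothetical toy panel: 2 units observed 2000-2009, unit 1 is treated
df <- data.frame(
  id      = rep(1:2, each = 10),
  year    = rep(2000:2009, times = 2),
  treated = rep(c(1, 0), each = 10)
)

timing     <- 2005
interval   <- 1
lead_range <- 3
lag_range  <- 2
baseline   <- -1

# Step 1: relative time, scaled by the interval
df$relative_time <- (df$year - timing) / interval

# Step 2: one dummy per relative period in [-lead_range, lag_range],
# skipping the baseline period so it acts as the reference
periods <- setdiff(seq(-lead_range, lag_range), baseline)
for (k in periods) {
  # Naming scheme is illustrative only (assumption, not the package's)
  name <- if (k < 0) paste0("lead", abs(k)) else paste0("lag", k)
  # Assumption: dummies are switched on only for treated units
  df[[name]] <- as.integer(df$treated == 1 & df$relative_time == k)
}

names(df)
```

With 'baseline = -1', no dummy is created for relative period -1, so the remaining coefficients would be read relative to that period.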
If 'interval > 1', ensure that the specified 'lead_range' and 'lag_range' correspond to the number of time intervals, not the absolute number of years.
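As a small worked example of this scaling, consider a hypothetical panel observed every 5 years with treatment in 2000. The years and variable names below are made up for illustration.

```r
# Hypothetical 5-year panel: observations in 1990, 1995, ..., 2010
years    <- seq(1990, 2010, by = 5)
timing   <- 2000
interval <- 5

# Relative time in units of intervals, not years
relative_time <- (years - timing) / interval
print(relative_time)  # values: -2, -1, 0, 1, 2

# Two observed periods precede 2000 (1990 and 1995), so the appropriate
# setting would be lead_range = 2, not lead_range = 10
lead_range <- sum(relative_time < 0)
lag_range  <- sum(relative_time > 0)
```

Here both 'lead_range' and 'lag_range' come out as 2, even though the data span 10 years on either side of the treatment.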
A tidy dataframe with regression results. This includes:
- 'term': The lead or lag variable names.
- 'estimate': Estimated coefficients.
- 'std.error': Standard errors.
- 'conf.high': Upper bound of the 95% confidence interval.
- 'conf.low': Lower bound of the 95% confidence interval.
- 'relative_time': Scaled relative time based on the specified 'interval'.
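To make the relationship between these columns concrete, the sketch below rebuilds the interval bounds from 'estimate' and 'std.error'. It assumes a normal approximation (critical value of about 1.96); the function itself may use a different critical value, and the coefficient values shown are invented purely for illustration.

```r
# Hypothetical coefficient table (numbers are made up, not real output)
res <- data.frame(
  term      = c("lead2", "lag0", "lag1"),
  estimate  = c(0.10, 0.50, 0.80),
  std.error = c(0.20, 0.25, 0.30)
)

# Assumption: normal-approximation critical value for a two-sided 95% interval
z <- qnorm(0.975)

res$conf.low  <- res$estimate - z * res$std.error
res$conf.high <- res$estimate + z * res$std.error

res
```

An interval that contains zero (as for the hypothetical 'lead2' row) would indicate a coefficient that is not statistically distinguishable from the baseline period at the 5% level.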
# Simulate panel data
set.seed(123) # Fix the random seed so the simulated data are reproducible
df <- tibble::tibble(
  firm_id = rep(1:50, each = 10), # 50 firms over 10 years
  state_id = rep(sample(1:10, size = 50, replace = TRUE), each = 10),
  year = rep(2000:2009, times = 50),
  is_treated = rep(sample(c(1, 0), size = 50, replace = TRUE, prob = c(0.5, 0.5)), each = 10),
  y = rnorm(500, mean = 0, sd = 1) # Simulated outcome variable
)
# Run event study
event_study <- run_es(
  data = df,
  outcome = y,
  treatment = is_treated,
  time = year,
  timing = 2005,
  lead_range = 5, # Corresponds to years 2000-2004 (relative time: -5 to -1)
  lag_range = 4, # Corresponds to years 2006-2009 (relative time: 1 to 4)
  fe = firm_id + year,
  cluster = "state_id",
  baseline = -1,
  interval = 1
)