simdata: Simulated panel data with two latent factors


Description

A simulated panel dataset with continuous outcomes used throughout the package vignettes to demonstrate factor-augmented counterfactual estimators. The data-generating process follows Liu, Wang, and Xu (2024) with one modification (see Format).

The panel has N = 200 units and T = 35 time periods. Treatment switches on and off over time (99 of the 150 treated units experience at least one reversal), so the data follow a general treatment pattern rather than simple staggered adoption. The outcome includes two latent factors (r = 2). As a result, the parallel-trends assumption is violated and the standard two-way fixed-effects estimator is biased. Treatment assignment loads on the same factors and fixed effects that enter the outcome: units with larger \lambda_i and \alpha_i are more likely to be treated. The confounding is therefore structural and cannot be removed by two-way fixed effects alone.
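The distinction between staggered adoption and a general (reversible) treatment pattern is a property of each unit's path of D over time. The sketch below classifies a single 0/1 path. It assumes that a "reversal" means a 1 → 0 transition, meaning the unit leaves treatment after having been treated. The helper names (count_switches, has_reversal, classify_unit) and the toy paths are hypothetical and are not part of the package. Applying the same logic to each unit's D, sorted by time, is one way to check the reversal count quoted above.

```python
from typing import Sequence


def count_switches(d: Sequence[int]) -> int:
    """Number of times the 0/1 treatment indicator changes value."""
    return sum(1 for prev, cur in zip(d, d[1:]) if prev != cur)


def has_reversal(d: Sequence[int]) -> bool:
    """True if the unit ever leaves treatment (a 1 -> 0 transition)."""
    return any(prev == 1 and cur == 0 for prev, cur in zip(d, d[1:]))


def classify_unit(d: Sequence[int]) -> str:
    """Label a treatment path under the assumed definition of a reversal."""
    if not any(d):
        return "never treated"
    return "reversal" if has_reversal(d) else "staggered adoption"


# Hypothetical treatment paths over 8 periods (not taken from simdata).
paths = {
    1: [0, 0, 0, 0, 0, 0, 0, 0],
    2: [0, 0, 0, 1, 1, 1, 1, 1],
    3: [0, 1, 1, 0, 0, 1, 1, 0],
}
for unit, d in paths.items():
    print(unit, classify_unit(d), count_switches(d))
```

Unit 3 switches four times, so it counts toward the reversal group. Unit 2 enters treatment once and stays treated, which is the staggered-adoption case.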

Format

A data frame with the following columns:

id

unit identifier (1–200)

time

time period (1–35)

Y

observed outcome

error

idiosyncratic error \varepsilon_{it} \sim N(0, 2)

eff

realized treatment effect \tau_{it}

tr_cum, tr_prob

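intermediate quantities used to construct treatment assignment; tr_cum also enters the mean of the treatment effect \tau_{it} (see the DGP below)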

D

treatment indicator

X1, X2

observed time-varying covariates, each \sim N(0, 1), entering the outcome with coefficients 1 and 3, respectively

alpha

unit fixed effect \alpha_i \sim N(0, 1)

xi

time fixed effect \xi_t (AR(1) with drift)

F1, F2

latent time factors f_t \in \mathbb{R}^2 (one trending, one white noise)

L1, L2

unit-specific factor loadings \lambda_i \in \mathbb{R}^2, with components \sim N(0.5, 1)

FL1, FL2

per-cell factor-loading products \lambda_{i,k} \cdot f_{t,k} (k = 1, 2)
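The columns above mix three levels of variation in one long-format table. The unit-level columns (alpha, L1, L2) are constant within id. The time-level columns (xi, F1, F2) are constant within time. FL1 and FL2 are cell-level products of the two. The sketch below builds FL1 and FL2 for a toy 2-unit, 3-period panel from loadings and factors. It shows that the interactive term \lambda_i' f_t is simply FL1 + FL2 in each row. The function name factor_columns and all numeric values are hypothetical illustrations and are not drawn from simdata.

```python
from itertools import product


def factor_columns(loadings, factors):
    """Build long-format rows (id, time, FL1, FL2).

    loadings: dict id -> (lambda_i1, lambda_i2)   # analogous to columns L1, L2
    factors:  dict time -> (f_t1, f_t2)           # analogous to columns F1, F2
    """
    rows = []
    # Every (unit, period) cell, ordered by id and then by time.
    for i, t in product(sorted(loadings), sorted(factors)):
        lam, f = loadings[i], factors[t]
        rows.append({
            "id": i,
            "time": t,
            "FL1": lam[0] * f[0],  # lambda_{i,1} * f_{t,1}
            "FL2": lam[1] * f[1],  # lambda_{i,2} * f_{t,2}
        })
    return rows


# Toy values (hypothetical): 2 units x 3 periods.
loadings = {1: (0.5, -1.0), 2: (2.0, 0.25)}
factors = {1: (1.0, 0.0), 2: (2.0, 4.0), 3: (3.0, -2.0)}
rows = factor_columns(loadings, factors)

# Row 4 is unit 2 in period 2: 2.0 * 2.0 + 0.25 * 4.0 = 5.0.
print(len(rows), rows[4]["FL1"] + rows[4]["FL2"])
```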

The DGP is

Y_{it} = \tau_{it} D_{it} + X_{1,it} + 3 X_{2,it} + \mu + 3\alpha_i + \xi_t + 2\, \lambda_i' f_t + \varepsilon_{it},

with grand mean \mu = 5 and treatment effect \tau_{it} \sim N(0.4 \cdot \mathrm{tr\_cum}_{it}/T,\; 0.2).
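To make the DGP concrete, the sketch below evaluates the outcome equation term by term for a single row, using the column names listed under Format. It assumes that eff holds \tau_{it}. It also assumes that FL1 + FL2 equals \lambda_i' f_t, so the factor of 2 is applied here rather than already folded into the stored FL columns. Neither assumption is confirmed by this page, so a row-by-row comparison against the stored Y in simdata may not match exactly. The function name outcome and the row values are hypothetical.

```python
def outcome(eff, D, X1, X2, alpha, xi, FL1, FL2, error, mu=5.0):
    """Outcome Y_it implied by the DGP for one (i, t) cell.

    Assumes FL1 + FL2 == lambda_i' f_t (the doubling is applied below).
    """
    return (eff * D            # tau_it * D_it: effect only when treated
            + X1 + 3 * X2      # observed covariates, coefficients 1 and 3
            + mu               # grand mean, mu = 5
            + 3 * alpha        # scaled unit fixed effect
            + xi               # time fixed effect
            + 2 * (FL1 + FL2)  # doubled latent-factor term
            + error)           # idiosyncratic error


# Hypothetical row values chosen for easy arithmetic.
y = outcome(eff=0.5, D=1, X1=1.0, X2=-0.5, alpha=0.25, xi=2.0,
            FL1=1.0, FL2=0.5, error=-0.25)
# 0.5 + 1.0 - 1.5 + 5.0 + 0.75 + 2.0 + 3.0 - 0.25
print(y)  # 10.5
```

Setting D = 0 drops the treatment term entirely, which gives the untreated potential outcome that counterfactual estimators try to impute.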

The 2\, \lambda_i' f_t term doubles the latent-factor contribution relative to the original Liu, Wang, and Xu (2024) DGP. This raises the factor signal-to-noise ratio (the variance of the factor contribution divided by the variance of the residual) from approximately 2.7 to 10.9. At that level, cross-validated rank-selection procedures can clearly recover the factor structure on this dataset. The unmodified DGP is preserved in earlier package versions; run git log data/simdata.rda in the package repository to locate the prior file.
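The reported change in the ratio follows from the scaling rule for variances: multiplying a term by a constant c multiplies its variance by c^2. The short derivation below takes the residual to be \varepsilon_{it}, whose variance the modification leaves unchanged.

```latex
\frac{\mathrm{Var}(2\,\lambda_i' f_t)}{\mathrm{Var}(\varepsilon_{it})}
  = \frac{2^2\,\mathrm{Var}(\lambda_i' f_t)}{\mathrm{Var}(\varepsilon_{it})}
  = 4 \times \mathrm{SNR}_{\text{original}}
  \approx 4 \times 2.7 = 10.8
```

The small gap between 10.8 and the reported 10.9 is consistent with the original ratio having been rounded to 2.7. Any original value between roughly 2.71 and 2.74 gives 10.9 after quadrupling and rounding.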

References

Liu, L., Wang, Y., and Xu, Y. (2024). A Practical Guide to Counterfactual Estimators for Causal Inference with Time-Series Cross-Sectional Data. American Journal of Political Science, 68(1), 160–176.


fect documentation built on April 30, 2026, 9:06 a.m.