fect: Fixed Effects Counterfactual Estimators

fect	R Documentation

Fixed Effects Counterfactual Estimators

Description

Implements counterfactual estimators for time-series cross-sectional (TSCS) data analysis, together with statistical tools for testing their identification assumptions.

Usage

fect(formula = NULL, data, Y, D, X = NULL,
            W = NULL, W.est = NULL, W.agg = NULL,
            group = NULL,
            na.rm = FALSE,
            index, force = "two-way",
            time.component.from = "notyettreated", em = TRUE,
            r = 0, lambda = NULL, nlambda = 10,
            CV = NULL, k = 20, cv.prop = 0.1, cv.method = "rolling",
            cv.nobs = 3, cv.donut = 1, cv.buffer = 1, criterion = "mspe",
            binary = FALSE, QR = FALSE,
            method = "fe",  se = FALSE, vartype = "bootstrap",
            para.error = "auto", cl = NULL,
            ci.method = "normal", quantile.CI = NULL,
            nboots = 200, alpha = 0.05,
            parallel = TRUE, cores = NULL, tol = 1e-5,
            max.iteration = 5000, seed = NULL,
            min.T0 = NULL, max.missing = NULL,
            proportion = 0.3, pre.periods = NULL,
            f.threshold = 0.5, tost.threshold = NULL,
            knots = NULL, degree = 2,
            group.fe = NULL,
            cfe = NULL,
            Z = NULL, gamma = NULL, Q = NULL, kappa = NULL,
            Q.type = NULL,
            Q.bspline.degree = NULL,
            Z.param = NULL, Q.param = NULL,
            balance.period = NULL, fill.missing = FALSE,
            placeboTest = FALSE, placebo.period = NULL,
            carryoverTest = FALSE, carryover.period = NULL, carryover.rm = NULL,
            loo = FALSE, permute = FALSE, m = 2,
            normalize = FALSE, keep.sims = FALSE,
            cm = FALSE,
            loading.bound = "none", gamma.loading = NULL,
            gamma.loading.grid = NULL,
            cv.rule = "1se")

Arguments

`formula`	an object of class "formula": a symbolic description of the model to be fitted, e.g., `Y ~ D + X1 + X2`.
`data`	a data frame; the panel may be balanced or unbalanced.
`Y`	the outcome indicator.
`D`	the treatment indicator. The treatment should be binary (0 and 1).
`X`	time-varying covariates. Covariates that have perfect collinearity with specified fixed effects are dropped automatically.
`W`	a string giving the column name of a weight variable. Convenience default that populates both `W.est` and `W.agg` when those are left `NULL`. Suitable for survey or sample weights, where the same column applies to both the outcome-model fit and the across-treated-obs aggregation.
`W.est`	a string giving the column name of a weight variable that enters the outcome-model fit (the weighted least squares applied inside the IFE / MC / CFE solver). When `NULL`, falls back to `W`. Use this (with `W.agg = NULL`) when the weight reflects fit-side considerations and the estimand is the unweighted average ATT across treated cells.
`W.agg`	a string giving the column name of a weight variable that enters the across-treated-obs aggregation (`att.on`, `est.avg`, `est.att`). When `NULL`, falls back to `W`. Use this (with `W.est = NULL`) when the user's estimand differs from "ATT for the treated units in the analysis sample" — common cases include calibration weights to a target population or post-stratification weights that should adjust the summary but not the model fit. A clean in-package solution for inverse-probability weights for confounding adjustment is under development for fect 3.0 (a cross-fit doubly-robust path); `W.agg` is not a substitute for that work and does not deliver the doubly-robust properties an IPW user expects. In v2.3.1, `W.est` and `W.agg` (when both supplied) must point to the same column; truly distinct columns for fit vs. aggregation (e.g. combined survey x IPW designs) are scheduled for v2.4.0.
`group`	the group indicator. If specified, the group-wise ATT will be estimated.
`na.rm`	a logical flag indicating whether to list-wise delete missing observations. Defaults to `FALSE`. If `na.rm = FALSE`, observations for which Y is missing but D is not are retained. If `na.rm = TRUE`, observations whose Y, D, or X is missing are deleted list-wise.
`index`	a character vector specifying the unit (first element) and time (second element) indicators. For most methods, must be of length 2. For `method = "cfe"`, additional elements (third, fourth, etc.) specify extra fixed-effect grouping variables. Every observation should be uniquely defined by the pair of the unit and time indicator.
`force`	a string indicating whether unit fixed effects, time fixed effects, or both will be imposed. Must be one of `"none"`, `"unit"`, `"time"`, or `"two-way"`. The default is `"two-way"`.
`time.component.from`	Controls which units provide the time-varying model components (time fixed effects, factor structure, temporal dynamics). Options are `"notyettreated"` (default) — all units contribute during their pre-treatment periods, or `"nevertreated"` — only never-treated units estimate the time components, which are then projected onto treated units.
`em`	a logical flag indicating whether to use the EM algorithm for missing data in the estimation sample. Default is `TRUE`. Setting `em = FALSE` requires a complete estimation sample and is only compatible with `time.component.from = "nevertreated"`.
`r`	an integer specifying the number of factors. If `CV = TRUE`, the cross validation procedure will select the optimal number of factors from `r` to 5.
`lambda`	a single positive number or a sequence of positive numbers specifying the hyper-parameter sequence for the matrix completion method. If `lambda` is a sequence and `CV = TRUE`, cross-validation will be performed.
`nlambda`	an integer specifying the length of hyper-parameter sequence for matrix completion method. Default is `nlambda = 10`.
`CV`	a logical flag indicating whether cross-validation will be performed to select the optimal number of factors or hyper-parameter in matrix completion algorithm. If `r` is not specified, the procedure will search through `r = 0` to `5`.
`k`	an integer specifying number of cross-validation rounds. Default is `k = 20`.
`cv.prop`	a numerical value specifying the size of the testing set as a proportion of the sample size during the cross-validation procedure. Default is `cv.prop = 0.1`.
`cv.method`	a string specifying the cross-validation masking strategy. One of `"rolling"` (default; standard time-series rolling-window CV), `"block"` (random scattered anchors with contiguous-block masking), or `"loo"` (leave-one-out, available for `fect_nevertreated`). The legacy aliases `"all_units"` (= `"block"`) and `"treated_units"` (block masking restricted to treated pre-treatment cells) are still accepted but emit a deprecation message; both will be replaced by the unified `(cv.method, cv.units)` API in v2.4.0.
`cv.nobs`	an integer specifying the length of continuous observations within a unit in the testing set. Default is `cv.nobs = 3`.
`cv.donut`	an integer specifying the length of removed observations at the head and tail of the continuous observations specified by `cv.nobs`. Used by block CV (`cv.method = "block"`, or the legacy aliases `"all_units"`/`"treated_units"`). Default is 1 (matches `cv.buffer` for rolling CV).
`cv.buffer`	an integer specifying the length of past-side buffer cells masked from training (but not scored) immediately before each rolling-window holdout. Used only by `cv.method = "rolling"`; the future side is dropped from training by construction. Analogous to `cv.donut` for block CV but applied only on the past side. Default is 1.
`criterion`	a string specifying the criterion used for model selection. One of `"mspe"` (mean squared prediction error; the default), `"gmspe"` (geometric-mean squared prediction error), `"moment"` (period-weighted residuals in test sets), or `"pc"` (an information criterion).
`binary`	This version doesn't support this option.
`QR`	This version doesn't support this option.
`method`	a string specifying which imputation algorithm will be used. `"fe"`, `"ife"`, `"mc"`, `"gsynth"`, or `"cfe"`. Default is `"fe"`.
`se`	a logical flag indicating whether uncertainty estimates will be produced.
`vartype`	a string specifying the type of variance estimator, e.g. `"bootstrap"`. Three values are supported: `"bootstrap"` (nonparametric cluster-bootstrap; the safe default), `"jackknife"` (leave-one-unit-out), and `"parametric"` (two-stage pseudo-treated parametric bootstrap). The `"parametric"` option is restricted to the gsynth-style regime: it requires `time.component.from = "nevertreated"`, no treatment reversal, and `method` not in `c("mc", "both")`. These three conditions correspond to Gates A, B, and C in the three-gate defense system (see `ARCHITECTURE.md`). For all other settings, `vartype = "bootstrap"` is recommended.
`para.error`	a string specifying the residual-error model used by the parametric bootstrap path; sub-option of `vartype = "parametric"` (silently ignored otherwise). One of `"auto"`, `"ar"`, `"empirical"`, or `"wild"`. Default `"auto"` resolves at fit time to `"empirical"` on a fully-observed panel and `"ar"` on a panel with missing cells; the resolved label is stored on `fit$para.error`. `"ar"` estimates an AR(1) error process from control residuals (works on any panel shape); `"empirical"` resamples residuals i.i.d. from the main-fit pool (requires fully-observed panel); `"wild"` applies unit-level Rademacher sign-flips over the empirical residual pool (requires fully-observed panel).
`cl`	a string specifying the cluster column for cluster bootstrapping. When `group.fe` is set to a single column and `cl` is unset, `cl` auto-defaults to `group.fe[1]` — the natural choice when treatment varies at the group level (Bertrand, Duflo & Mullainathan 2004). To override — for example, to cluster at the unit level when group-level FE is absorbed — pass `cl = index[1]` (e.g., `cl = "id"`). Note that `cl = NULL` does NOT disable clustering: the case bootstrap always resamples units (which is unit-level clustering); the auto-default only changes the resample unit to a coarser level. `cl = FALSE` is rejected with a guiding error.
`ci.method`	a string controlling how the bootstrap distribution becomes the CIs reported in fect's `est.*` slots. Two values: `"normal"` (default; Wald CI: `\hat\theta \pm z \cdot SE`) or `"basic"` (reflected pivot: `[2\hat\theta - q_{1-\alpha/2},\, 2\hat\theta - q_{\alpha/2}]`, the literature-standard "basic" CI per Davison & Hinkley (1997, §5.2.1) and what `boot::boot.ci(type = "basic")` returns). For `"percentile"`, `"bc"`, or `"bca"` CIs (typically wanted on alternative estimands like `aptt` and `log.att`), call `estimand` after fitting.
`quantile.CI`	deprecated as of v2.4.2. Use `ci.method` instead. `quantile.CI = FALSE` maps to `ci.method = "normal"`; `quantile.CI = TRUE` maps to `ci.method = "basic"`. Both mappings still work but emit a one-time deprecation warning.
`nboots`	an integer specifying the number of bootstrap runs. Ignored if `se=FALSE`. Default `200`, sufficient for the standard error and the normal-CI structure that `fect` ships in `est.att`/`att.avg`. For tail-quantile-based CI methods accessed via `estimand()` (`ci.method` `"basic"`, `"percentile"`, `"bc"`, `"bca"`), bump to `nboots = 1000` or higher (Efron 1987 Section 3; DiCiccio & Efron 1996 Section 4); `estimand()` emits a warning when called on under-replicated fits.
`alpha`	the significance level for hypothesis tests and confidence intervals. Default `0.05`.
`parallel`	controls which operations run in parallel. Accepted values: `TRUE` (enable parallel computing for both CV and bootstrap; the default in `fect()`); `FALSE` (disable all parallel computing); `"cv"` (parallel CV only; bootstrap runs serially); `"boot"` (parallel bootstrap only; CV runs serially); `c("cv","boot")` (explicit form of `TRUE`: parallel for both). When `parallel = TRUE`, auto-enable thresholds apply: CV parallelism engages only when `Nco * TT` exceeds the per-method threshold (`ife = 20000`, `mc = 20000`, `cfe = 60000`). Explicit `"cv"` overrides the threshold. Nested parallelism (calling `fect()` from within a `future_lapply` or `foreach %dopar%` block) should use `parallel = FALSE` to avoid deadlock. With `method = "mc"`, parallel CV computes all candidate lambda values without early stopping (the serial path uses a `break_check` short-circuit to skip lambdas with diminishing MSPE returns). This guarantees numerical identity between serial and parallel results but may compute a few extra lambda values compared to the serial path. Use `parallel = FALSE` to preserve the short-circuit behavior.
`cores`	an integer indicating the number of cores for parallel computing.
`tol`	a positive number indicating the relative tolerance for the EM update check (`||fit_new - fit_old||_F / ||fit_old||_F < tol`). Default tightened from `1e-3` to `1e-5` in v2.4.3 because the older default produced under-converged IFE/CFE point estimates that shifted up to 40% relative to the converged value (inference was preserved, but the reported numbers were stopping-point-dependent).
`max.iteration`	the maximum number of EM iterations. Default raised from `1000` to `5000` in v2.4.3 to accommodate the tighter `tol`; canonical IFE/CFE fits converge in 700-2000 iterations at `tol = 1e-5`. A `warning()` is emitted when the EM hits this cap without satisfying the `tol` criterion (under-convergence diagnostic).
`seed`	an integer seed for random number generation.
`min.T0`	an integer specifying the minimum number of pre-treatment periods for each treated unit.
`max.missing`	an integer specifying the maximum number of missing observations allowed per unit.
`proportion`	a numeric value specifying which pre-treatment periods are used for goodness-of-fit tests. Default is `proportion = 0.3`.
`pre.periods`	a vector specifying the range of pre-treatment periods used for the goodness-of-fit test.
`f.threshold`	a numeric threshold for an F-test in equivalence testing. Default `0.5`.
`tost.threshold`	a numeric threshold for the two one-sided t-tests (TOST) used in equivalence testing.
`knots`	a numeric vector specifying knots (currently unused; reserved for future use).
`degree`	an integer specifying the degree (currently unused; reserved for future use).
`group.fe`	a character vector of column names naming additional simple additive fixed-effect groupings to absorb (e.g., `group.fe = "state"` when rows are counties and treatment varies at state level). Each entry must be a column in `data`. Each entry must be nested in `index[1]` — i.e., constant within each level of the unit identifier — otherwise an error is raised. When `group.fe` is set, `method = "fe"` is silently routed to `method = "cfe"` (FE is a subset of CFE, identical result); `method = "ife"`, `"mc"`, `"both"`, and `"gsynth"` hard-error (use `method = "cfe"` with `r > 0` for free latent factors with group-level FE). When `group.fe` has length 1 and `cl` is unset, `cl` auto-defaults to `group.fe[1]`. To cluster at the unit level instead, pass `cl = index[1]` (e.g., `cl = "id"`); `cl = NULL` does NOT change behavior (the case bootstrap always resamples units regardless of `cl`), and `cl = FALSE` is rejected with a guiding error.
`cfe`	a vector of lists specifying interactive fixed effects for `method="cfe"`.
`Z`	a vector specifying the time-invariant covariates for the Z matrix.
`gamma`	a vector specifying the time-varying covariates for the gamma matrix.
`Q`	a vector specifying the time-varying covariates for the Q matrix.
`kappa`	a vector specifying the time-invariant covariates for the kappa matrix.
`Q.type`	a vector specifying the type of Q matrix.
`Q.bspline.degree`	an integer specifying the degree used when `Q.type` includes `"bspline"` in `method="cfe"`. If NULL, a default degree is chosen based on the number of distinct time values.
`Z.param`	a list specifying the parameters for the Z matrix.
`Q.param`	a list specifying the parameters for the Q matrix.
`balance.period`	a length-2 vector specifying a time range for a balanced sample.
`fill.missing`	a logical flag indicating whether to allow missing observations in a balanced sample.
`placeboTest`	a logical flag indicating whether to perform a placebo test.
`placebo.period`	an integer or 2-element numeric vector specifying pseudo-treatment periods.
`carryoverTest`	a logical flag for carryover tests.
`carryover.period`	an integer or 2-element numeric vector specifying pseudo-carryover periods.
`carryover.rm`	an integer specifying the range of post-treatment periods to treat as carryover.
`loo`	a logical flag for leave-one-period-out goodness-of-fit tests.
`permute`	a logical flag indicating whether to run a permutation test.
`m`	an integer specifying the block length for permutation tests. Default `2`.
`normalize`	a logical flag indicating whether to scale outcome and covariates.
`keep.sims`	a logical flag indicating whether to save unit-time level bootstrap effects. Default `keep.sims = FALSE`. If `se = FALSE`, this argument is ignored.
`cm`	a logical flag indicating whether to enable causal moderation analysis. When `TRUE`, the estimator decomposes the treatment effect into effect modification and causal moderation components. Currently available for `method = "fe"` and `method = "ife"`. Default is `FALSE`.
`loading.bound`	a string controlling whether treated-unit factor loadings are bounded inside the convex hull of control loadings. `"none"` (default) reproduces standard GSC behavior. `"simplex"` constrains each treated unit's loading to be a non-negative convex combination of control loadings via an entropy-regularized simplex projection, ensuring the imputed counterfactual lies pointwise in the convex hull of factor-implied control outcomes. Currently applies only to `method = "ife"` or `method = "gsynth"` (equivalent forms), and requires `time.component.from = "nevertreated"`.
`gamma.loading`	scalar regularization strength for the `"simplex"` projection. `NULL` (default) triggers 5-fold cross-validation over `gamma.loading.grid`. A numeric value is used directly. Ignored when `loading.bound = "none"`.
`gamma.loading.grid`	a numeric vector of candidate `gamma.loading` values for cross-validation. `NULL` (default) uses `10^seq(-2, 2, length.out = 9)`. Ignored when `loading.bound = "none"` or when `gamma.loading` is supplied.
`cv.rule`	a string selecting the cross-validation rule for choosing the number of factors `r` (or matrix-completion penalty `lambda`). One of: `"1se"` (default), the 1-SE rule (Breiman, Friedman, Olshen and Stone 1984; Hastie, Tibshirani and Friedman 2009, Section 7.10): pick the smallest `r` whose mean CV criterion is within one fold-SE of the minimum-CV-error `r`; this biases toward parsimony in a fold-aware way, allowing larger `r` when CV is precise and gravitating to simpler models when CV is noisy. `"min"`: pick the `r` that minimizes the mean CV criterion (no tolerance). `"1pct"`: legacy pre-2.3.0 heuristic that picks the smallest `r` within 1% relative tolerance of the best mean CV criterion; use this for byte-identical reproducibility of pre-2.3.0 fits. Ignored when `CV = FALSE`.
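
The three `cv.rule` options can be compared on a toy table of fold-level CV errors. The sketch below uses base R only and is not the package's internal code: `pick_r` is a hypothetical helper, the error matrix is made up for illustration, and the fold-SE is assumed to be the standard deviation across folds divided by the square root of the number of folds.

```r
# Hypothetical helper: choose r from a (folds x candidates) matrix of CV errors.
pick_r <- function(cv.err, r.grid, rule = c("1se", "min", "1pct")) {
  rule <- match.arg(rule)
  m    <- colMeans(cv.err)                          # mean CV criterion per r
  se   <- apply(cv.err, 2, sd) / sqrt(nrow(cv.err)) # assumed fold-SE per r
  best <- which.min(m)                              # minimum-CV-error candidate
  thresh <- switch(rule,
    "min"  = m[best],             # no tolerance
    "1se"  = m[best] + se[best],  # within one fold-SE of the minimum
    "1pct" = m[best] * 1.01       # within 1% of the minimum
  )
  # Smallest candidate whose mean criterion clears the threshold
  r.grid[min(which(m <= thresh))]
}

# Made-up errors: 4 folds (rows) by candidates r = 0, 1, 2, 3 (columns)
cv.err <- cbind(
  c(2.00, 2.20, 1.80, 2.00),
  c(1.00, 1.20, 0.90, 1.10),
  c(0.95, 1.15, 0.85, 1.05),
  c(1.00, 1.10, 0.95, 1.03)
)

r.1se  <- pick_r(cv.err, 0:3, "1se")
r.min  <- pick_r(cv.err, 0:3, "min")
r.1pct <- pick_r(cv.err, 0:3, "1pct")
```

In this toy table `r = 2` has the lowest mean error, but `r = 1` lies within one fold-SE of it, so the 1-SE rule selects the smaller model while `"min"` and `"1pct"` both keep `r = 2`.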

Details

fect implements counterfactual estimators for TSCS data. It first imputes counterfactuals by fitting an outcome model using untreated observations, then estimates the individual treatment effect as the difference between observed and predicted outcomes. Finally, it computes average treatment effects on the treated (ATT) and period-specific ATTs. Placebo and equivalence tests help evaluate identification assumptions.
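
The impute-then-difference logic can be illustrated for the two-way fixed effects case with base R alone. This is a minimal sketch under stated assumptions, not the package's implementation: the panel is simulated, the variable names (`alpha_i`, `xi_t`, `tau`, `Y0.hat`) are illustrative, and `lm()` stands in for the package's solver.

```r
set.seed(1)
N <- 30; TT <- 10

# Simulated balanced panel: units 1-10 are treated from period 6 onward
panel <- expand.grid(id = 1:N, time = 1:TT)
alpha_i <- rnorm(N)    # unit fixed effects
xi_t    <- rnorm(TT)   # time fixed effects
tau     <- 2           # true treatment effect (assumed constant)
panel$D <- as.integer(panel$id <= 10 & panel$time >= 6)
panel$Y <- alpha_i[panel$id] + xi_t[panel$time] + tau * panel$D +
  rnorm(nrow(panel), sd = 0.5)

# Step 1: fit the outcome model on untreated observations only
fit <- lm(Y ~ factor(id) + factor(time), data = panel[panel$D == 0, ])

# Step 2: impute the counterfactual Y(0) for every cell
panel$Y0.hat <- predict(fit, newdata = panel)

# Step 3: individual effects are observed minus predicted outcomes
panel$eff <- panel$Y - panel$Y0.hat

# Step 4: average over treated cells, overall and by period
treated       <- panel$D == 1
att.avg       <- mean(panel$eff[treated])
att.by.period <- tapply(panel$eff[treated], panel$time[treated], mean)
```

Because the model is fitted only on untreated cells, the treated outcomes never influence the estimated fixed effects; the same idea carries over when the outcome model is replaced by a richer one.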

Value

`Y.dat`	T-by-N matrix of the outcome variable.
`D.dat`	T-by-N matrix of the treatment variable.
`I.dat`	T-by-N matrix of observation indicators (observed/missing).
`Y`	name of the outcome variable.
`D`	name of the treatment variable.
`X`	name of any time-varying covariates.
`W`	name of the weight variable.
`index`	name of the unit and time indicators.
`force`	specified fixed effects option.
`T`	number of time periods.
`N`	number of units.
`p`	number of time-varying observables.
`r.cv`	number of factors (selected by cross-validation if needed).
`lambda.cv`	optimal hyper-parameter for matrix completion, if applicable.
`beta`	coefficients for any covariates in an interactive fixed effects model.
`sigma2`	mean squared error.
`IC`	information criterion.
`est`	results of the fitted model.
`MSPE`	mean squared prediction error from cross-validation.
`CV.out`	results of the cross-validation procedure.
`niter`	number of iterations.
`factor`	estimated time-varying factors.
`lambda`	estimated loadings.
`lambda.tr`	estimated loadings for treated units.
`lambda.co`	estimated loadings for control units.
`mu`	estimated grand mean.
`xi`	estimated time fixed effects.
`alpha`	estimated unit fixed effects.
`alpha.tr`	estimated unit fixed effects for treated units.
`alpha.co`	estimated unit fixed effects for control units.
`validX`	logical indicating if valid covariates exist.
`validF`	logical indicating if factors exist.
`id`	vector of unit IDs.
`rawtime`	vector of time periods.
`obs.missing`	matrix indicating missingness patterns.
`Y.ct`	T-by-N matrix of predicted outcomes under no treatment.
`eff`	T-by-N matrix of estimated individual treatment effects.
`res`	residuals for observed values.
`eff.pre`	effects for treated units in pre-treatment periods.
`eff.pre.equiv`	pre-treatment effects under baseline (two-way FE) model.
`pre.sd`	by-period residual standard deviations for pre-treatment ATT.
`att.avg`	overall average treatment effect on the treated.
`att.avg.W`	weighted ATT.
`att.avg.unit`	by-unit average treatment effect on the treated.
`time`	time index for switch-on treatment effect.
`count`	count of observations for each switch-on effect time.
`att`	switch-on treatment effect.
`att.on.W`	weighted switch-on effect.
`time.off`	time index for switch-off treatment effect.
`att.off`	switch-off treatment effect.
`att.off.W`	weighted switch-off effect.
`count.off`	count for each switch-off period.
`att.placebo`	ATT for placebo periods.
`att.carryover`	ATT for carryover periods.
`eff.calendar`	ATT by calendar time.
`eff.calendar.fit`	loess-fitted ATT by calendar time.
`N.calendar`	number of treated observations in each calendar period.
`balance.avg.att`	ATT for balanced sample.
`balance.att`	switch-on ATT for balanced sample.
`balance.time`	time index for balanced sample.
`balance.count`	count for each time in balanced sample.
`balance.att.placebo`	ATT for placebo period in balanced sample.
`group.att`	ATT for different groups.
`group.output`	list of switch-on treatment effects by group.
`est.att.avg`	inference for `att.avg`.
`est.att.avg.unit`	inference for `att.avg.unit`.
`est.att`	inference for `att`.
`est.att.W`	inference for weighted `att`.
`est.att.off`	inference for switch-off.
`est.att.off.W`	inference for weighted switch-off.
`est.placebo`	inference for placebo ATT.
`est.carryover`	inference for carryover ATT.
`est.eff.calendar`	inference for `eff.calendar`.
`est.eff.calendar.fit`	inference for `eff.calendar.fit`.
`est.balance.att`	inference for balanced sample switch-on.
`est.balance.avg`	inference for balanced sample average ATT.
`est.balance.placebo`	inference for balanced sample placebo.
`est.avg.W`	inference for `att.avg.W`.
`est.beta`	inference for `beta`.
`est.group.att`	inference for group-specific ATT.
`est.group.output`	inference for group output.
`att.avg.boot`	bootstrap draws for `att.avg`.
`att.avg.unit.boot`	bootstrap draws for `att.avg.unit`.
`att.count.boot`	bootstrap draws for `count`.
`att.off.boot`	bootstrap draws for `att.off`.
`att.off.count.boot`	bootstrap draws for `count.off`.
`att.placebo.boot`	bootstrap draws for `att.placebo`.
`att.carryover.boot`	bootstrap draws for `att.carryover`.
`balance.att.boot`	bootstrap draws for `balance.att`.
`att.bound`	equivalence confidence interval for pre-trend.
`att.off.bound`	equivalence confidence interval for switch-off.
`beta.boot`	bootstrap draws for `beta`.
`test.out`	F-test and equivalence test results for pre-treatment fit.
`loo.test.out`	leave-one-period-out test results.
`permute`	permutation test results.
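
How a vector of bootstrap draws (such as `att.avg.boot`) maps to the `"normal"` and `"basic"` intervals described under `ci.method` can be sketched in base R. The draws below are simulated stand-ins rather than output from a fitted object, and taking the standard error as the standard deviation of the draws is an assumption of this sketch.

```r
set.seed(42)
theta.hat  <- 1.5                                 # illustrative point estimate
boot.draws <- theta.hat + rnorm(1000, sd = 0.3)   # stand-in for bootstrap replicates
alpha      <- 0.05

# "normal": Wald interval, theta.hat +/- z * SE
se.boot   <- sd(boot.draws)
z         <- qnorm(1 - alpha / 2)
ci.normal <- c(theta.hat - z * se.boot, theta.hat + z * se.boot)

# "basic": reflect the bootstrap quantiles around the point estimate
q        <- quantile(boot.draws, c(alpha / 2, 1 - alpha / 2), names = FALSE)
ci.basic <- c(2 * theta.hat - q[2], 2 * theta.hat - q[1])
```

The two intervals nearly coincide when the bootstrap distribution is close to symmetric, as it is here; they diverge when the draws are skewed, which is where the choice of `ci.method` matters.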

Author(s)

Licheng Liu; Ye Wang; Yiqing Xu; Ziyi Liu

References

Athey, S., Bayati, M., Doudchenko, N., Imbens, G., and Khosravi, K. (2021). Matrix completion methods for causal panel data models. Journal of the American Statistical Association, 116(536), 1716-1730.

Bai, J. (2009). Panel data models with interactive fixed effects. Econometrica, 77(4), 1229-1279.

Liu, L., Wang, Y., and Xu, Y. (2022). A Practical Guide to Counterfactual Estimators for Causal Inference with Time-Series Cross-Sectional Data. American Journal of Political Science, 68(1), 160-176.

Xu, Y. (2017). Generalized Synthetic Control Method: Causal Inference with Interactive Fixed Effects Models. Political Analysis, 25(1), 57-76.

Examples

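# Load the package and its bundled simulated panel
library(fect)
data(simdata)
# se = FALSE skips uncertainty estimates;
# parallel = FALSE disables parallel computing
out <- fect(Y ~ D + X1 + X2, data = simdata,
            index = c("id","time"), force = "two-way",
            CV = TRUE, r = c(0, 5), se = FALSE, parallel = FALSE)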

fect documentation built on May 31, 2026, 1:06 a.m.