r.cv.rolling: Rolling-window cross-validation for rank selection

View source: R/cv-rolling.R

r.cv.rolling    R Documentation

Rolling-window cross-validation for rank selection

Description

Picks the number of factors 'r' for an interactive-fixed-effects model via standard rolling-window cross-validation. In each of 'k' folds, a fraction 'cv.prop' of eligible units (control units, plus treated units in their pre-treatment periods) is sampled, and only sampled units are masked. For each sampled unit, a random anchor time 't*' is drawn, and the following periods of that unit are excluded from the fold's training set:

- 't* - cv.buffer, ..., t* - 1': the buffer;
- 't*, ..., t* + cv.nobs - 1': the held-out block, which is scored;
- 't* + cv.nobs, ..., end_of_eligible': the rolling-window future drop, where 'end_of_eligible' is the unit's last eligible period (for a treated unit, the last period strictly before treatment onset).

MSPE is computed on the held-out blocks only and averaged across folds.
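To make the three excluded blocks concrete, here is a minimal sketch that labels every period of a single sampled unit for a given anchor. The helper 'label_periods()' and its argument 'end.eligible' are hypothetical illustrations of the masking scheme described above, not functions exported by fect, and periods are assumed to be numbered 1, ..., T.i.

```r
# Hypothetical helper, not part of fect: labels each period 1..T.i of one
# sampled unit for a given anchor. 'end.eligible' is the unit's last
# eligible period (for a treated unit, the period just before onset).
label_periods <- function(T.i, anchor, cv.nobs = 3L, cv.buffer = 1L,
                          end.eligible = T.i) {
  stopifnot(anchor + cv.nobs - 1L <= end.eligible)  # held-out block must fit
  t <- seq_len(T.i)
  lab <- rep("train", T.i)
  lab[t >= anchor - cv.buffer & t < anchor] <- "buffer"
  lab[t >= anchor & t < anchor + cv.nobs] <- "heldout"       # scored cells
  lab[t >= anchor + cv.nobs & t <= end.eligible] <- "future_drop"
  lab[t > end.eligible] <- "not_eligible"                    # post-treatment
  lab
}

# Control unit observed for 10 periods, anchor t* = 6:
label_periods(10L, anchor = 6L)
# Treated unit with onset at period 10 (so end.eligible = 9), anchor t* = 5:
label_periods(10L, anchor = 5L, end.eligible = 9L)
```

Only the "heldout" cells enter the MSPE; "buffer" and "future_drop" cells are simply withheld from training for that fold.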

Usage

r.cv.rolling(
  formula,
  data,
  index,
  method = c("ife", "gsynth", "cfe"),
  r.max = 5L,
  cv.nobs = 3L,
  cv.buffer = 1L,
  k = 20L,
  cv.prop = 0.1,
  cv.rule = c("1se", "min", "1pct"),
  min.T0 = 5L,
  force = "unit",
  seed = NULL,
  verbose = TRUE,
  ...
)

Arguments

formula

A model formula, e.g. 'Y ~ D + X1 + X2'.

data

A long-format data frame.

index

Character vector identifying the panel structure. For 'method = "ife"' and 'method = "gsynth"', length 2: 'c("unit", "time")'. For 'method = "cfe"', length >= 2: the first two entries are 'c("unit", "time")', and any additional entries are extra grouping fixed-effect columns forwarded to the 'index' argument of the inner 'fect()' call.

method

Estimator. One of '"ife"' (IFE-EM, internally 'time.component.from = "notyettreated"'), '"gsynth"' (GSC, internally 'time.component.from = "nevertreated"'), or '"cfe"' (Complex Fixed Effects, internally 'time.component.from = "notyettreated"'). All three paths populate 'Y.ct.full' at masked positions, so MSPE is scored the same way for each. For CFE, rolling CV selects 'r' only: CFE-specific arguments ('Z', 'gamma', 'Q', 'Q.type', 'kappa') passed via '...' and extra grouping columns passed via 'index' are held fixed at their user-supplied values.

r.max

Largest candidate rank to evaluate. CV is run over '0:r.max'.

cv.nobs

Length of the held-out (scored) block per unit per fold. Default 3.

cv.buffer

Number of observations immediately BEFORE the held-out block to drop from training: a past-side buffer that attenuates leakage through autoregressive (AR) dependence. Default 1. Analogous to 'cv.donut' in the existing 'cv.method = "all_units"' and '"treated_units"' strategies, but applied only on the past side, because the future side is already dropped by construction.

k

Number of folds. Each fold draws a fresh sample of units and a fresh set of per-unit anchors; the per-r MSPE is averaged across folds and the SE used by the '"1se"' rule reflects fold-to-fold variability. Default 20 (matches the default for the existing CV strategies).
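The fold-level summary can be sketched as follows. The MSPE values are made up, the matrix is laid out like the 'mspe.per.fold' component (rows are candidate ranks, columns are folds), and taking the SE as the standard error of the fold mean, 'sd / sqrt(k)', is an assumption about how fold-to-fold variability is summarized.

```r
# Made-up per-fold MSPE for r = 0, 1, 2 over k = 4 folds
# (rows = candidate r, columns = folds).
mspe.per.fold <- rbind(
  "r=0" = c(2.0, 2.2, 1.8, 2.0),
  "r=1" = c(1.1, 1.3, 0.9, 1.1),
  "r=2" = c(1.0, 1.4, 1.0, 1.2)
)
k <- ncol(mspe.per.fold)
mspe <- rowMeans(mspe.per.fold)               # per-r MSPE averaged over folds
se <- apply(mspe.per.fold, 1, sd) / sqrt(k)   # assumed: SE of the fold mean
round(cbind(mspe, se), 3)
```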

cv.prop

Fraction of eligible units sampled per fold. Only sampled units receive a mask in that fold; the rest stay fully observed and contribute training data at every period. Must satisfy '0 < cv.prop <= 1'. Default 0.1. Across 'k' folds, each eligible unit lands in the holdout about 'k * cv.prop' times in expectation, so the default paired with 'k = 20' covers every eligible unit about twice. Per-fold MSPE precision grows with the number of held-out cells, 'cv.prop * n_eligible * cv.nobs', so on small panels (n_eligible < 30) consider raising 'cv.prop'.
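The coverage arithmetic behind the defaults can be checked directly. This sketch assumes that each fold samples 'round(cv.prop * n_eligible)' units uniformly and independently of the other folds, and uses a hypothetical panel of 50 eligible units. Under those assumptions each unit is held out twice on average, roughly 12% of units are never held out, and each fold scores 15 cells.

```r
# Back-of-envelope numbers for the defaults, assuming each fold samples
# round(cv.prop * n.eligible) units uniformly and independently of the
# other folds. n.eligible = 50 is a hypothetical panel size.
k <- 20L; cv.prop <- 0.1; cv.nobs <- 3L; n.eligible <- 50L
expected.holdouts <- k * cv.prop                   # times a unit is held out
p.never <- (1 - cv.prop)^k                         # chance it is never held out
cells.per.fold <- cv.prop * n.eligible * cv.nobs   # held-out cells per fold
c(expected.holdouts = expected.holdouts, p.never = p.never,
  cells.per.fold = cells.per.fold)
```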

cv.rule

Rule for picking 'r' from the MSPE curve: '"1se"' (default), '"min"', or '"1pct"'.
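The page does not spell out the three rules, so the sketch below uses assumed definitions: '"min"' takes the minimizer, '"1se"' follows the usual one-standard-error rule (smallest 'r' within one SE of the minimum), and '"1pct"' takes the smallest 'r' within 1% of the minimum. The helper 'pick_r()' and the MSPE curve are hypothetical and only illustrate how the rules can disagree.

```r
# Hypothetical helper with assumed rule definitions (not fect's code):
#   "min":  r with the smallest mean MSPE
#   "1se":  smallest r whose MSPE is within one SE of the minimum,
#           using the SE at the minimizing r (the usual one-SE rule)
#   "1pct": smallest r whose MSPE is within 1% of the minimum
pick_r <- function(mspe, se, r = seq_along(mspe) - 1L,
                   rule = c("1se", "min", "1pct")) {
  rule <- match.arg(rule)
  i.min <- which.min(mspe)
  threshold <- switch(rule,
    min    = mspe[i.min],
    `1se`  = mspe[i.min] + se[i.min],
    `1pct` = mspe[i.min] * 1.01)
  r[min(which(mspe <= threshold))]   # smallest r under the threshold
}

# Made-up MSPE curve for r = 0..3:
mspe <- c(2.00, 1.06, 1.008, 1.00)
se   <- c(0.10, 0.08, 0.06, 0.07)
sapply(c("min", "1se", "1pct"), function(rule) pick_r(mspe, se, rule = rule))
# min = 3, 1se = 1, 1pct = 2
```

The 1-SE rule trades a slightly higher MSPE for a more parsimonious model, which is why it picks the smallest rank here.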

min.T0

Minimum number of observations required strictly before the anchor; this sets the lower bound on valid anchor positions. Default 5.
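A sketch of the resulting anchor range for one unit follows. The helper 'valid_anchors()' is hypothetical, and it assumes that 'min.T0' counts every period before 't*', including the 'cv.buffer' periods that are later dropped. If 'min.T0' instead counts only retained training periods, the lower bound would shift up by 'cv.buffer'. The upper bound simply requires the held-out block to fit inside the unit's eligible window.

```r
# Hypothetical helper: valid anchors for a unit whose eligible periods are
# 1..end.eligible. Assumes min.T0 counts every period before t*, including
# the cv.buffer periods that are later dropped from training.
valid_anchors <- function(end.eligible, min.T0 = 5L, cv.nobs = 3L) {
  end.eligible <- as.integer(end.eligible)
  lo <- as.integer(min.T0) + 1L                  # at least min.T0 periods before t*
  hi <- end.eligible - as.integer(cv.nobs) + 1L  # held-out block must fit
  if (lo > hi) integer(0) else seq.int(lo, hi)
}

valid_anchors(10)  # 6 7 8
valid_anchors(7)   # integer(0): this unit can never be held out
```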

force

Additive fixed effects to include in the model: one of '"none"', '"unit"', '"time"', or '"two-way"'. Default '"unit"'.

seed

Optional integer base seed; per-fold seeds derive from 'seed + fold_id' for reproducibility. Default 'NULL' (use the ambient RNG).

verbose

If TRUE (default), print per-fold per-r MSPE.

...

Additional arguments forwarded to 'fect()'. For 'method = "cfe"', pass the CFE structural arguments ('Z', 'gamma', 'Q', 'Q.type', 'Q.bspline.degree', 'kappa', etc.) here; they are held fixed at the supplied values, and rolling CV varies only 'r'.

Details

Per-fold unit sampling is required: masking every eligible unit in the same fold leaves no donor data at the masked time points and breaks factor identification. Sampling a fraction 'cv.prop' of units per fold keeps the unsampled units fully observed at all periods.
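The donor problem is easy to see on a toy panel. This sketch uses control units only, hypothetical sizes (40 units, 20 periods), and a hypothetical helper 'donors_per_period()'. Because a masked unit loses its buffer, its held-out block, and every later period, masking all units in one fold empties the final period, while masking 10% of them leaves the other 36 units as donors.

```r
# Toy panel of control units only (hypothetical sizes). A masked unit loses
# the buffer, the held-out block and every later period, so if all units
# are masked in one fold, the last period has no training data left.
donors_per_period <- function(masked, n, n.T, cv.nobs = 3L, cv.buffer = 1L,
                              min.T0 = 5L) {
  obs <- matrix(TRUE, nrow = n, ncol = n.T)
  valid <- (min.T0 + 1L):(n.T - cv.nobs + 1L)   # valid anchor periods
  for (i in masked) {
    anchor <- valid[sample.int(length(valid), 1L)]
    obs[i, (anchor - cv.buffer):n.T] <- FALSE   # buffer + held-out + future
  }
  colSums(obs)  # units still in the training set at each period
}

set.seed(42)
n <- 40L; n.T <- 20L
all.masked <- donors_per_period(seq_len(n), n, n.T)
sampled    <- donors_per_period(sample.int(n, round(0.1 * n)), n, n.T)
all.masked[n.T]  # 0
sampled[n.T]     # 36
```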

This is the standard time-series CV design (cf. 'forecast::tsCV', 'rsample::sliding_window', 'caret::createTimeSlices') adapted to panel data: each sampled unit gets its own anchor per fold, drawn uniformly from valid positions.
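The fold design can be sketched in a few lines. 'make_folds()' is a hypothetical helper, not fect's internal code: it samples 'round(cv.prop * n)' units per fold, draws each sampled unit's anchor uniformly from its valid positions (assuming, as an illustration, the range 'min.T0 + 1' to 'end.eligible - cv.nobs + 1'), and seeds each fold with 'seed + fold' as documented for the 'seed' argument. Unlike the real function, it always requires a seed and calls 'set.seed()', which resets the global RNG state.

```r
# Hypothetical sketch of the fold design, not fect's internals.
# end.eligible: last eligible period of each eligible unit (periods 1..T).
make_folds <- function(end.eligible, k = 20L, cv.prop = 0.1, cv.nobs = 3L,
                       min.T0 = 5L, seed = 123L) {
  end.eligible <- as.integer(end.eligible)
  stopifnot(all(end.eligible - cv.nobs + 1L >= min.T0 + 1L))  # >= 1 anchor each
  n <- length(end.eligible)
  lapply(seq_len(k), function(fold) {
    set.seed(seed + fold)                     # per-fold seed: seed + fold_id
    units <- sample.int(n, max(1L, round(cv.prop * n)))
    anchors <- vapply(units, function(i) {
      valid <- (min.T0 + 1L):(end.eligible[i] - cv.nobs + 1L)
      valid[sample.int(length(valid), 1L)]    # uniform over valid anchors
    }, integer(1))
    data.frame(fold = fold, unit = units, anchor = anchors)
  })
}

# 40 units eligible through period 20, 10 units eligible through period 12:
folds <- make_folds(end.eligible = c(rep(20L, 40), rep(12L, 10)))
folds[[1]]
```

Rerunning with the same 'seed' reproduces every fold exactly, which is the point of deriving per-fold seeds from a single base seed.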

Value

List with components:
- 'r.cv': chosen rank.
- 'cv.rule': rule applied.
- 'mspe': data.frame of per-r MSPE (averaged across folds), SE across folds, and held-out cell counts.
- 'mspe.per.fold': matrix of per-fold MSPE with one row per candidate r and one column per fold.
- 'k', 'cv.nobs', 'cv.buffer', 'cv.prop': parameters used.
- 'n.units.masked': number of distinct units that contributed to at least one fold's holdout.

Examples

## Not run: 
  library(fect)
  data(simdata)
  res <- r.cv.rolling(Y ~ D, data = simdata, index = c("id", "time"),
                      method = "ife", r.max = 5,
                      cv.nobs = 3, cv.buffer = 1, k = 20)
  res$r.cv
  ## then use the chosen r in a CV-disabled fit:
  fit <- fect(Y ~ D, data = simdata, index = c("id", "time"),
              method = "ife", time.component.from = "notyettreated",
              CV = FALSE, r = res$r.cv, se = TRUE)

## End(Not run)


fect documentation built on April 30, 2026, 9:06 a.m.