Functional Form Sensitivity in Nonlinear DiD

knitr::opts_chunk$set(
  collapse = TRUE, comment = "#>",
  fig.width = 7, fig.height = 4, warning = FALSE, message = FALSE
)
library(NonlinearDiD)
library(ggplot2)
set.seed(101)

Overview

This vignette explores the functional form sensitivity problem in difference-in-differences with binary outcomes, following Roth & Sant'Anna (2023).

The Core Issue

Suppose two groups have baseline outcome probabilities:

Both groups experience the same additive increase in probability: +0.10.

On the probability scale: The change is 0.10 for both — parallel trends holds.

On the logit scale: - Control: logit(0.40) - logit(0.30) = r round(qlogis(0.40) - qlogis(0.30), 3) - Treated: logit(0.35) - logit(0.25) = r round(qlogis(0.35) - qlogis(0.25), 3)

These are not equal! A researcher testing parallel trends on the log-odds scale would (correctly) reject — even though the underlying time trend is identical for both groups on the probability scale.

# Demonstrate scale sensitivity
p_ctrl_pre  <- 0.30; p_ctrl_post <- 0.40
p_treat_pre <- 0.25; p_treat_post <- 0.35

cat("=== Probability Scale ===\n")
cat("Control change:  ", round(p_ctrl_post  - p_ctrl_pre, 4),  "\n")
cat("Treated change:  ", round(p_treat_post - p_treat_pre, 4), "\n")
cat("DiD (prob):      ", round((p_treat_post - p_treat_pre) - (p_ctrl_post - p_ctrl_pre), 4), "\n\n")

cat("=== Log-Odds (Logit) Scale ===\n")
cat("Control change:  ", round(qlogis(p_ctrl_post)  - qlogis(p_ctrl_pre), 4),  "\n")
cat("Treated change:  ", round(qlogis(p_treat_post) - qlogis(p_treat_pre), 4), "\n")
cat("DiD (logit):     ", round((qlogis(p_treat_post) - qlogis(p_treat_pre)) -
                               (qlogis(p_ctrl_post) - qlogis(p_ctrl_pre)), 4), "\n\n")

cat("=== Probit Scale ===\n")
cat("Control change:  ", round(qnorm(p_ctrl_post)  - qnorm(p_ctrl_pre), 4),  "\n")
cat("Treated change:  ", round(qnorm(p_treat_post) - qnorm(p_treat_pre), 4), "\n")
cat("DiD (probit):    ", round((qnorm(p_treat_post) - qnorm(p_treat_pre)) -
                               (qnorm(p_ctrl_post) - qnorm(p_ctrl_pre)), 4), "\n")

Key takeaway: The same underlying DGP yields different DiD estimates and different pre-trends test results depending on the scale chosen. There is no uniquely correct scale — the right scale is determined by the DGP.


When Does Scale Choice Matter?

Scale sensitivity is most severe when:

  1. Baseline probabilities are far from 0.5 (where logit and probit curves are most nonlinear)
  2. Groups have different baseline rates (asymmetric baseline)
  3. Treatment effects are large (more curvature in the relevant region)
# Show severity across baseline probability values
baseline_probs <- seq(0.05, 0.45, by = 0.05)
delta_p        <- 0.10  # same additive change for both groups

severity_df <- do.call(rbind, lapply(baseline_probs, function(p0) {
  p1 <- p0 + delta_p
  # Parallel in prob => same change
  # Logit DiD if treated has different baseline (p0 - 0.05)
  p0_treat <- max(p0 - 0.05, 0.02)
  p1_treat <- p0_treat + delta_p

  logit_did <- (qlogis(p1_treat) - qlogis(p0_treat)) -
               (qlogis(p1)       - qlogis(p0))

  data.frame(
    baseline_ctrl  = p0,
    baseline_treat = p0_treat,
    logit_did      = logit_did
  )
}))

cat("Logit-scale DiD when true probability DiD = 0:\n")
print(severity_df, digits = 3, row.names = FALSE)
cat("\nLarger deviations at low baseline probabilities.\n")

Simulation: Linear vs. Logit DiD

We now simulate data where parallel trends holds on the logit scale (the correct scale for the DGP), and compare estimates from: 1. Linear DiD (standard CS2021) 2. Logit DiD (NonlinearDiD)

# DGP: parallel trends on logit scale
dat <- sim_binary_panel(
  n            = 1000,
  nperiods     = 8,
  prop_treated = 0.5,
  n_cohorts    = 3,
  true_att     = c(0.20, 0.35, 0.25),
  base_prob    = 0.20,   # low baseline: nonlinearity matters most
  unit_fe_sd   = 0.5,
  seed         = 42
)

cat("Baseline outcome rate (untreated, pre-period):",
    round(mean(dat$y[dat$D == 0 & dat$period == 1]), 3), "\n")
cat("True ATTs (avg):", round(mean(c(0.20, 0.35, 0.25)), 3), "\n\n")
# Logit DiD
res_logit <- nonlinear_attgt(
  dat, "y", "period", "id", "g",
  outcome_model = "logit",
  control_group = "nevertreated"
)

# Linear DiD
res_linear <- nonlinear_attgt(
  dat, "y", "period", "id", "g",
  outcome_model = "linear",
  control_group = "nevertreated"
)

# Aggregate
agg_logit  <- nonlinear_aggte(res_logit,  type = "dynamic")
agg_linear <- nonlinear_aggte(res_linear, type = "dynamic")

cat("=== Overall ATT ===\n")
cat("Linear DiD: ", round(agg_linear$overall_att, 4), "\n")
cat("Logit DiD:  ", round(agg_logit$overall_att,  4), "\n")
cat("True ATT:   ", round(mean(c(0.20, 0.35, 0.25)), 4), "\n")

Pre-Trends Sensitivity

A critical practical insight: even if the true DGP has no pre-treatment differences (i.e., the identifying assumption holds on some scale), a pre-trends test on the wrong scale may falsely reject.

# Test on logit scale
pt_logit <- nonlinear_pretest(res_logit, plot = FALSE)
cat("Pre-trends test (logit scale):\n")
cat("  Joint p-value:", round(pt_logit$joint_pval, 4), "\n\n")

# Test on linear scale
pt_linear <- nonlinear_pretest(res_linear, plot = FALSE)
cat("Pre-trends test (linear scale):\n")
cat("  Joint p-value:", round(pt_linear$joint_pval, 4), "\n\n")

cat("Note: If true DGP is logit-scale parallel trends, the linear-scale\n")
cat("pre-trends test may spuriously reject due to functional form.\n")

Recommendations

Based on Roth & Sant'Anna (2023) and the evidence above:

  1. Think about your DGP first. If your outcome is binary with moderate baseline rates (15–85%), use a nonlinear model.

  2. Report the scale of your parallel trends assumption. "We assume parallel trends in log-odds" is a substantively different claim from "parallel trends in probabilities."

  3. Use doubly-robust estimation (doubly_robust = TRUE) which is consistent under misspecification of either the outcome model or the propensity score model.

  4. Consider the odds-ratio DiD for binary outcomes when you want an estimate that does not depend on the reference group or period.

  5. Use nonlinear_bounds() to report the range of ATTs consistent with the data under minimal assumptions.


References

Roth, J., & Sant'Anna, P. H. C. (2023). When is parallel trends sensitive to functional form? Econometrica, 91(2), 737-747.

Wooldridge, J. M. (2023). Simple approaches to nonlinear difference-in-differences with panel data. The Econometrics Journal, 26(3).



Try the NonlinearDiD package in your browser

Any scripts or data that you put into this service are public.

NonlinearDiD documentation built on May 6, 2026, 1:06 a.m.