sim_mediation: Simulated Mediation Study Data

sim_mediation    R Documentation

Simulated Mediation Study Data

Description

A synthetic dataset mimicking a clustered education study with a continuous treatment (tutoring hours), a continuous mediator (mid-year test score), and a continuous outcome (end-of-year test score). Designed to illustrate RobustMediate with realistic effect sizes and non-trivial confounding.

Usage

sim_mediation

Format

A data frame with 600 rows (30 schools × 20 students per school) and 7 columns:

school

Factor. School identifier (30 levels). Use as cluster_var.

Y

Numeric. End-of-year test score (outcome).

X

Numeric. Tutoring hours received (continuous treatment, >= 0).

M

Numeric. Mid-year test score (mediator).

Z1

Numeric. Prior achievement (continuous covariate).

Z2

Integer (0/1). Free-lunch status (binary covariate).

Z3

Numeric. Parental education index (continuous covariate).
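The column layout above can be checked programmatically before fitting anything. The helper below is a minimal sketch; `check_sim_format()` is a hypothetical name and is not part of RobustMediate. It only encodes the types and ranges listed in this section.

```r
# Hypothetical helper (not part of RobustMediate): validates a data frame
# against the column names, types, and ranges documented above.
check_sim_format <- function(df) {
  expected <- c("school", "Y", "X", "M", "Z1", "Z2", "Z3")
  stopifnot(
    is.data.frame(df),
    setequal(names(df), expected),
    is.factor(df$school),
    is.numeric(df$Y), is.numeric(df$X), is.numeric(df$M),
    is.numeric(df$Z1), is.numeric(df$Z3),
    all(df$X >= 0),                 # tutoring hours are non-negative
    all(df$Z2 %in% c(0L, 1L))       # free-lunch status is coded 0/1
  )
  # Return the number of students per school for a quick visual check.
  table(df$school)
}
```

With the package loaded, `check_sim_format(sim_mediation)` should return a table of 30 schools with 20 students each if the data match the description on this page.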

True parameter targets

The data-generating process sets:

  • NDE (X → Y direct path) ≈ 0.25

  • NIE (X → M → Y path) ≈ 0.35

  • TE (NDE + NIE) ≈ 0.60

  • % mediated (NIE / TE) ≈ 58%

Use these values as ground truth when assessing estimation accuracy.
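The targets are internally consistent under the usual additive decomposition TE = NDE + NIE, which the listed values satisfy. The short sketch below reproduces that arithmetic and defines a relative-bias helper for comparing estimates with the targets. `relative_bias()` is a hypothetical name introduced here for illustration, not a RobustMediate function.

```r
# Documented targets (approximate values from this page).
nde <- 0.25  # natural direct effect, X -> Y
nie <- 0.35  # natural indirect effect, X -> M -> Y

te <- nde + nie               # total effect under the additive decomposition
prop_mediated <- nie / te     # share of the total effect that runs through M
round(100 * prop_mediated, 1) # 58.3

# Hypothetical helper: relative bias of an estimate against its target.
truth <- c(NDE = nde, NIE = nie, TE = te)
relative_bias <- function(estimate, truth) (estimate - truth) / truth
```

Pass a named vector of estimates in the same order as `truth` to get the relative bias for each quantity.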

Source

Generated via data-raw/generate_sim_data.R. See that script for the full data-generating process.
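For readers without access to that script, the following is a hypothetical sketch of how data with this shape could be simulated. It is not the package's generator. The linear functional forms, the coefficients other than the documented targets, and the function name `simulate_clustered_mediation()` are all assumptions made for illustration. In a linear model with no treatment-mediator interaction, the per-unit indirect effect is the product `a * b`, so `a = 0.70` and `b = 0.50` give 0.35.

```r
# Hypothetical generator: NOT the package's data-raw/generate_sim_data.R.
# All functional forms and nuisance coefficients below are assumptions.
simulate_clustered_mediation <- function(n_schools = 30, n_per_school = 20,
                                         nde = 0.25, a = 0.70, b = 0.50) {
  n <- n_schools * n_per_school
  school <- factor(rep(seq_len(n_schools), each = n_per_school))
  idx <- as.integer(school)

  # Independent school-level intercepts: they induce within-school
  # correlation in M and Y without confounding the M -> Y path.
  u_m <- rnorm(n_schools, sd = 0.5)[idx]
  u_y <- rnorm(n_schools, sd = 0.5)[idx]

  Z1 <- rnorm(n)                          # prior achievement
  Z2 <- rbinom(n, size = 1, prob = 0.4)   # free-lunch status (0/1)
  Z3 <- rnorm(n)                          # parental education index

  # Covariates drive treatment, mediator, and outcome (confounding).
  X <- pmax(0, 2 + 0.5 * Z1 - 0.3 * Z2 + 0.2 * Z3 + rnorm(n))
  M <- a * X + 0.4 * Z1 - 0.2 * Z2 + 0.2 * Z3 + u_m + rnorm(n)
  Y <- nde * X + b * M + 0.3 * Z1 - 0.2 * Z2 + 0.2 * Z3 + u_y + rnorm(n)

  data.frame(school, Y, X, M, Z1, Z2, Z3)
}

set.seed(2026)
toy <- simulate_clustered_mediation()
```

The resulting `toy` data frame has the same columns as `sim_mediation` but different values, so estimates from it should not be expected to match results obtained on the shipped dataset.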

Examples

library(RobustMediate)
data(sim_mediation)
str(sim_mediation)
# Summaries of the outcome, treatment, and mediator
summary(sim_mediation[, c("Y", "X", "M")])


fit <- robustmediate(
  treatment_formula = X ~ Z1 + Z2 + Z3,
  mediator_formula  = M ~ X + Z1 + Z2 + Z3,
  outcome_formula   = Y ~ X + M + Z1 + Z2 + Z3,
  data        = sim_mediation,
  cluster_var = "school",
  R           = 500
)
diagnose(fit)
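As a rough sanity check alongside the fit above, the effects can also be approximated with ordinary least squares and the product-of-coefficients rule. This is a simplified sketch and not what robustmediate() does: it ignores the school clustering, assumes linear models with no treatment-mediator interaction, and treats the documented targets as per-unit-of-X effects, which this page does not state explicitly. `naive_effects()` is a hypothetical helper name.

```r
# Hypothetical helper: naive product-of-coefficients estimates from two
# ordinary least squares fits. Ignores clustering; for sanity checks only.
naive_effects <- function(df) {
  med_fit <- lm(M ~ X + Z1 + Z2 + Z3, data = df)      # mediator model
  out_fit <- lm(Y ~ X + M + Z1 + Z2 + Z3, data = df)  # outcome model

  a   <- unname(coef(med_fit)["X"])  # effect of X on M
  b   <- unname(coef(out_fit)["M"])  # effect of M on Y, holding X fixed
  nde <- unname(coef(out_fit)["X"])  # direct effect of X on Y

  c(NDE = nde, NIE = a * b, TE = nde + a * b)
}
```

Calling `naive_effects(sim_mediation)` gives point estimates to set against the documented targets. Large gaps may reflect the simplifying assumptions listed above rather than a problem with the data.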


RobustMediate documentation built on April 16, 2026, 5:08 p.m.