MIdiagRDD: Diagnostic Tool by Multiple Imputation for Regression...


MIdiagRDD    R Documentation

Diagnostic Tool by Multiple Imputation for Regression Discontinuity Designs

Description

Estimates average treatment effects at the cutoff based on sharp regression discontinuity designs (RDD) and multiple imputation regression discontinuity designs (MIRDD). It provides diagnostic tools for RDD by comparing results with those from MIRDD.

Usage

MIdiagRDD(
  y,
  x,
  cut,
  seed = NULL,
  M1 = 100,
  M2 = 5,
  M3 = 1,
  p2s1 = 1,
  emp = 0,
  bw = "mserd",
  ker = "triangular",
  h = NULL,
  type = "Conventional",
  p1 = 1,
  conf = 95,
  upper = 1,
  covs1 = NULL,
  up = NULL,
  lo = NULL
)

Arguments

y

A numeric vector of the outcome variable.

x

A numeric vector of the running variable (forcing variable).

cut

A numeric value indicating the cutoff point in x. The user must supply a specific number.

seed

A seed number for reproducibility. Default is NULL.

M1

Number of imputations for MIRDD. Default is 100.

M2

Number of imputed datasets used for visualization (plots 3, 4, 9, and 10). Default is 5. These datasets are a subset of the M1 imputed datasets, so M2 cannot be larger than M1.

M3

Number of imputed datasets used for plots 5 to 10. Default is 1. These datasets are a subset of the M1 imputed datasets, so M3 cannot be larger than M1.

p2s1

Integer passed to Amelia's p2s argument (0 or 1): 0 suppresses screen output, and 1 prints the progress of the multiple imputation process to the screen. Default is 1.

emp

Amelia's empirical (ridge) prior. Default is 0. A reasonable upper bound is 0.1.

bw

Bandwidth selection method for rdrobust. Options are "mserd" (default), "msesum", "cerrd", and "cersum". "mserd" is one common MSE-optimal bandwidth selector. "msesum" is one common MSE-optimal bandwidth selector for the sum of regression estimates. "cerrd" is one common CER-optimal bandwidth selector. "cersum" is one common CER-optimal bandwidth selector for the sum of regression estimates. MSE is Mean Squared Error. CER is Coverage Error Rate.

ker

Kernel function for rdrobust. Options are "triangular" (default option), "epanechnikov", and "uniform".

h

Numeric value for the bandwidth. Default is NULL (data-driven selection).

type

Inference type: "Conventional" (default), "Bias-Corrected", or "Robust".

p1

Polynomial order for rdrobust and MIRDD: 1 (local linear regression, the default) or 2 (local quadratic regression). A value larger than 2 is treated as 2.

conf

Confidence level (0-100). Default is 95.

upper

If 1 (default), treatment is x >= cut. If 0, treatment is x < cut.

covs1

Optional covariates. If two additional covariates z1 and z2 need to be used, then covs1 = data.frame(z1, z2).

up

Optional upper bound for imputed values.

lo

Optional lower bound for imputed values.
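
The argument rules above can be sketched in base R. This is only an illustration of the documented behaviour, not the package's internal code; the helper name check_mirdd_args is hypothetical and does not exist in MIRDD.

```r
# Hypothetical helper (not part of MIRDD): mirrors the documented argument rules.
check_mirdd_args <- function(x, cut, M1 = 100, M2 = 5, M3 = 1, p1 = 1, upper = 1) {
  # M2 and M3 refer to subsets of the M1 imputed datasets, so neither may exceed M1.
  stopifnot(is.numeric(x), length(cut) == 1, M2 <= M1, M3 <= M1)
  # Documented behaviour: a polynomial order above 2 is treated as 2.
  p1 <- min(p1, 2)
  # upper = 1: treated units have x >= cut; upper = 0: treated units have x < cut.
  treated <- if (upper == 1) x >= cut else x < cut
  list(p1 = p1, treated = treated)
}

x <- c(-0.5, -0.1, 0, 0.3)
res <- check_mirdd_args(x, cut = 0, p1 = 3)
res$p1       # 2
res$treated  # FALSE FALSE TRUE TRUE
```

Note that a unit exactly at the cutoff (x equal to cut) counts as treated when upper = 1 and as control when upper = 0.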

Value

Estimate

Estimated quantities of the average treatment effects (ATE) at the cutoff.

Std.Error

Standard error of the estimate.

CI.LL

Lower limit of the confidence interval (95% by default).

CI.UL

Upper limit of the confidence interval (95% by default).

size

Sub-sample size used to estimate the ATE at the cutoff.

bandwidth

Length of the bandwidth used for RDD analysis.
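
To give a feel for how a bandwidth and a kernel interact, the following base R sketch computes the textbook kernel weights for the three ker options. It is a generic illustration, not code taken from MIRDD or rdrobust; the function name kernel_weight and the bandwidth value are assumptions made for the example.

```r
# Textbook kernel weights on the scaled distance u = (x - cut) / h.
# kernel_weight is a hypothetical name used only for this illustration.
kernel_weight <- function(u, ker = "triangular") {
  inside <- abs(u) <= 1  # observations farther than h from the cutoff get weight 0
  switch(ker,
    triangular   = (1 - abs(u)) * inside,
    epanechnikov = 0.75 * (1 - u^2) * inside,
    uniform      = 0.5 * inside,
    stop("unknown kernel: ", ker)
  )
}

x   <- c(-0.8, -0.2, 0.1, 0.4, 0.9)
cut <- 0
h   <- 0.5  # illustrative bandwidth, not a data-driven choice
w   <- kernel_weight((x - cut) / h, "triangular")
w            # 0.0 0.6 0.8 0.2 0.0
sum(w > 0)   # 3
```

In this sketch only the observations within h of the cutoff receive positive weight, which is the idea behind a reported sub-sample size that is smaller than the full sample.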

In addition to the data frame, a series of diagnostic plots are generated:

1. MIRDD, RDD, Naive

A diagnostic plot to visualize the relationship among the three estimators. Red vertical line is RDD, black solid line is naive, and histogram is MI.

2. MIRDD and RDD

A diagnostic plot to visualize the relationship between the two estimators. Red vertical line is RDD and histogram is MI.

3. Densities (Control)

A diagnostic plot to visualize the densities of observed and imputed data. Gray solid curve is the density of observed data in the control group. Blue solid curve is the density of observed data in the treatment group. Red dashed lines are the densities of imputed data in the control group.

4. Densities (Treatment)

A diagnostic plot to visualize the densities of observed and imputed data. Gray solid curve is the density of observed data in the control group. Blue solid curve is the density of observed data in the treatment group. Red dashed lines are the densities of imputed data in the treatment group.

5. Observed Values

A diagnostic plot to visualize the scatterplot of observed data. Gray circles are observed data in the control group. Blue triangles are observed data in the treatment group.

6. Observed & Imputed Values

A diagnostic plot to visualize the scatterplot of observed and imputed data. Red circles are imputed data in the control group. Red triangles are imputed data in the treatment group. These imputed data are overlaid on the observed data shown in plot 5.

7. Observed & Imputed (Control)

A diagnostic plot to clearly visualize the scatterplot of observed and imputed data in the control group only.

8. Observed & Imputed (Treatment)

A diagnostic plot to clearly visualize the scatterplot of observed and imputed data in the treatment group only.

9. Around Cutoff (Control)

A diagnostic plot to clearly visualize the scatterplot, around the cutoff point, of observed and imputed data in the control group only. The solid lines (five under the default M2 = 5) are the estimated linear regression lines based on multiply imputed data.

10. Around Cutoff (Treatment)

A diagnostic plot to clearly visualize the scatterplot, around the cutoff point, of observed and imputed data in the treatment group only. The solid lines (five under the default M2 = 5) are the estimated linear regression lines based on multiply imputed data.

11. Local Slope (Control)

A diagnostic plot to visualize the distribution of the coefficients of the estimated linear regression models around the cutoff point in the control group.

12. Local Slope (Treatment)

A diagnostic plot to visualize the distribution of the coefficients of the estimated linear regression models around the cutoff point in the treatment group.

References

Takahashi, M. 2023. Multiple imputation regression discontinuity designs: Alternative to regression discontinuity designs to estimate the local average treatment effect at the cutoff. Communications in Statistics - Simulation and Computation 53(9): 4293-4312. doi:10.1080/03610918.2021.1960374

Takahashi, M. 2026. MIRDD: An R package for multiple imputation regression discontinuity design. SoftwareX 34(102707): 1-6. doi:10.1016/j.softx.2026.102707

Calonico, S., Cattaneo, M.D., and Titiunik, R. 2015. rdrobust: An R Package for robust nonparametric inference in regression-discontinuity designs. R Journal 7(1): 38-51. doi:10.32614/RJ-2015-004

Honaker, J., King, G., and Blackwell, M. 2011. Amelia II: A program for missing data. Journal of Statistical Software 45(7): 1-47. doi:10.18637/jss.v045.i07

Examples

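# Example usage with dummy data
library(MIRDD)
set.seed(1)                                   # make the dummy data reproducible
x <- runif(100, -1, 1)                        # running variable on [-1, 1]
y <- 0.5 * x + (x >= 0) + rnorm(100, 0, 0.1)  # outcome with a jump of 1 at x = 0
MIdiagRDD(y = y, x = x, cut = 0)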
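
A further sketch shows how the optional arguments described above might be combined, including two covariates passed through covs1. The simulated data, the variable names, and the particular option values are assumptions chosen for illustration; the call is guarded so that the block still runs when MIRDD is not installed.

```r
# Simulated data with two covariates (illustrative values only).
set.seed(123)
n  <- 200
x  <- runif(n, -1, 1)                                   # running variable
z1 <- rnorm(n)                                          # covariate 1
z2 <- rnorm(n)                                          # covariate 2
y  <- 0.5 * x + (x >= 0) + 0.2 * z1 + rnorm(n, 0, 0.1)  # jump of 1 at x = 0
covs <- data.frame(z1, z2)                              # format documented for covs1

# Run only if the MIRDD package is available.
if (requireNamespace("MIRDD", quietly = TRUE)) {
  res <- MIRDD::MIdiagRDD(
    y = y, x = x, cut = 0,
    seed = 123,       # reproducible imputations
    M1 = 20,          # fewer imputations than the default, to save time
    p2s1 = 0,         # suppress Amelia's screen output
    bw = "cerrd",     # CER-optimal bandwidth selector
    ker = "uniform",  # uniform kernel
    type = "Robust",  # robust inference
    conf = 90,        # 90% confidence level
    covs1 = covs
  )
  print(res)
}
```

Lowering M1 shortens the run but leaves fewer imputed datasets behind the histograms in plots 1 and 2, so the default of 100 is likely the better choice for a final analysis.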

MIRDD documentation built on May 15, 2026, 9:07 a.m.