rdplotdensity: Density Plotting for Manipulation Testing

rdplotdensity R Documentation

Density Plotting for Manipulation Testing

Description

rdplotdensity constructs density plots. It is based on the local polynomial density estimator proposed in Cattaneo, Jansson and Ma (2020, 2023). A companion Stata package is described in Cattaneo, Jansson and Ma (2018).

Companion command: rddensity for manipulation (density discontinuity) testing.

Related Stata and R packages useful for inference in regression discontinuity (RD) designs are described on the website https://rdpackages.github.io/.

Usage

rdplotdensity(
  rdd,
  X,
  plotRange = NULL,
  plotN = 10,
  plotGrid = c("es", "qs"),
  alpha = 0.05,
  type = NULL,
  lty = NULL,
  lwd = NULL,
  lcol = NULL,
  pty = NULL,
  pwd = NULL,
  pcol = NULL,
  CItype = NULL,
  CIuniform = FALSE,
  CIsimul = 2000,
  CIshade = NULL,
  CIcol = NULL,
  bwselect = NULL,
  hist = TRUE,
  histBreaks = NULL,
  histFillCol = 3,
  histFillShade = 0.2,
  histLineCol = "white",
  title = "",
  xlabel = "",
  ylabel = "",
  legendTitle = NULL,
  legendGroups = NULL,
  noPlot = FALSE
)

Arguments

rdd

Object returned by rddensity

X

Numeric vector or one dimensional matrix/data frame, the running variable.

plotRange

Numeric, specifies the lower and upper bound of the plotting region. Default is [c-3*hl,c+3*hr] (three bandwidths around the cutoff).

plotN

Numeric, specifies the number of grid points used for plotting on the two sides of the cutoff. Default is c(10,10) (i.e., 10 points are used on each side).

plotGrid

String, specifies how the grid points are positioned. Options are es (evenly spaced) and qs (quantile spaced).

alpha

Numeric scalar between 0 and 1, the significance level for plotting confidence regions. If more than one is provided, they will be applied to the two sides accordingly.

type

String, one of "line" (default), "points" or "both", how the point estimates are plotted. If more than one is provided, they will be applied to the two sides accordingly.

lty

Line type for point estimates, only effective if type is "line" or "both". 1 for solid line, 2 for dashed line, 3 for dotted line. For other options, see the instructions for ggplot2 or par. If more than one is provided, they will be applied to the two sides accordingly.

lwd

Line width for point estimates, only effective if type is "line" or "both". Should be strictly positive. For other options, see the instructions for ggplot2 or par. If more than one is provided, they will be applied to the two sides accordingly.

lcol

Line color for point estimates, only effective if type is "line" or "both". 1 for black, 2 for red, 3 for green, 4 for blue. For other options, see the instructions for ggplot2 or par. If more than one is provided, they will be applied to the two sides accordingly.

pty

Scatter plot type for point estimates, only effective if type is "points" or "both". For options, see the instructions for ggplot2 or par. If more than one is provided, they will be applied to the two sides accordingly.

pwd

Scatter plot size for point estimates, only effective if type is "points" or "both". Should be strictly positive. If more than one is provided, they will be applied to the two sides accordingly.

pcol

Scatter plot color for point estimates, only effective if type is "points" or "both". 1 for black, 2 for red, 3 for green, 4 for blue. For other options, see the instructions for ggplot2 or par. If more than one is provided, they will be applied to the two sides accordingly.

CItype

String, one of "region" (shaded region, default), "line" (dashed lines), "ebar" (error bars), "all" (all of the previous) or "none" (no confidence region), how the confidence region should be plotted. If more than one is provided, they will be applied to the two sides accordingly.

CIuniform

TRUE or FALSE (default), plotting either pointwise confidence intervals (FALSE) or uniform confidence bands (TRUE).

CIsimul

Positive integer, the number of simulations used to construct critical values (default is 2000). This option is ignored if CIuniform=FALSE.

CIshade

Numeric, opaqueness of the confidence region, should be between 0 (transparent) and 1. Default is 0.2. If more than one is provided, they will be applied to the two sides accordingly.

CIcol

Color of the confidence region. 1 for black, 2 for red, 3 for green, 4 for blue. For other options, see the instructions for ggplot2 or par. If more than one is provided, they will be applied to the two sides accordingly.

bwselect

String, the method for data-driven bandwidth selection. Available options are (1) "mse-dpi" (mean squared error-optimal bandwidth selected for each grid point); (2) "imse-dpi" (integrated MSE-optimal bandwidth, common for all grid points); (3) "mse-rot" (rule-of-thumb bandwidth with Gaussian reference model); and (4) "imse-rot" (integrated rule-of-thumb bandwidth with Gaussian reference model). If omitted, bandwidths returned by rddensity will be used.

hist

TRUE (default) or FALSE, whether to add a histogram to the background.

histBreaks

Numeric vector, giving the breakpoints between histogram cells.

histFillCol

Color of the histogram cells.

histFillShade

Opaqueness of the histogram cells, should be between 0 (transparent) and 1. Default is 0.2.

histLineCol

Color of the histogram lines.

title, xlabel, ylabel

Strings, title of the plot and labels for x- and y-axis.

legendTitle

String, title of legend.

legendGroups

String vector, group names used in legend.

noPlot

No density plot will be generated if set to TRUE.
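
As an illustration of the two plotGrid options described above, the following base R sketch builds evenly spaced and quantile-spaced grids on each side of a cutoff. The names cutoff, plot_range, n_grid and the grid_* vectors are hypothetical, and the package's internal grid construction may differ in detail.

```r
# Illustration only: evenly spaced ("es") vs. quantile-spaced ("qs") grids.
# All names and values here are hypothetical, not taken from the package.
set.seed(42)
x <- rnorm(2000, mean = -0.5)
cutoff <- 0
plot_range <- c(-2, 2)
n_grid <- 10  # grid points per side, mirroring plotN = 10

# Observations inside the plotting range, split at the cutoff
x_left  <- x[x >= plot_range[1] & x < cutoff]
x_right <- x[x >= cutoff & x <= plot_range[2]]

# Evenly spaced: equal distance between neighboring grid points
grid_es_left  <- seq(plot_range[1], cutoff, length.out = n_grid)
grid_es_right <- seq(cutoff, plot_range[2], length.out = n_grid)

# Quantile spaced: equal share of observations between neighboring grid points
probs <- seq(0, 1, length.out = n_grid)
grid_qs_left  <- quantile(x_left, probs = probs, names = FALSE)
grid_qs_right <- quantile(x_right, probs = probs, names = FALSE)

# Compare the two grids on the left side of the cutoff
round(cbind(es = grid_es_left, qs = grid_qs_left), 3)
```

Quantile-spaced grids place more evaluation points where the running variable is dense, while evenly spaced grids cover the plotting range uniformly.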

Details

Bias correction is only used for the construction of confidence intervals/bands, but not for point estimation. The point estimates, denoted by f_p, are constructed using local polynomial estimates of order p, while the centering of the confidence intervals/bands, denoted by f_q, is constructed using local polynomial estimates of order q. The confidence intervals/bands take the form: [f_q - cv * SE(f_q) , f_q + cv * SE(f_q)], where cv denotes the appropriate critical value and SE(f_q) denotes a standard error estimate for the centering of the confidence interval/band. As a result, the confidence intervals/bands may not be centered at the point estimates because they have been bias-corrected. Setting q and p to be equal results in confidence intervals/bands centered at the point estimates, but requires undersmoothing for valid inference (i.e., the (I)MSE-optimal bandwidth for the density point estimator cannot be used). Hence the bandwidth would need to be specified manually when q=p, and the point estimates will not be (I)MSE-optimal. See Cattaneo, Jansson and Ma (2022, 2023) for details, and also Calonico, Cattaneo, and Farrell (2018, 2022) for robust bias correction methods.

Sometimes the density point estimates may lie outside of the confidence intervals/bands, which can happen if the underlying distribution exhibits high curvature at some evaluation point(s). One possible solution in this case is to increase the polynomial order p or to employ a smaller bandwidth.
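
The interval formula above can be sketched in base R. The numbers below are made up for illustration, the vectors f_p, f_q and se_q only mimic the quantities named in this section, and a normal critical value is assumed for the pointwise case.

```r
# Illustration only: pointwise bias-corrected confidence intervals.
# All numbers are made up; a normal critical value is an assumption.
alpha <- 0.05
f_p  <- c(0.310, 0.352, 0.398)   # point estimates (polynomial order p)
f_q  <- c(0.305, 0.349, 0.371)   # centering of the intervals (order q)
se_q <- c(0.020, 0.018, 0.012)   # standard errors corresponding to f_q

cv <- qnorm(1 - alpha / 2)       # pointwise critical value
ci_lower <- f_q - cv * se_q
ci_upper <- f_q + cv * se_q

# TRUE where the point estimate falls outside its own interval
outside <- f_p < ci_lower | f_p > ci_upper
data.frame(f_p, f_q, ci_lower, ci_upper, outside)
```

In this made-up example the third point estimate lies above its interval, which is centered at f_q rather than f_p. This is the situation described in the preceding paragraph.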

Value

Estl, Estr

Matrices containing estimation results: (1) grid (grid points), (2) bw (bandwidths), (3) nh (number of observations in each local neighborhood), (4) nhu (number of unique observations in each local neighborhood), (5) f_p (point estimates with p-th order local polynomial), (6) f_q (point estimates with q-th order local polynomial, only if option q is nonzero), (7) se_p (standard error corresponding to f_p), and (8) se_q (standard error corresponding to f_q). Also returned are the variance-covariance matrix corresponding to f_p, the variance-covariance matrix corresponding to f_q, and a list containing the options passed to the function.

Estplot

A standard ggplot object, which can be used for further customization.

Author(s)

Matias D. Cattaneo, Princeton University. cattaneo@princeton.edu.

Michael Jansson, University of California Berkeley. mjansson@econ.berkeley.edu.

Xinwei Ma (maintainer), University of California San Diego. x1ma@ucsd.edu.

References

Calonico, S., M. D. Cattaneo, and M. H. Farrell. 2018. On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Inference. Journal of the American Statistical Association 113(522): 767-779. doi: 10.1080/01621459.2017.1285776

Calonico, S., M. D. Cattaneo, and M. H. Farrell. 2022. Coverage Error Optimal Confidence Intervals for Local Polynomial Regression. Bernoulli 28(4): 2998-3022. doi: 10.3150/21-BEJ1445

Cattaneo, M. D., M. Jansson, and X. Ma. 2018. Manipulation Testing based on Density Discontinuity. Stata Journal 18(1): 234-261. doi: 10.1177/1536867X1801800115

Cattaneo, M. D., M. Jansson, and X. Ma. 2020. Simple Local Polynomial Density Estimators. Journal of the American Statistical Association 115(531): 1449-1455. doi: 10.1080/01621459.2019.1635480

Cattaneo, M. D., M. Jansson, and X. Ma. 2022. lpdensity: Local Polynomial Density Estimation and Inference. Journal of Statistical Software 101(2): 1-25. doi: 10.18637/jss.v101.i02

Cattaneo, M. D., M. Jansson, and X. Ma. 2023. Local Regression Distribution Estimators. Journal of Econometrics, forthcoming. doi: 10.1016/j.jeconom.2021.01.006

See Also

rddensity

Examples

# Load the package providing rddensity() and rdplotdensity()
library(rddensity)

# Generate a random sample with a density discontinuity at 0
set.seed(42)
x <- rnorm(2000, mean = -0.5)
x[x > 0] <- x[x > 0] * 2

# Estimation
rdd <- rddensity(X = x)
summary(rdd)

# Density plot (from -2 to 2 with 25 evaluation points at each side)
plot1 <- rdplotdensity(rdd, x, plotRange = c(-2, 2), plotN = 25)

# Plotting a uniform confidence band
set.seed(42) # fix the seed for simulating critical values
plot3 <- rdplotdensity(rdd, x, plotRange = c(-2, 2), plotN = 25, CIuniform = TRUE)
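
The critical value behind a uniform band is obtained by simulation (see CIsimul). A rough base R sketch of one way to do this follows, assuming a sup-t construction: draw Gaussian vectors that share the correlation structure of the estimates across grid points, then take the (1 - alpha) quantile of the largest absolute draw. The correlation matrix below is made up, the names are hypothetical, and the package's implementation may differ.

```r
# Illustration only: simulated critical value for a uniform confidence band.
# The correlation matrix is hypothetical; the sup-t construction is an assumption.
set.seed(42)
alpha   <- 0.05
n_simul <- 2000                      # mirrors CIsimul = 2000
n_grid  <- 25                        # mirrors plotN = 25

# Hypothetical correlation between estimates at nearby grid points
corr_mat <- 0.8 ^ abs(outer(seq_len(n_grid), seq_len(n_grid), "-"))

# Draw correlated standard normal vectors via the Cholesky factor
z <- matrix(rnorm(n_simul * n_grid), nrow = n_simul) %*% chol(corr_mat)

# Largest absolute value across the grid, one per simulation draw
sup_abs <- apply(abs(z), 1, max)

cv_uniform   <- quantile(sup_abs, probs = 1 - alpha, names = FALSE)
cv_pointwise <- qnorm(1 - alpha / 2)
c(pointwise = cv_pointwise, uniform = cv_uniform)
```

The simulated uniform critical value is larger than the pointwise one, which is why uniform bands are wider than pointwise intervals at the same alpha.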


rddensity documentation built on Jan. 22, 2023, 1:26 a.m.