Nothing
################################################################################
#' @title Density Plotting for Manipulation Testing
#'
#' @description \code{rdplotdensity} constructs density plots. It is based on the
#' local polynomial density estimator proposed in Cattaneo, Jansson and Ma (2020, 2023).
#' A companion \code{Stata} package is described in Cattaneo, Jansson and Ma (2018).
#'
#' Companion command: \code{\link{rddensity}} for manipulation (density discontinuity) testing.
#'
#' Related Stata and R packages useful for inference in regression discontinuity (RD)
#' designs are described in the website: \url{https://rdpackages.github.io/}.
#'
#' @details
#' Bias correction is only used for the construction of confidence intervals/bands, but not for point
#' estimation. The point estimates, denoted by \code{f_p}, are constructed using local polynomial estimates of order
#' \code{p}, while the centering of the confidence intervals/bands, denoted by \code{f_q}, are constructed using local
#' polynomial estimates of order \code{q}. The confidence intervals/bands take the form:
#' \code{[f_q - cv * SE(f_q) , f_q + cv * SE(f_q)]}, where \code{cv} denotes the appropriate critical value and
#' \code{SE(f_q)} denotes a standard error estimate
#' for the centering of the confidence interval/band. As a result, the confidence intervals/bands may not be
#' centered at the point estimates because they have been bias-corrected. Setting \code{q} and \code{p} to be equal
#' results on centered at the point estimate confidence intervals/bands, but requires undersmoothing for valid
#' inference (i.e., (I)MSE-optimal bandwdith for the density point estimator cannot be used). Hence the bandwidth
#' would need to be specified manually when \code{q=p}, and the point estimates will not be (I)MSE optimal. See
#' Cattaneo, Jansson and Ma (2022, 2023) for details, and also Calonico, Cattaneo, and Farrell (2018, 2022) for
#' robust bias correction methods.
#'
#' Sometimes the density point estimates may lie outside of the confidence intervals/bands, which can happen if
#' the underlying distribution exhibits high curvature at some evaluation point(s). One possible solution in this
#' case is to increase the polynomial order \code{p} or to employ a smaller bandwidth.
#'
#' @param rdd Object returned by \code{\link{rddensity}}
#' @param X Numeric vector or one dimensional matrix/data frame, the running variable.
#' @param plotRange Numeric, specifies the lower and upper bound of the plotting region. Default is
#' \code{[c-3*hl,c+3*hr]} (three bandwidths around the cutoff).
#' @param plotN Numeric, specifies the number of grid points used for plotting on the two sides of the cutoff.
#' Default is \code{c(10,10)} (i.e., 10 points are used on each side).
#' @param plotGrid String, specifies how the grid points are positioned. Options are \code{es} (evenly spaced)
#' and \code{qs} (quantile spaced).
#' @param alpha Numeric scalar between 0 and 1, the significance level for plotting
#' confidence regions. If more than one is provided, they will be applied to the two sides
#' accordingly.
#' @param type String, one of \code{"line"} (default), \code{"points"} or \code{"both"}, how
#' the point estimates are plotted. If more than one is provided, they will be applied to the two sides
#' accordingly.
#' @param lty Line type for point estimates, only effective if \code{type} is \code{"line"} or
#' \code{"both"}. \code{1} for solid line, \code{2} for dashed line, \code{3} for dotted line.
#' For other options, see the instructions for \code{\link{ggplot2}} or \code{\link{par}}. If
#' more than one is provided, they will be applied to the two sides accordingly.
#' @param lwd Line width for point estimates, only effective if \code{type} is \code{"line"} or
#' \code{"both"}. Should be strictly positive. For other options, see the instructions for
#' \code{\link{ggplot2}} or \code{\link{par}}. If more than one is provided, they will be applied
#' to the two sides accordingly.
#' @param lcol Line color for point estimates, only effective if \code{type} is \code{"line"} or
#' \code{"both"}. \code{1} for black, \code{2} for red, \code{3} for green, \code{4} for blue.
#' For other options, see the instructions for \code{\link{ggplot2}} or \code{\link{par}}. If
#' more than one is provided, they will be applied to the two sides
#' accordingly.
#' @param pty Scatter plot type for point estimates, only effective if \code{type} is \code{"points"} or
#' \code{"both"}. For options, see the instructions for \code{\link{ggplot2}} or \code{\link{par}}. If
#' more than one is provided, they will be applied to the two sides
#' accordingly.
#' @param pwd Scatter plot size for point estimates, only effective if \code{type} is \code{"points"} or
#' \code{"both"}. Should be strictly positive. If more than one is provided, they will be applied to the two sides
#' accordingly.
#' @param pcol Scatter plot color for point estimates, only effective if \code{type} is \code{"points"} or
#' \code{"both"}. \code{1} for black, \code{2} for red, \code{3}
#' for green, \code{4} for blue.
#' For other options, see the instructions for \code{\link{ggplot2}} or \code{\link{par}}. If
#' more than one is provided, they will be applied to the two sides
#' accordingly.
#' @param CItype String, one of \code{"region"} (shaded region, default), \code{"line"} (dashed lines),
#' \code{"ebar"} (error bars), \code{"all"} (all of the previous) or \code{"none"} (no confidence region),
#' how the confidence region should be plotted. If more than one is provided, they will be applied to the two sides
#' accordingly.
#' @param CIuniform \code{TRUE} or \code{FALSE} (default), plotting either pointwise confidence intervals (\code{FALSE}) or
#' uniform confidence bands (\code{TRUE}).
#' @param CIsimul Positive integer, the number of simulations used to construct critical values (default is 2000). This
#' option is ignored if \code{CIuniform=FALSE}.
#' @param CIshade Numeric, opaqueness of the confidence region, should be between 0 (transparent) and
#' 1. Default is 0.2. If more than one is provided, they will be applied to the two sides
#' accordingly.
#' @param CIcol Color of the confidence region. \code{1} for black, \code{2} for red, \code{3}
#' for green, \code{4} for blue.
#' For other options, see the instructions for \code{\link{ggplot2}} or \code{\link{par}}. If
#' more than one is provided, they will be applied to the two sides
#' accordingly.
#' @param bwselect String, the method for data-driven bandwidth selection. Available options
#' are (1) \code{"mse-dpi"} (mean squared error-optimal bandwidth selected for each grid point);
#' (2) \code{"imse-dpi"} (integrated MSE-optimal bandwidth, common for all grid points);
#' (3) \code{"mse-rot"} (rule-of-thumb bandwidth with Gaussian reference model);
#' and (4) \code{"imse-rot"} (integrated rule-of-thumb bandwidth with Gaussian reference model).
#' If omitted, bandwidths returned by \code{rddensity} will be used.
#' @param hist \code{TRUE} (default) or \code{FALSE}, whether adding a histogram to the background.
#' @param histBreaks Numeric vector, giving the breakpoints between histogram cells.
#' @param histFillCol Color of the histogram cells.
#' @param histFillShade Opaqueness of the histogram cells, should be between 0 (transparent) and
#' 1. Default is 0.2.
#' @param histLineCol Color of the histogram lines.
#' @param title,xlabel,ylabel Strings, title of the plot and labels for x- and y-axis.
#' @param legendTitle String, title of legend.
#' @param legendGroups String Vector, group names used in legend.
#' @param noPlot No density plot will be generated if set to \code{TRUE}.
#'
#' @return
#' \item{Estl, Estr}{Matrices containing estimation results:
#' (1) \code{grid} (grid points),
#' (2) \code{bw} (bandwidths),
#' (3) \code{nh} (number of observations in each local neighborhood),
#' (4) \code{nhu} (number of unique observations in each local neighborhood),
#' (5) \code{f_p} (point estimates with p-th order local polynomial),
#' (6) \code{f_q} (point estimates with q-th order local polynomial, only if option \code{q} is nonzero),
#' (7) \code{se_p} (standard error corresponding to \code{f_p}), and (8) \code{se_q} (standard error
#' corresponding to \code{f_q}).
#' Variance-covariance matrix corresponding to \code{f_p}.
#' Variance-covariance matrix corresponding to \code{f_q}.
#' A list containing options passed to the function.}
#' \item{Estplot}{A stadnard \code{ggplot} object is returned, hence can be used for further customization.}
#'
#' @author
#' Matias D. Cattaneo, Princeton University \email{cattaneo@princeton.edu}.
#'
#' Michael Jansson, University of California Berkeley. \email{mjansson@econ.berkeley.edu}.
#'
#' Xinwei Ma (maintainer), University of California San Diego. \email{x1ma@ucsd.edu}.
#'
#' @references
#' Calonico, S., M. D. Cattaneo, and M. H. Farrell. 2018. On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Inference. \emph{Journal of the American Statistical Association} 113(522): 767-779. \doi{10.1080/01621459.2017.1285776}
#'
#' Calonico, S., M. D. Cattaneo, and M. H. Farrell. 2022. Coverage Error Optimal Confidence Intervals for Local Polynomial Regression. \emph{Bernoulli}, 28(4): 2998-3022. \doi{10.3150/21-BEJ1445}
#'
#' Cattaneo, M. D., M. Jansson, and X. Ma. 2018. Manipulation Testing based on Density Discontinuity. \emph{Stata Journal} 18(1): 234-261. \doi{10.1177/1536867X1801800115}
#'
#' Cattaneo, M. D., M. Jansson, and X. Ma. 2020. Simple Local Polynomial Density Estimators. \emph{Journal of the American Statistical Association}, 115(531): 1449-1455. \doi{10.1080/01621459.2019.1635480}
#'
#' Cattaneo, M. D., M. Jansson, and X. Ma. 2022. lpdensity: Local Polynomial Density Estimation and Inference. \emph{Journal of Statistical Software}, 101(2), 1–25. \doi{10.18637/jss.v101.i02}
#'
#' Cattaneo, M. D., M. Jansson, and X. Ma. 2023. Local Regression Distribution Estimators. \emph{Journal of Econometrics}, forthcoming. \doi{10.1016/j.jeconom.2021.01.006}
#'
#' @seealso \code{\link{rddensity}}
#'
#' @examples
#' # Generate a random sample with a density discontinuity at 0
#' set.seed(42)
#' x <- rnorm(2000, mean = -0.5)
#' x[x > 0] <- x[x > 0] * 2
#'
#' # Estimation
#' rdd <- rddensity(X = x)
#' summary(rdd)
#'
#' # Density plot (from -2 to 2 with 25 evaluation points at each side)
#' plot1 <- rdplotdensity(rdd, x, plotRange = c(-2, 2), plotN = 25)
#'
#' # Plotting a uniform confidence band
#' set.seed(42) # fix the seed for simulating critical values
#' plot3 <- rdplotdensity(rdd, x, plotRange = c(-2, 2), plotN = 25, CIuniform = TRUE)
#'
#' @export
rdplotdensity <- function(rdd, X, plotRange = NULL, plotN = 10, plotGrid = c("es", "qs"),
alpha = 0.05,
type = NULL,
lty = NULL, lwd = NULL, lcol = NULL,
pty = NULL, pwd = NULL, pcol = NULL,
CItype = NULL,
CIuniform=FALSE, CIsimul=2000, CIshade = NULL, CIcol = NULL,
bwselect = NULL,
hist=TRUE, histBreaks=NULL, histFillCol=3, histFillShade=0.2, histLineCol="white",
title = "", xlabel = "", ylabel = "", legendTitle = NULL, legendGroups = NULL,
noPlot = FALSE){
# obtain options from rddensity result
c <- rdd$opt$c
p <- rdd$opt$p
q <- rdd$opt$q
hl <- rdd$h$left
hr <- rdd$h$right
kernel <- rdd$opt$kernel
regularize <- rdd$opt$regularize
nLocalMin <- rdd$opt$nLocalMin
nUniqueMin <- rdd$opt$nUniqueMin
massPoints <- rdd$opt$massPoints
################################################################################
# missing value handling
################################################################################
X <- as.vector(X)
if (any(is.na(X))) {
warning(paste(sum(is.na(X)), " missing ", switch((sum(is.na(X))>1)+1, "observation is", "observations are"), " ignored.\n", sep=""))
X <- X[!is.na(X)]
}
# check grid specifications
if (length(plotRange) == 0) {
plotRange <- c( max(min(X), c - 3*hl), min(max(X), c + 3 * hr) )
} else if (length(plotRange) != 2) {
stop("Plot range incorrectly specified.\n")
} else if (plotRange[1] >= c | plotRange[2] <= c) {
stop("Plot range incorrectly specified.\n")
}
if (length(plotN) == 0) {
plotN <- c(10, 10)
} else if (length(plotN) == 1) {
plotN <- c(plotN, plotN)
} else if (length(plotN) > 2) {
stop("Number of grid points incorrectly specified.\n")
}
if (plotN[1] <=1 | plotN[2] <=1) {
stop("Number of grid points incorrectly specified.\n")
}
if (length(plotGrid) == 0) {
plotGrid <- "es"
} else {
plotGrid <- plotGrid[1]
}
if (!plotGrid%in%c("es", "qs")) {
stop("Grid specification invalid.\n")
}
if (hist & is.null(histBreaks)) {
temp_hist_n_l <- sum(X >= plotRange[1] & X < c)
temp_hist_n_l <- ceiling(min(sqrt(temp_hist_n_l), 10 * log(temp_hist_n_l)/log(10)))
temp_hist_n_r <- sum(X <= plotRange[2] & X >= c)
temp_hist_n_r <- ceiling(min(sqrt(temp_hist_n_r), 10 * log(temp_hist_n_r)/log(10)))
histBreaks <- c(seq(plotRange[1], c, length.out = temp_hist_n_l+1), seq(c, plotRange[2], length.out = temp_hist_n_r+1)[2:(temp_hist_n_r+1)])
}
# some preparation
scalel <- (sum(X <= c) - 1) / (length(X) - 1)
scaler <- (sum(X >= c) - 1) / (length(X) - 1)
if (plotGrid == "es") {
gridl <- seq(plotRange[1], c, length.out=plotN[1])
gridl[plotN[1]] <- c
gridr <- seq(c, plotRange[2], length.out=plotN[2])
gridr[1] <- c
} else {
gridl <- seq(mean(X <= plotRange[1]), mean(X <= c), length.out=plotN[1])
gridl <- quantile(X, gridl)
gridr <- seq(mean(X <= c), mean(X <= plotRange[2]), length.out=plotN[2])
gridr <- quantile(X, gridr)
gridl[plotN[1]] <- c
gridr[1] <- c
}
# call lpdensity
if (!is.null(bwselect)) {
if (bwselect%in%c("mse-dpi", "imse-dpi", "mse-rot", "imse-rot")) {
Estl <- lpdensity(data=X[X<=c], grid=gridl, bwselect=bwselect, p=p, q=q, v=1, kernel=kernel, scale=scalel,
regularize=regularize, nLocalMin=nLocalMin, nUniqueMin=nUniqueMin, massPoints=massPoints)
Estr <- lpdensity(data=X[X>=c], grid=gridr, bwselect=bwselect, p=p, q=q, v=1, kernel=kernel, scale=scaler,
regularize=regularize, nLocalMin=nLocalMin, nUniqueMin=nUniqueMin, massPoints=massPoints)
} else {
stop("Option bwselect incorrectly specified.\n")
}
} else {
Estl <- lpdensity(data=X[X<=c], grid=gridl, bw=hl, p=p, q=q, v=1, kernel=kernel, scale=scalel,
regularize=regularize, nLocalMin=nLocalMin, nUniqueMin=nUniqueMin, massPoints=massPoints)
Estr <- lpdensity(data=X[X>=c], grid=gridr, bw=hr, p=p, q=q, v=1, kernel=kernel, scale=scaler,
regularize=regularize, nLocalMin=nLocalMin, nUniqueMin=nUniqueMin, massPoints=massPoints)
}
# call lpdensity.plot
Estplot <- lpdensity.plot(Estl, Estr, alpha = alpha,
type = type,
lty = lty, lwd = lwd,
lcol = lcol, pty = pty, pwd = pwd, pcol = pcol,
CItype = CItype, CIuniform=CIuniform, CIsimul=CIsimul, CIshade = CIshade, CIcol = CIcol,
hist=hist, histData=X, histBreaks=histBreaks, histFillCol=histFillCol, histFillShade=histFillShade, histLineCol=histLineCol,
title = title, xlabel = xlabel, ylabel = ylabel, legendTitle = legendTitle, legendGroups = legendGroups) +
theme(legend.position = "none")
if (!noPlot) { print(Estplot) }
return(list(Estl=Estl, Estr=Estr, Estplot=Estplot))
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.