lpdensity | R Documentation |
lpdensity implements the local polynomial regression based density (and derivatives)
estimator proposed in Cattaneo, Jansson and Ma (2020). Robust bias-corrected inference methods,
both pointwise (confidence intervals) and uniform (confidence bands), are also implemented
following the results in Cattaneo, Jansson and Ma (2020, 2023).
See Cattaneo, Jansson and Ma (2022) for more implementation details and illustrations.
Companion command: lpbwdensity for bandwidth selection.
Related Stata and R packages useful for nonparametric estimation and inference are
available at https://nppackages.github.io/.
lpdensity(
  data,
  grid = NULL,
  bw = NULL,
  p = NULL,
  q = NULL,
  v = NULL,
  kernel = c("triangular", "uniform", "epanechnikov"),
  scale = NULL,
  massPoints = TRUE,
  bwselect = c("mse-dpi", "imse-dpi", "mse-rot", "imse-rot"),
  stdVar = TRUE,
  regularize = TRUE,
  nLocalMin = NULL,
  nUniqueMin = NULL,
  Cweights = NULL,
  Pweights = NULL
)
data |
Numeric vector or one dimensional matrix/data frame, the raw data. |
grid |
Numeric, specifies the grid of evaluation points. When set to default, grid points will be chosen as 0.05-0.95 percentiles of the data, with a step size of 0.05. |
bw |
Numeric, specifies the bandwidth used for estimation. Can be (1) a positive scalar (common bandwidth for all grid points); or (2) a positive numeric vector specifying bandwidths for each grid point (should be the same length as |
p |
Nonnegative integer, specifies the order of the local polynomial used to construct point
estimates. (Default is |
q |
Nonnegative integer, specifies the order of the local polynomial used to construct
confidence intervals/bands (a.k.a. the bias correction order). Default is |
v |
Nonnegative integer, specifies the derivative of the distribution function to be estimated. |
kernel |
String, specifies the kernel function, should be one of |
scale |
Numeric, specifies how estimates are scaled. For example, setting this parameter to 0.5 will scale down both the point estimates and standard errors by half. Default is |
massPoints |
|
bwselect |
String, specifies the method for data-driven bandwidth selection. This option will be
ignored if |
stdVar |
|
regularize |
|
nLocalMin |
Nonnegative integer, specifies the minimum number of observations in each local neighborhood. This option
will be ignored if |
nUniqueMin |
Nonnegative integer, specifies the minimum number of unique observations in each local neighborhood. This option
will be ignored if |
Cweights |
Numeric, specifies the weights used for counterfactual distribution construction. Should have the same length as the data. |
Pweights |
Numeric, specifies the weights used in sampling. Should have the same length as the data. |
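The default evaluation grid described under grid can be sketched in base R. This is only an illustration of the stated rule (0.05 to 0.95 percentiles in steps of 0.05), not the package's internal code: the sample X is hypothetical, and using quantile() with its default settings is an assumption, so the values may differ slightly from what lpdensity computes.

```r
# Hypothetical sample, used only for illustration
set.seed(42)
X <- rnorm(2000)

# Probabilities 0.05, 0.10, ..., 0.95
probs <- seq(0.05, 0.95, by = 0.05)

# Candidate evaluation points: empirical quantiles of the data.
# Assumption: quantile() defaults; lpdensity may compute quantiles differently.
grid <- quantile(X, probs = probs, names = FALSE)

# Number of evaluation points implied by the rule
length(grid)
```

A grid built this way stays inside the range of the data, which keeps evaluation points away from regions with no observations.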
Bias correction is only used for the construction of confidence intervals/bands, but not for point
estimation. The point estimates, denoted by f_p, are constructed using local polynomial estimates
of order p, while the centering of the confidence intervals/bands, denoted by f_q, is constructed
using local polynomial estimates of order q. The confidence intervals/bands take the form
[f_q - cv * SE(f_q), f_q + cv * SE(f_q)], where cv denotes the appropriate critical value and SE(f_q)
denotes a standard error estimate for the centering of the confidence interval/band. As a result,
the confidence intervals/bands may not be centered at the point estimates because they have been bias-corrected.
Setting q and p to be equal results in confidence intervals/bands centered at the point estimates,
but requires undersmoothing for valid inference (i.e., the (I)MSE-optimal bandwidth for the density point estimator
cannot be used). Hence the bandwidth would need to be specified manually when q=p, and the
point estimates will not be (I)MSE optimal. See Cattaneo, Jansson and Ma (2020, 2023) for details, and also
Calonico, Cattaneo, and Farrell (2018, 2022) for robust bias correction methods.
Sometimes the density point estimates may lie outside of the confidence intervals/bands, which can happen
if the underlying distribution exhibits high curvature at some evaluation point(s). One possible solution
in this case is to increase the polynomial order p or to employ a smaller bandwidth.
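The interval construction above can be sketched with plain arithmetic in base R. All numbers below are hypothetical and are not output from lpdensity. The critical value is taken from the standard normal distribution, which is an assumption that suits a pointwise interval only; a uniform band would need a different (simulated) critical value.

```r
# Hypothetical values at two evaluation points (not produced by lpdensity)
f_p  <- c(0.390, 0.370)   # point estimates from the order-p fit
f_q  <- c(0.398, 0.398)   # bias-corrected centering from the order-q fit
se_q <- c(0.012, 0.012)   # standard errors of f_q
alpha <- 0.05

# Assumption: standard normal critical value for a pointwise interval
cv <- qnorm(1 - alpha / 2)

# Interval of the form [f_q - cv * SE(f_q), f_q + cv * SE(f_q)]
lower <- f_q - cv * se_q
upper <- f_q + cv * se_q

# The interval is centered at f_q, not at the point estimate f_p
center <- (lower + upper) / 2

# Check whether each point estimate falls inside its interval
inside <- f_p >= lower & f_p <= upper
inside
```

In this toy example the first point estimate lies inside its interval and the second does not, which mirrors the situation described above where the gap between f_p and f_q is large relative to the standard error.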
Estimate |
A matrix containing (1) |
CovMat_p |
The variance-covariance matrix corresponding to |
CovMat_q |
The variance-covariance matrix corresponding to |
opt |
A list containing options passed to the function. |
Matias D. Cattaneo, Princeton University. cattaneo@princeton.edu.
Michael Jansson, University of California Berkeley. mjansson@econ.berkeley.edu.
Xinwei Ma (maintainer), University of California San Diego. x1ma@ucsd.edu.
Calonico, S., M. D. Cattaneo, and M. H. Farrell. 2018. On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Inference. Journal of the American Statistical Association, 113(522): 767-779. doi:10.1080/01621459.2017.1285776
Calonico, S., M. D. Cattaneo, and M. H. Farrell. 2022. Coverage Error Optimal Confidence Intervals for Local Polynomial Regression. Bernoulli, 28(4): 2998-3022. doi:10.3150/21-BEJ1445
Cattaneo, M. D., M. Jansson, and X. Ma. 2020. Simple Local Polynomial Density Estimators. Journal of the American Statistical Association, 115(531): 1449-1455. doi:10.1080/01621459.2019.1635480
Cattaneo, M. D., M. Jansson, and X. Ma. 2022. lpdensity: Local Polynomial Density Estimation and Inference. Journal of Statistical Software, 101(2): 1-25. doi:10.18637/jss.v101.i02
Cattaneo, M. D., M. Jansson, and X. Ma. 2023. Local Regression Distribution Estimators. Journal of Econometrics, 240(2): 105074. doi:10.1016/j.jeconom.2021.01.006
Supported methods: coef.lpdensity, confint.lpdensity, plot.lpdensity, print.lpdensity, summary.lpdensity, vcov.lpdensity.
# Generate a random sample
set.seed(42); X <- rnorm(2000)
# Estimate density and report results
est1 <- lpdensity(data = X, bwselect = "imse-dpi")
summary(est1)
# Report results for a subset of grid points
summary(est1, grid=est1$Estimate[4:10, "grid"])
summary(est1, gridIndex=4:10)
# Report the 99% uniform confidence band
set.seed(42) # fix the seed for simulating critical values
summary(est1, alpha=0.01, CIuniform=TRUE)
# Plot the estimates and confidence intervals
plot(est1, legendTitle="My Plot", legendGroups=c("X"))
# Plot the estimates and the 99% uniform confidence band
set.seed(42) # fix the seed for simulating critical values
plot(est1, alpha=0.01, CIuniform=TRUE, legendTitle="My Plot", legendGroups=c("X"))
# Adding a histogram to the background
plot(est1, legendTitle="My Plot", legendGroups=c("X"),
  hist=TRUE, histData=X, histBreaks=seq(-1.5, 1.5, 0.25))