| npcdens | R Documentation |
npcdens computes kernel conditional density estimates on
p+q-variate evaluation data, given a set of training data (both
explanatory and dependent) and a bandwidth specification (a
conbandwidth object or a bandwidth vector, bandwidth type, and
kernel type) using the method of Hall, Racine, and Li (2004).
The data may be continuous, discrete (unordered and ordered
factors), or some combination thereof.
npcdens(bws, ...)
## S3 method for class 'formula'
npcdens(bws, data = NULL, newdata = NULL, ...)
## S3 method for class 'conbandwidth'
npcdens(bws,
txdat = stop("invoked without training data 'txdat'"),
tydat = stop("invoked without training data 'tydat'"),
exdat,
eydat,
gradients = FALSE,
proper = FALSE,
proper.method = c("project"),
proper.control = list(),
...)
## Default S3 method:
npcdens(bws, txdat, tydat, nomad = FALSE, ...)
These arguments identify the bandwidth specification, formula/data interface, and training data.
bws |
a bandwidth specification. This can be set as a |
data |
an optional data frame, list or environment (or object
coercible to a data frame by |
txdat |
a |
tydat |
a |
This argument passes the recommended automatic local-polynomial NOMAD preset to npcdensbw when bandwidths are computed inside npcdens.
nomad |
logical shortcut passed through to |
These arguments control where the fitted conditional density is evaluated and which estimates are returned.
exdat |
a |
eydat |
a |
gradients |
a logical value specifying whether to return estimates of the
gradients at the evaluation points. Defaults to |
newdata |
An optional data frame in which to look for evaluation data. If omitted, the training data are used. |
These arguments control optional post-estimation properization of the fitted conditional density.
proper |
a logical value specifying whether to apply post-estimation
properization to the conditional density estimate. Defaults to
|
proper.control |
a list of controls for properization. Supported entries are
|
proper.method |
a character string specifying the properization method. Currently
|
Further arguments are passed to npcdensbw when bandwidths are computed internally, or used to interpret a numeric bws vector.
... |
additional arguments supplied to |
Documentation guide: see npcdensbw for bandwidth selection and search controls, np.kernels for kernels, np.options for global options, and plot, plot.np for plotting options.
When bws is omitted, the formula and default methods call
npcdensbw first and pass bandwidth-selection arguments
from ... to that call. When bws is already a
conbandwidth object, npcdens estimates with the stored
bandwidth metadata in that object.
Argument groups for bandwidth selection are documented on
npcdensbw. The most common workflow is to choose data
and bandwidth inputs first, then bandwidth criterion and
representation, then kernel/support controls, numerical search
controls, bounded cv.ls quadrature controls if relevant, and
finally local-polynomial/NOMAD controls for polynomial-adaptive fits.
For S3 plotting help, use methods("plot") and query
class-specific help topics such as ?plot.npregression and
?plot.rbandwidth. You can inspect implementations with
getS3method("plot","npregression").
npcdens implements a variety of methods for estimating
multivariate conditional distributions (p+q-variate) defined
over a set of possibly continuous and/or discrete (unordered, ordered)
data. The approach is based on Li and Racine (2004) who employ
‘generalized product kernels’ that admit a mix of continuous
and discrete data types.
Three classes of kernel estimators for the continuous data types are
available: fixed, adaptive nearest-neighbor, and generalized
nearest-neighbor. Adaptive nearest-neighbor bandwidths change with
each sample realization in the set, x_i, when estimating
the density at the point x. Generalized nearest-neighbor
bandwidths change with the point at which the density is estimated,
x. Fixed bandwidths are constant over the support of x.
Training and evaluation input data may be a
mix of continuous (default), unordered discrete (to be specified in
the data frames using factor), and ordered discrete (to be
specified in the data frames using ordered). Data can be
entered in an arbitrary order and data types will be detected
automatically by the routine (see np for details).
A variety of kernels may be specified by the user. Kernels implemented for continuous data types include the second, fourth, sixth, and eighth order Gaussian and Epanechnikov kernels, and the uniform kernel. Unordered discrete data types use a variation on Aitchison and Aitken's (1976) kernel, while ordered data types use a variation of the Wang and van Ryzin (1981) kernel.
For practitioners who want the recommended automatic LP NOMAD route
without spelling out all LP tuning arguments,
npcdens(..., nomad=TRUE) and npcdensbw(..., nomad=TRUE)
expand missing settings to the same documented preset. Explicit
incompatible settings fail fast rather than being silently rewritten.
npcdens returns a condensity object. The generic
accessor functions fitted, se, and
gradients, extract estimated values, asymptotic standard
errors on estimates, and gradients, respectively, from the returned
object. Furthermore, the functions predict,
summary and plot support objects of both
classes. The returned objects have the following components:
xbw |
bandwidth(s), scale factor(s) or nearest neighbours for the
explanatory data, |
ybw |
bandwidth(s), scale factor(s) or nearest neighbours for the
dependent data, |
xeval |
the evaluation points of the explanatory data |
yeval |
the evaluation points of the dependent data |
condens |
estimates of the conditional density at the evaluation points |
conderr |
standard errors of the conditional density estimates |
congrad |
if invoked with |
congerr |
if invoked with |
log_likelihood |
log likelihood of the conditional density estimate |
If you are using data of mixed types, then it is advisable to use the
data.frame function to construct your input data and not
cbind, since cbind will typically not work as
intended on mixed data types and will coerce the data to the same
type.
Tristen Hayfield tristen.hayfield@gmail.com, Jeffrey S. Racine racinej@mcmaster.ca
Aitchison, J. and C.G.G. Aitken (1976), “Multivariate binary discrimination by the kernel method,” Biometrika, 63, 413-420.
Hall, P. and J.S. Racine and Q. Li (2004), “Cross-validation and the estimation of conditional probability densities,” Journal of the American Statistical Association, 99, 1015-1026.
Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.
Pagan, A. and A. Ullah (1999), Nonparametric Econometrics, Cambridge University Press.
Scott, D.W. (1992), Multivariate Density Estimation. Theory, Practice and Visualization, New York: Wiley.
Silverman, B.W. (1986), Density Estimation, London: Chapman and Hall.
Wang, M.C. and J. van Ryzin (1981), “A class of smooth estimators for discrete distributions,” Biometrika, 68, 301-309.
np.kernels, np.options, plot, plot.np
npudens
## Not run:
# EXAMPLE 1 (INTERFACE=FORMULA): For this example, we load Giovanni
# Baiocchi's Italian GDP panel (see Italy for details), and compute the
# likelihood cross-validated bandwidths (default) using a second-order
# Gaussian kernel (default). Note - this may take a minute or two
# depending on the speed of your computer.
data("Italy")
Italy <- Italy[seq_len(min(300, nrow(Italy))), ]
attach(Italy)
# First, compute the bandwidths... note that this may take a minute or
# two depending on the speed of your computer.
bw <- npcdensbw(formula=gdp~ordered(year), nmulti=1)
# Next, compute the condensity object...
fhat <- npcdens(bws=bw)
# The object fhat now contains results such as the estimated conditional
# density function (fhat$condens) and so on...
summary(fhat)
# Call the plot() function to visualize the results (<ctrl>-C will
# interrupt on *NIX systems, <esc> will interrupt on MS Windows
# systems).
if (interactive()) plot(bw)
detach(Italy)
# EXAMPLE 1 (INTERFACE=DATA FRAME): For this example, we load Giovanni
# Baiocchi's Italian GDP panel (see Italy for details), and compute the
# likelihood cross-validated bandwidths (default) using a second-order
# Gaussian kernel (default). Note - this may take a minute or two
# depending on the speed of your computer.
data("Italy")
Italy <- Italy[seq_len(min(300, nrow(Italy))), ]
attach(Italy)
# First, compute the bandwidths... note that this may take a minute or
# two depending on the speed of your computer.
# Note - we cast `X' and `y' as data frames so that plot() can
# automatically grab names (this looks like overkill, but in
# multivariate settings you would do this anyway, so may as well get in
# the habit).
X <- data.frame(year=ordered(year))
y <- data.frame(gdp)
bw <- npcdensbw(xdat=X, ydat=y, nmulti=1)
# Next, compute the condensity object...
fhat <- npcdens(bws=bw)
# The object fhat now contains results such as the estimated conditional
# density function (fhat$condens) and so on...
summary(fhat)
# Call the plot() function to visualize the results (<ctrl>-C will
# interrupt on *NIX systems, <esc> will interrupt on MS Windows systems).
if (interactive()) plot(bw)
detach(Italy)
# EXAMPLE 2 (INTERFACE=FORMULA): For this example, we load the old
# faithful geyser data from the R `datasets' library and compute the
# conditional density function.
library("datasets")
data("faithful")
attach(faithful)
# Note - this may take a few minutes depending on the speed of your
# computer...
bw <- npcdensbw(formula=eruptions~waiting, nmulti=1)
summary(bw)
# Plot the density function (<ctrl>-C will interrupt on *NIX systems,
# <esc> will interrupt on MS Windows systems).
if (interactive()) plot(bw)
detach(faithful)
# EXAMPLE 2 (INTERFACE=DATA FRAME): For this example, we load the old
# faithful geyser data from the R `datasets' library and compute the
# conditional density function.
library("datasets")
data("faithful")
attach(faithful)
# Note - this may take a few minutes depending on the speed of your
# computer...
# Note - we cast `X' and `y' as data frames so that plot() can
# automatically grab names (this looks like overkill, but in
# multivariate settings you would do this anyway, so may as well get in
# the habit).
X <- data.frame(waiting)
y <- data.frame(eruptions)
bw <- npcdensbw(xdat=X, ydat=y, nmulti=1)
summary(bw)
# Plot the density function (<ctrl>-C will interrupt on *NIX systems,
# <esc> will interrupt on MS Windows systems)
if (interactive()) plot(bw)
detach(faithful)
# EXAMPLE 3 (INTERFACE=FORMULA): Replicate the DGP of Klein & Spady
# (1993) (see their description on page 405, pay careful attention to
# footnote 6 on page 405).
set.seed(123)
n <- 300
# x1 is chi-squared having 3 df truncated at 6 standardized by
# subtracting 2.348 and dividing by 1.511
x <- rchisq(n, df=3)
x1 <- (ifelse(x < 6, x, 6) - 2.348)/1.511
# x2 is normal (0, 1) truncated at +- 2 divided by 0.8796
x <- rnorm(n)
x2 <- ifelse(abs(x) < 2 , x, 2) / 0.8796
# y is 1 if y* > 0, 0 otherwise.
y <- ifelse(x1 + x2 + rnorm(n) > 0, 1, 0)
# Generate data-driven bandwidths (likelihood cross-validation). Note -
# this may take a few minutes depending on the speed of your computer...
bw <- npcdensbw(formula=factor(y)~x1+x2, nmulti=1)
# Next, create the evaluation data in order to generate a perspective
# plot
x1.seq <- seq(min(x1), max(x1), length=30)
x2.seq <- seq(min(x2), max(x2), length=30)
X.eval <- expand.grid(x1=x1.seq,x2=x2.seq)
data.eval <- data.frame(y=factor(rep(1, nrow(X.eval))),x1=X.eval[,1],x2=X.eval[,2])
# Now evaluate the conditional probability for y=1 and for the
# evaluation Xs
fit <- fitted(npcdens(bws=bw,newdata=data.eval))
# Finally, coerce the data into a matrix for plotting with persp()
fit.mat <- matrix(fit, 30, 30)
# Generate a perspective plot similar to Figure 2 b of Klein and Spady
# (1993)
persp(x1.seq,
x2.seq,
fit.mat,
col="white",
ticktype="detailed",
expand=0.5,
axes=FALSE,
box=FALSE,
main="Estimated Nonparametric Probability Perspective",
theta=310,
phi=25)
# EXAMPLE 3 (INTERFACE=DATA FRAME): Replicate the DGP of Klein & Spady
# (1993) (see their description on page 405, pay careful attention to
# footnote 6 on page 405).
set.seed(123)
n <- 300
# x1 is chi-squared having 3 df truncated at 6 standardized by
# subtracting 2.348 and dividing by 1.511
x <- rchisq(n, df=3)
x1 <- (ifelse(x < 6, x, 6) - 2.348)/1.511
# x2 is normal (0, 1) truncated at +- 2 divided by 0.8796
x <- rnorm(n)
x2 <- ifelse(abs(x) < 2 , x, 2) / 0.8796
# y is 1 if y* > 0, 0 otherwise.
y <- ifelse(x1 + x2 + rnorm(n) > 0, 1, 0)
# Create the X matrix
X <- cbind(x1, x2)
# Generate data-driven bandwidths (likelihood cross-validation). Note -
# this may take a few minutes depending on the speed of your computer...
bw <- npcdensbw(xdat=X, ydat=factor(y), nmulti=1)
# Next, create the evaluation data in order to generate a perspective
# plot
x1.seq <- seq(min(x1), max(x1), length=30)
x2.seq <- seq(min(x2), max(x2), length=30)
X.eval <- expand.grid(x1=x1.seq,x2=x2.seq)
# Now evaluate the conditional probability for y=1 and for the
# evaluation Xs
fit <- fitted(npcdens(exdat=X.eval,
eydat=factor(rep(1, nrow(X.eval))),
bws=bw))
# Finally, coerce the data into a matrix for plotting with persp()
fit.mat <- matrix(fit, 30, 30)
# Generate a perspective plot similar to Figure 2 b of Klein and Spady
# (1993)
persp(x1.seq,
x2.seq,
fit.mat,
col="white",
ticktype="detailed",
expand=0.5,
axes=FALSE,
box=FALSE,
main="Estimated Nonparametric Probability Perspective",
theta=310,
phi=25)
# EXAMPLE 4: Variations on local polynomial conditional density
# estimation with proper = TRUE.
data("Italy")
Italy2 <- within(Italy, {
year <- as.numeric(as.character(year))
})
# Plot only: make the plotted surface proper on the plot evaluation grid.
fhat <- npcdens(gdp ~ year, data = Italy2,
regtype = "lp", degree = 3, nmulti = 1)
plot(fhat, proper = TRUE)
# Fit an object whose fitted values are themselves proper.
ctrl_fit <- list(
mode = "slice",
apply = "fitted",
slice.grid.size = 101L,
slice.extend.factor = 0.1
)
fhat_fit <- npcdens(
gdp ~ year,
data = Italy2,
regtype = "lp",
degree = 3,
nmulti = 1,
proper = TRUE,
proper.control = ctrl_fit
)
fit_proper <- fitted(fhat_fit)
fit_raw <- fhat_fit$condens.raw
# Display the repaired and raw fitted values for cases where the raw
# fitted density is negative.
head(cbind(fit_proper, fit_raw)[which(fit_raw < 0), ])
# Predict on a common explicit y-grid for several years, and render
# those predictions proper.
g.grid <- seq(min(Italy2$gdp), max(Italy2$gdp), length.out = 200)
nd_grid <- expand.grid(
gdp = g.grid,
year = c(1955, 1975, 1995)
)
pred_grid <- predict(fhat, newdata = nd_grid, proper = TRUE)
# Predict on paired rows with different gdp grids by year, and still
# make the predictions proper via slice mode.
g1 <- seq(quantile(Italy2$gdp, 0.10),
quantile(Italy2$gdp, 0.60), length.out = 60)
g2 <- seq(quantile(Italy2$gdp, 0.30),
quantile(Italy2$gdp, 0.90), length.out = 35)
nd_slice <- rbind(
data.frame(gdp = g1, year = rep(1960, length(g1))),
data.frame(gdp = g2, year = rep(1985, length(g2)))
)
pred_slice <- predict(
fhat,
newdata = nd_slice,
proper = TRUE,
proper.control = list(mode = "slice")
)
# One object that carries properization for fitted values and for later
# predict() calls.
ctrl_both <- list(
mode = "slice",
apply = "both",
slice.grid.size = 101L,
slice.extend.factor = 0.1
)
fhat_both <- npcdens(
gdp ~ year,
data = Italy2,
regtype = "lp",
degree = 3,
nmulti = 1,
proper = TRUE,
proper.control = ctrl_both
)
fit_both <- fitted(fhat_both)
pred_both <- predict(
fhat_both,
newdata = nd_slice,
proper.control = ctrl_both
)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.