plot.np    R Documentation
Plotting is provided via plot S3 methods, which generate
plots of
nonparametric statistical objects such as regressions, quantile
regressions, partially linear regressions, single-index models,
densities and distributions, given training data and a bandwidth
object. plot(...) is the supported public interface.
## S3 method for class 'bandwidth'
plot(x, ...)
## S3 method for class 'conbandwidth'
plot(x, ...)
## S3 method for class 'plbandwidth'
plot(x, ...)
## S3 method for class 'rbandwidth'
plot(x, ...)
## S3 method for class 'scbandwidth'
plot(x, ...)
## S3 method for class 'sibandwidth'
plot(x, ...)
x: a bandwidth object returned from an invocation of a bandwidth
selection routine. This argument identifies the object to plot.
...: additional arguments supplied to control plotting behavior or
passed through to underlying plotting helpers where supported;
further graphical controls are passed through to the relevant plot
method.
Documentation guide: see np.kernels for kernels and
np.options for global options.
The preferred public interface is plot on fitted or
bandwidth objects (e.g., plot(fit) or plot(bw)).
plot is a general purpose plotting routine for visually
exploring objects generated by the np library, such as
regressions, quantile regressions, partially linear regressions,
single-index models, densities and distributions. There is no need to
call any lower-level plotting routine directly: plotting is handled by
class-specific S3 plot methods for objects generated by the np
package.
Visualizing one and two dimensional datasets is a straightforward
process. The default behavior of plot is to generate a
standard 2D plot to visualize univariate data, and a perspective plot
for bivariate data. When visualizing higher dimensional data,
plot resorts to plotting a series of 1D slices of the
data. For a slice along dimension i, all other variables at
indices j \ne i are held constant at the quantiles
specified in the jth element of xq. The default is the
median.
The slice itself is evaluated on a uniformly spaced sequence of
neval points. The interval of evaluation is determined by the
training data. The default behavior is to evaluate from
min(txdat[,i]) to max(txdat[,i]). The xtrim
variable allows for control over this behavior. When xtrim is
set, data is evaluated from the xtrim[i]th quantile of
txdat[,i] to the 1.0-xtrim[i]th quantile of
txdat[,i].
Furthermore, xtrim can be set to a negative
value in which case it will expand the limits of the evaluation
interval beyond the support of the training data, by measuring the
distance between min(txdat[,i]) and the xtrim[i]th
quantile of txdat[,i], and extending the support by that
distance on the lower limit of the interval. plot uses an
analogous procedure to extend the upper limit of the interval.
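The trimming and expansion rules above can be sketched in base R. This is an illustration only, not np's internal implementation; the helper name make_eval_grid and the use of abs(xtrim) for the negative case are assumptions.

```r
# Hedged sketch of forming the evaluation grid for a slice along one
# dimension; `make_eval_grid` is a hypothetical helper, not np code.
make_eval_grid <- function(x, neval = 50, xtrim = 0) {
  if (xtrim >= 0) {
    # evaluate from the xtrim-th to the (1 - xtrim)-th quantile
    lo <- quantile(x, probs = xtrim, names = FALSE)
    hi <- quantile(x, probs = 1 - xtrim, names = FALSE)
  } else {
    # negative xtrim: expand beyond the support by the distance between
    # min(x) and the |xtrim|-th quantile (analogously at the upper end)
    lo <- min(x) - (quantile(x, probs = -xtrim, names = FALSE) - min(x))
    hi <- max(x) + (max(x) - quantile(x, probs = 1 + xtrim, names = FALSE))
  }
  seq(lo, hi, length.out = neval)
}
set.seed(1)
x <- runif(200)
g <- make_eval_grid(x, neval = 5, xtrim = 0.1)
```

With xtrim = 0.1 the grid stays inside the data support; with a negative value such as -0.05 it extends past min(x) and max(x).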
Plot interval/error types are:
"pmzsd"        : point estimate +/- z_(1-alpha/2) * standard error
"pointwise"    : per-point two-sided interval from N(0,1) quantiles
"bonferroni"   : pointwise interval with alpha replaced by alpha/m
"simultaneous" : joint-band interval (bootstrap route)
where m is the number of evaluation points used in the plotted
curve/surface (m = neval for univariate curves, typically
m = neval^2 for full 2D perspective surfaces).
For asymptotic intervals, let T(x) denote the plotted functional
(mean, gradient, density, distribution, etc.) and \widehat{se}(x)
its asymptotic standard error:
T(x)\pm z_{1-\alpha/2}\widehat{se}(x) for "pmzsd" and
[T(x)+z_{\alpha/2}\widehat{se}(x),\ T(x)+z_{1-\alpha/2}\widehat{se}(x)]
for "pointwise".
"bonferroni" applies the same pointwise construction with
\alpha/m in place of \alpha. For the kernel estimators in
this package, asymptotic simultaneous bands are not generally
available, so "simultaneous" with
plot.errors.method="asymptotic" returns NA bands.
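The normal critical values behind these interval types are simple to compute in base R (a sketch; alpha and neval are illustrative values):

```r
# Critical values behind the asymptotic interval types (illustrative)
alpha <- 0.05
neval <- 50
m <- neval                          # evaluation points on a univariate curve
z_pw  <- qnorm(1 - alpha / 2)       # "pmzsd" / "pointwise" critical value
z_bon <- qnorm(1 - (alpha / m) / 2) # "bonferroni": alpha replaced by alpha/m
```

The Bonferroni critical value is strictly larger than the pointwise one, so the resulting bands are wider.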
Asymptotic standard errors are taken from fitted-object components such as
merr, gerr, derr, conderr, and
congerr where implemented.
Bootstrap resampling is conducted pairwise on (y,X,Z) (i.e., by
resampling rows of (y,X) or (y,X,Z) as appropriate).
Bootstrap method support differs by estimator family:
Regression-family (npreg/npindex/npscoef/npplreg): wild, inid, fixed, geom
Density/distribution-family (npudens/npudist/npcdens/npcdist): inid, fixed, geom
hence "wild" is only available for regression-family plotting.
Implementation notes for speed:
wild            : fast np*hat linear-operator bootstrap path
inid/fixed/geom : fast direct helper path (no internal bandwidth search)
For non-fixed density/distribution bootstrap, an explicit experimental
approximation is available via
plot.errors.boot.nonfixed=c("exact","frozen"). The default
"exact" route recomputes the non-fixed geometry for each resample.
The experimental "frozen" route reuses the original-sample
non-fixed geometry throughout the bootstrap run. This option is currently
implemented only for unconditional and conditional density/distribution
bootstrap routes and remains off by default. For generalized/adaptive
nearest-neighbor runs, "frozen" is an approximation that can alter
interval/band width by holding the original-sample nearest-neighbor
geometry fixed across bootstrap resamples; "exact" remains the
recommended setting for production inference. This approximation can be
more noticeable for conditional density/distribution plotting than for the
regression-style plot families because the conditional bootstrap paths
freeze both numerator and denominator nearest-neighbor geometry before
recombining them. In practice, conditional distribution bands are often
closer, while conditional density bands can differ more materially from
"exact" under generalized/adaptive nearest-neighbor bandwidths.
For smooth coefficient plots (npscoef) under non-fixed bandwidths,
"exact" can also be much more expensive than "frozen" on
large jobs, because the coefficient field must be recomputed for each
bootstrap resample rather than reusing the original-sample geometry. This
recomputation cannot in general be avoided without a more aggressive
approximation: for npscoef the local weighted systems that define the
coefficient vector depend on the bootstrap resample weights/counts at each
evaluation point, so unlike npplreg there is no single global
coefficient vector that can be updated once per draw.
inid admits general heteroskedasticity of unknown form, though
it does not allow for dependence. fixed conducts Kunsch's (1989)
block bootstrap for dependent data, while geom conducts Politis
and Romano's (1994) stationary bootstrap.
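A minimal base-R sketch of the index generation behind Politis and Romano's stationary bootstrap (geometric block lengths with mean b) may help fix ideas; this is illustrative only, not np's internal resampler, and the helper name is hypothetical:

```r
# Stationary bootstrap index generation (illustrative sketch): with
# probability 1/b start a new block at a uniform random index, otherwise
# continue the current block, wrapping around the sample circularly.
stationary_boot_indices <- function(n, b) {
  p <- 1 / b
  idx <- integer(n)
  idx[1] <- sample.int(n, 1)
  for (t in 2:n) {
    idx[t] <- if (runif(1) < p) sample.int(n, 1) else idx[t - 1] %% n + 1
  }
  idx
}
set.seed(1)
i <- stationary_boot_indices(50, b = 5)
```

Resampling rows of the data by these indices preserves local dependence within blocks while remaining stationary.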
For local polynomial conditional density/distribution plotting
(npcdens/npcdist with regtype="ll" or
regtype="lp") and proper=TRUE, the plotted estimate is
rendered proper slice-by-slice on the fixed evaluation grid: each
conditional density slice is projected to be nonnegative and to integrate
to one using trapezoidal quadrature weights from the evaluation
y-grid, while each conditional distribution slice is projected to
be monotone and bounded in [0,1]. When
plot.errors.method="bootstrap", the bootstrap resample surfaces
are computed first on that same fixed grid and then properized
resample-by-resample using the same grid geometry before
"pointwise", "bonferroni", "simultaneous", and
"all" bands are constructed. Thus the bootstrap distribution used
to form these bands is built from properized resample surfaces. The final
lower/upper band surfaces are interval envelopes and are not themselves
separately re-projected to satisfy the density/distribution shape
constraints.
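The slice-by-slice projection described above can be sketched in base R under the stated trapezoidal-quadrature assumption; properize_slice is a hypothetical helper for illustration, not np's internal routine:

```r
# Make one conditional-density slice proper on a fixed y-grid: project
# to nonnegative values, then rescale so the trapezoidal quadrature of
# the slice over the grid equals one.
properize_slice <- function(f, y) {
  f <- pmax(f, 0)                                   # nonnegativity
  d <- diff(y)
  n <- length(y)
  w <- c(d[1], d[-1] + d[-(n - 1)], d[n - 1]) / 2   # trapezoid weights
  f / sum(w * f)                                    # integrate to one
}
y  <- seq(0, 1, length.out = 101)
f  <- dnorm(y, mean = 0.5, sd = 0.2) - 0.1          # raw slice dips negative
fp <- properize_slice(f, y)
```

The distribution-slice projection (monotone, bounded in [0,1]) would be handled analogously but is omitted here.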
For consistency of the block and stationary bootstrap, the (mean)
block length b should grow with the sample size n at an
appropriate rate. If b is not given, then a default growth rate
of const \times n^{1/3} is used. This rate is
“optimal” under certain conditions (see Politis and Romano
(1994) for more details). However, in general the growth rate depends on
the specific properties of the DGP. A default value for const
(3.15) has been determined by a Monte Carlo simulation using a
Gaussian AR(1) process (AR(1)-parameter of 0.5, 500
observations). const has been chosen such that the mean square
error for the bootstrap estimate of the variance of the empirical mean
is minimized.
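For concreteness, the default block-length rule can be written as follows (a sketch; default_block_length is a hypothetical name, and rounding to an integer length is an assumption):

```r
# Default mean block length b = const * n^(1/3), const = 3.15
default_block_length <- function(n, const = 3.15) {
  max(1, round(const * n^(1/3)))
}
b <- default_block_length(500)  # the sample size used in the Monte Carlo above
```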
The default bootstrap replication count is
plot.errors.boot.num=1999. For pointwise tails, ensure
B \ge \lceil 2/\alpha - 1 \rceil so
\alpha(B+1) is feasible on the bootstrap rank grid. For interval
types "bonferroni", "simultaneous", and "all",
the minimum recommended count is
B_{\min}=\lceil 2m/\alpha-1 \rceil,
where m is the number of evaluation points used by the plotted
curve/surface. For full 2D perspective grids this is typically
m=\texttt{neval}^2. When B is below these
thresholds, plotting proceeds but warning guidance is reported.
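These thresholds are easy to compute directly (illustrative base R; boot_min_reps is a hypothetical helper):

```r
# Minimum recommended bootstrap replication counts B_min = ceil(2m/alpha - 1)
boot_min_reps <- function(alpha, m = 1) ceiling(2 * m / alpha - 1)
B_pointwise <- boot_min_reps(0.05)           # pointwise tails (m = 1)
B_joint     <- boot_min_reps(0.05, m = 50)   # e.g. a curve with neval = 50
```

Note that the joint threshold for alpha = 0.05 and m = 50 coincides with the default replication count of 1999.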
Typical plotting calls:
## Asymptotic pointwise/bonferroni intervals
plot(bw, plot.errors.method="asymptotic", plot.errors.type="pointwise")
plot(bw, plot.errors.method="asymptotic", plot.errors.type="bonferroni")
## Regression-family bootstrap (wild available)
plot(bw, plot.errors.method="bootstrap", plot.errors.boot.method="wild")
## Density/distribution-family bootstrap (use inid/fixed/geom)
plot(bw, plot.errors.method="bootstrap", plot.errors.boot.method="inid")
Setting plot.behavior will instruct plot what data
to return. Option summary:
plot: instruct plot to just plot the data and
return NULL
plot-data: instruct plot to plot the data and return
the data used to generate the plots. The data will be a list of
objects of the appropriate type, with one object per plot. For
example, invoking plot on 3D density data will have it
return a list of three npdensity objects. If biases were calculated,
they are stored in a component named bias
data: instruct plot to generate data only and no plots
If you are using data of mixed types, then it is advisable to use the
data.frame function to construct your input data and not
cbind, since cbind will typically not work as
intended on mixed data types and will coerce the data to the same
type.
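A quick base-R illustration of the coercion issue:

```r
# cbind coerces mixed columns to a common type (the factor becomes its
# integer codes inside a numeric matrix), while data.frame preserves
# each column's class.
x <- c(1.2, 3.4)
f <- factor(c("a", "b"))
bad  <- cbind(x, f)               # numeric matrix; factor information lost
good <- data.frame(x = x, f = f)  # x stays numeric, f stays a factor
```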
Tristen Hayfield tristen.hayfield@gmail.com, Jeffrey S. Racine racinej@mcmaster.ca
Aitchison, J. and C.G.G. Aitken (1976), “Multivariate binary discrimination by the kernel method,” Biometrika, 63, 413-420.
Hall, P. and J.S. Racine and Q. Li (2004), “Cross-validation and the estimation of conditional probability densities,” Journal of the American Statistical Association, 99, 1015-1026.
Kunsch, H.R. (1989), “The jackknife and the bootstrap for general stationary observations,” The Annals of Statistics, 17, 1217-1241.
Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.
Pagan, A. and A. Ullah (1999), Nonparametric Econometrics, Cambridge University Press.
Politis, D.N. and J.P. Romano (1994), “The stationary bootstrap,” Journal of the American Statistical Association, 89, 1303-1313.
Scott, D.W. (1992), Multivariate Density Estimation. Theory, Practice and Visualization, New York: Wiley.
Silverman, B.W. (1986), Density Estimation, London: Chapman and Hall.
Wang, M.C. and J. van Ryzin (1981), “A class of smooth estimators for discrete distributions,” Biometrika, 68, 301-309.
np.kernels, np.options
## Not run:
# EXAMPLE 1: For this example, we load Giovanni Baiocchi's Italian GDP
# panel (see Italy for details), then create a data frame in which year
# is an ordered factor, GDP is continuous, compute bandwidths using
# likelihood cross-validation, then create a grid of data on which the
# density will be evaluated for plotting purposes
data("Italy")
attach(Italy)
data <- data.frame(year=ordered(year), gdp)
# Compute bandwidths using likelihood cross-validation (default). Note
# that this may take a minute or two depending on the speed of your
# computer...
bw <- npudensbw(dat=data)
# You can always do things manually, as the following example demonstrates
# Create an evaluation data matrix
year.seq <- sort(unique(year))
gdp.seq <- seq(1,36,length=50)
data.eval <- expand.grid(year=year.seq,gdp=gdp.seq)
# Generate the estimated density computed for the evaluation data
fhat <- fitted(npudens(tdat = data, edat = data.eval, bws=bw))
# Coerce the data into a matrix for plotting with persp()
f <- matrix(fhat, length(unique(year)), 50)
# Next, create a 3D perspective plot of the PDF f
persp(as.integer(levels(year.seq)), gdp.seq, f, col="lightblue",
ticktype="detailed", ylab="GDP", xlab="Year", zlab="Density",
theta=300, phi=50)
# Sleep for 5 seconds so that we can examine the output...
Sys.sleep(5)
# However, plot simply streamlines this process and aids in the
# visualization process (<ctrl>-C will interrupt on *NIX systems, <esc>
# will interrupt on MS Windows systems).
plot(bw)
# plot also streamlines construction of variability bounds (<ctrl>-C
# will interrupt on *NIX systems, <esc> will interrupt on MS Windows
# systems)
plot(bw, plot.errors.method = "asymptotic")
# EXAMPLE 2: For this example, we simulate multivariate data, and plot the
# partial regression surfaces for a locally linear estimator and its
# derivatives.
set.seed(123)
n <- 100
x1 <- runif(n)
x2 <- runif(n)
x3 <- runif(n)
x4 <- rbinom(n, 2, .3)
y <- 1 + x1 + x2 + x3 + x4 + rnorm(n)
X <- data.frame(x1, x2, x3, ordered(x4))
bw <- npregbw(xdat=X, ydat=y, regtype="ll", bwmethod="cv.aic")
plot(bw)
# Sleep for 5 seconds so that we can examine the output...
Sys.sleep(5)
# Now plot the gradients...
plot(bw, gradients=TRUE)
# Plot the partial regression surfaces with bias-corrected bootstrapped
# nonparametric confidence intervals... this may take a minute or two
# depending on the speed of your computer as the bootstrapping must be
# completed prior to results being displayed...
plot(bw,
plot.errors.method="bootstrap",
plot.errors.center="bias-corrected",
plot.errors.type="simultaneous")
# EXAMPLE 3: This example demonstrates how to retrieve plotting data from
# plot(). When plot() is called with the arguments
# `plot.behavior="plot-data"' (or "data"), it returns plotting objects
# named r1, r2, and so on (rg1, rg2, and so on when `gradients=TRUE' is
# set). Each plotting object's index (1,2,...) corresponds to the index
# of the explanatory data data frame xdat (and zdat if appropriate).
# Take the cps71 data by way of example. In this case, there is only one
# object returned by default, `r1', since xdat is univariate.
data("cps71", package = "np")
# Compute bandwidths for local linear regression using cv.aic...
bw <- npregbw(xdat=cps71$age, ydat=cps71$logwage,
regtype="ll", bwmethod="cv.aic")
# Generate the plot and return plotting data, and store output in
# `plot.out' (NOTE: the call to `plot.behavior' is necessary).
plot.out <- plot(bw,
perspective=FALSE,
plot.errors.method="bootstrap",
plot.errors.boot.num=25,
plot.behavior="plot-data")
# Now grab the r1 object that plot plotted on the screen, and take
# what you need. First, take the output, lower error bound and upper
# error bound...
logwage.eval <- fitted(plot.out$r1)
logwage.se <- se(plot.out$r1)
logwage.lower.ci <- logwage.eval + logwage.se[,1]
logwage.upper.ci <- logwage.eval + logwage.se[,2]
# Next grab the x data evaluation data. xdat is a data.frame(), so we
# need to coerce it into a vector (take the `first column' of data frame
# even though there is only one column)
age.eval <- plot.out$r1$eval[,1]
# Now we could plot this if we wished, or direct it to whatever end use
# we envisioned. We plot the results using R's plot() routines...
with(cps71, plot(age, logwage, cex=0.2, xlab="Age", ylab="log(Wage)"))
lines(age.eval,logwage.eval)
lines(age.eval,logwage.lower.ci,lty=3)
lines(age.eval,logwage.upper.ci,lty=3)
# If you wanted plot() data for gradients, you would use the argument
# `gradients=TRUE' in the call to plot() as the following
# demonstrates...
plot.out <- plot(bw,
perspective=FALSE,
plot.errors.method="bootstrap",
plot.errors.boot.num=25,
plot.behavior="plot-data",
gradients=TRUE)
# Now grab object that plot() plotted on the screen. First, take the
# output, lower error bound and upper error bound... note that gradients
# are stored in objects rg1, rg2 etc.
grad.eval <- gradients(plot.out$rg1)
grad.se <- gradients(plot.out$rg1, errors = TRUE)
grad.lower.ci <- grad.eval + grad.se[,1]
grad.upper.ci <- grad.eval + grad.se[,2]
# Next grab the x evaluation data. xdat is a data.frame(), so we need to
# coerce it into a vector (take `first column' of data frame even though
# there is only one column)
age.eval <- plot.out$rg1$eval[,1]
# We plot the results using R's plot() routines...
plot(age.eval,grad.eval,cex=0.2,
ylim=c(min(grad.lower.ci),max(grad.upper.ci)),
xlab="Age",ylab="d log(Wage)/d Age",type="l")
lines(age.eval,grad.lower.ci,lty=3)
lines(age.eval,grad.upper.ci,lty=3)
# EXAMPLE 4: Variations on local polynomial conditional density
# estimation with proper = TRUE.
data("Italy")
Italy2 <- within(Italy, {
year <- as.numeric(as.character(year))
})
# Plot only: make the plotted surface proper on the plot evaluation grid.
fhat <- npcdens(gdp ~ year, data = Italy2,
regtype = "lp", degree = 3, nmulti = 1)
plot(fhat, proper = TRUE)
# Fit an object whose fitted values are themselves proper.
ctrl_fit <- list(
mode = "slice",
apply = "fitted",
slice.grid.size = 101L,
slice.extend.factor = 0.1
)
fhat_fit <- npcdens(
gdp ~ year,
data = Italy2,
regtype = "lp",
degree = 3,
nmulti = 1,
proper = TRUE,
proper.control = ctrl_fit
)
fit_proper <- fitted(fhat_fit)
fit_raw <- fhat_fit$condens.raw
# Display the repaired and raw fitted values for cases where the raw
# fitted density is negative.
head(cbind(fit_proper, fit_raw)[which(fit_raw < 0), ])
# Predict on a common explicit y-grid for several years, and render
# those predictions proper.
g.grid <- seq(min(Italy2$gdp), max(Italy2$gdp), length.out = 200)
nd_grid <- expand.grid(
gdp = g.grid,
year = c(1955, 1975, 1995)
)
pred_grid <- predict(fhat, newdata = nd_grid, proper = TRUE)
# Predict on paired rows with different gdp grids by year, and still
# make the predictions proper via slice mode.
g1 <- seq(quantile(Italy2$gdp, 0.10),
quantile(Italy2$gdp, 0.60), length.out = 60)
g2 <- seq(quantile(Italy2$gdp, 0.30),
quantile(Italy2$gdp, 0.90), length.out = 35)
nd_slice <- rbind(
data.frame(gdp = g1, year = rep(1960, length(g1))),
data.frame(gdp = g2, year = rep(1985, length(g2)))
)
pred_slice <- predict(
fhat,
newdata = nd_slice,
proper = TRUE,
proper.control = list(mode = "slice")
)
# One object that carries properization for fitted values and for later
# predict() calls.
ctrl_both <- list(
mode = "slice",
apply = "both",
slice.grid.size = 101L,
slice.extend.factor = 0.1
)
fhat_both <- npcdens(
gdp ~ year,
data = Italy2,
regtype = "lp",
degree = 3,
nmulti = 1,
proper = TRUE,
proper.control = ctrl_both
)
fit_both <- fitted(fhat_both)
pred_both <- predict(
fhat_both,
newdata = nd_slice,
proper.control = ctrl_both
)
plot(fhat_both)
## End(Not run)