npudist | R Documentation |

`npudist` computes kernel unconditional cumulative distribution
estimates on evaluation data, given a set of training data and a
bandwidth specification (a `dbandwidth` object or a bandwidth
vector, bandwidth type, and kernel type) using the method of Li, Li
and Racine (2017).

```
npudist(bws, ...)
## S3 method for class 'formula'
npudist(bws, data = NULL, newdata = NULL, ...)
## S3 method for class 'dbandwidth'
npudist(bws,
        tdat = stop("invoked without training data 'tdat'"),
        edat,
        ...)
## S3 method for class 'call'
npudist(bws, ...)
## Default S3 method:
npudist(bws, tdat, ...)
```

`bws` |
a |

`...` |
additional arguments supplied to specify the training data, the
bandwidth type, kernel types, and so on. This is necessary if you
specify bws as a |

`tdat` |
a |

`edat` |
a |

`data` |
an optional data frame, list or environment (or object
coercible to a data frame by |

`newdata` |
An optional data frame in which to look for evaluation data. If omitted, the training data are used. |

Typical usages are (see below for a complete list of options and also the examples at the end of this help file)

Usage 1: first compute the bandwidth object via `npudistbw` and then compute the cumulative distribution:

`bw <- npudistbw(~y)`

`Fhat <- npudist(bw)`

Usage 2: alternatively, compute the bandwidth object indirectly:

`Fhat <- npudist(~y)`

Usage 3: modify the default kernel and order:

`Fhat <- npudist(~y, ckertype="epanechnikov", ckerorder=4)`

Usage 4: use the data frame interface rather than the formula interface:

`Fhat <- npudist(tdat = y, ckertype="epanechnikov", ckerorder=4)`

`npudist` implements a variety of methods for estimating
multivariate cumulative distributions (`p`-variate) defined over a
set of possibly continuous and/or discrete (ordered) data. The
approach is based on Li and Racine (2003) who employ
‘generalized product kernels’ that admit a mix of continuous
and discrete data types.
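The product-kernel idea can be sketched in base R for one continuous and one ordered variable. This is only an illustration under stated assumptions, not the code `npudist` runs: the function `mixed_cdf`, the fixed bandwidth `h`, and the simulated data are all hypothetical, and the ordered component uses a plain indicator (no discrete smoothing) rather than the package's ordered kernel.

```r
# Illustrative product-kernel CDF estimate for mixed data.
# All names below are hypothetical and are not part of the np package.
set.seed(1)
n <- 200
x <- rnorm(n)                                  # continuous component
z <- ordered(sample(1:3, n, replace = TRUE))   # ordered discrete component

# Assumed fixed bandwidth for the continuous variable (not cross-validated).
h <- 0.4

# F(x0, z0) = mean_i [ G((x0 - x_i) / h) * 1(z_i <= z0) ],
# where G is the Gaussian CDF (the integrated second-order Gaussian kernel)
# and the indicator corresponds to no smoothing of the ordered variable.
mixed_cdf <- function(x0, z0, x, z, h) {
  cont_part <- pnorm((x0 - x) / h)
  disc_part <- as.numeric(as.integer(z) <= z0)
  mean(cont_part * disc_part)
}

# Evaluate on a grid of (continuous value, ordered level code) pairs.
grid <- expand.grid(x0 = seq(-3, 3, length.out = 25), z0 = 1:3)
grid$Fhat <- mapply(mixed_cdf, grid$x0, grid$z0,
                    MoreArgs = list(x = x, z = z, h = h))
head(grid)
```

Each observation contributes the product of a continuous term and a discrete term, which is what lets a single estimator handle both data types at once.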

Three classes of kernel estimators for the continuous data types are
available: fixed, adaptive nearest-neighbor, and generalized
nearest-neighbor. Adaptive nearest-neighbor bandwidths change with
each sample realization in the set, `x_i`, when estimating
the cumulative distribution at the point `x`. Generalized nearest-neighbor
bandwidths change with the point at which the cumulative distribution is estimated,
`x`. Fixed bandwidths are constant over the support of `x`.
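The difference between the three bandwidth classes can be illustrated with a univariate Gaussian-kernel CDF estimator written in base R. This is a rough sketch, assuming the nearest-neighbor bandwidth is the distance to the `k`-th nearest sample point; the helper names (`kcdf_fixed`, `kcdf_gnn`, `kcdf_ann`, `knn_dist`) and the values of `h` and `k` are hypothetical, and the conventions used inside the np package may differ.

```r
# Hypothetical helpers illustrating the three bandwidth classes.
set.seed(123)
X    <- rnorm(200)                    # training sample
grid <- seq(-3, 3, length.out = 61)   # evaluation points

# Distance from a point to its k-th nearest sample value.
knn_dist <- function(x0, X, k) sort(abs(x0 - X))[k]

# Fixed: one bandwidth h for every x and every x_i.
kcdf_fixed <- function(x, X, h) {
  vapply(x, function(x0) mean(pnorm((x0 - X) / h)), numeric(1))
}

# Generalized nearest-neighbor: the bandwidth depends on the evaluation point x.
kcdf_gnn <- function(x, X, k) {
  vapply(x, function(x0) {
    h_x <- knn_dist(x0, X, k)
    mean(pnorm((x0 - X) / h_x))
  }, numeric(1))
}

# Adaptive nearest-neighbor: the bandwidth depends on the sample point x_i.
# k + 1 skips the zero distance from x_i to itself.
kcdf_ann <- function(x, X, k) {
  h_i <- vapply(X, function(xi) knn_dist(xi, X, k + 1), numeric(1))
  vapply(x, function(x0) mean(pnorm((x0 - X) / h_i)), numeric(1))
}

F_fixed <- kcdf_fixed(grid, X, h = 0.3)
F_gnn   <- kcdf_gnn(grid, X, k = 20)
F_ann   <- kcdf_ann(grid, X, k = 20)

# Compare the three estimates at a few evaluation points.
round(cbind(x = grid, F_fixed, F_gnn, F_ann)[c(1, 31, 61), ], 3)
```

In the fixed and adaptive cases each observation's contribution is a nondecreasing function of `x`, so the estimate is monotone; with generalized nearest-neighbor bandwidths the smoothing changes with `x`, so monotonicity is not guaranteed by construction.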

Data contained in the data frame `tdat` (and also `edat`)
may be a mix of continuous (default) and ordered discrete (to be
specified in the data frame `tdat` using the
`ordered` command). Data can be entered in an arbitrary
order and data types will be detected automatically by the routine
(see `np` for details).

A variety of kernels may be specified by the user. Kernels implemented for continuous data types include the second, fourth, sixth, and eighth-order Gaussian and Epanechnikov kernels, and the uniform kernel. Ordered data types use a variation of the Wang and van Ryzin (1981) kernel.

`npudist` returns an `npdistribution` object. The
generic accessor functions `fitted` and `se`
extract estimated values and asymptotic standard errors on estimates,
respectively, from the returned object. Furthermore, the functions
`predict`, `summary` and `plot`
support objects of this class. The returned objects have the
following components:

`eval` |
the evaluation points. |

`dist` |
estimate of the cumulative distribution at the evaluation points |

`derr` |
standard errors of the cumulative distribution estimates |
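A minimal sketch of working with the returned object, assuming the np package is installed (the block does nothing otherwise). The simulated data frame `dat` and the variable names `est` and `err` are illustrative only; the calls mirror those used elsewhere in this help file.

```r
# Runs only when the np package is available.
if (requireNamespace("np", quietly = TRUE)) {
  library(np)
  options(np.messages = FALSE)   # silence progress messages

  set.seed(42)
  dat <- data.frame(y = rnorm(100))   # simulated data, for illustration only

  bw   <- npudistbw(~y, data = dat)   # cross-validated bandwidth (default)
  Fhat <- npudist(bws = bw)           # estimate at the training points

  est <- fitted(Fhat)   # estimated cumulative distribution values
  err <- se(Fhat)       # asymptotic standard errors

  # The same information is stored in the documented components
  # eval, dist and derr of the returned object.
  print(head(cbind(estimate = est, std.error = err)))
  str(Fhat$eval)
}
```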

If you are using data of mixed types, then it is advisable to use the
`data.frame` function to construct your input data and not
`cbind`, since `cbind` will typically not work as
intended on mixed data types and will coerce the data to the same
type.
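The coercion issue can be seen with a small base R example. The values below are made up purely for illustration.

```r
# Made-up values: an ordered factor and a continuous variable.
year <- ordered(c(1951, 1952, 1953))
gdp  <- c(8.6, 9.1, 9.7)

# cbind() builds a numeric matrix: the ordered factor is reduced to its
# integer codes (1, 2, 3) and the type information is lost.
m <- cbind(year, gdp)
print(m)
is.numeric(m)

# data.frame() keeps each column's own type, so year stays an ordered factor.
df <- data.frame(year, gdp)
str(df)
sapply(df, function(col) class(col)[1])
```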

Tristen Hayfield tristen.hayfield@gmail.com, Jeffrey S. Racine racinej@mcmaster.ca

Aitchison, J. and C.G.G. Aitken (1976), “Multivariate binary discrimination by the kernel method,” Biometrika, 63, 413-420.

Li, Q. and J.S. Racine (2007), *Nonparametric Econometrics: Theory
and Practice,* Princeton University Press.

Li, Q. and J.S. Racine (2003), “Nonparametric estimation of distributions with categorical and continuous data,” Journal of Multivariate Analysis, 86, 266-292.

Li, C. and H. Li and J.S. Racine (2017), “Cross-Validated Mixed
Datatype Bandwidth Selection for Nonparametric Cumulative
Distribution/Survivor Functions,” Econometric Reviews, 36,
970-987.

Ouyang, D. and Q. Li and J.S. Racine (2006), “Cross-validation and the estimation of probability distributions with categorical data,” Journal of Nonparametric Statistics, 18, 69-100.

Pagan, A. and A. Ullah (1999), *Nonparametric Econometrics,* Cambridge
University Press.

Scott, D.W. (1992), *Multivariate Density Estimation. Theory,
Practice and Visualization,* New York: Wiley.

Silverman, B.W. (1986), *Density Estimation,* London: Chapman and
Hall.

Wang, M.C. and J. van Ryzin (1981), “A class of smooth estimators for discrete distributions,” Biometrika, 68, 301-309.

`npudistbw`, `density`

```
## Not run:
# EXAMPLE 1 (INTERFACE=FORMULA): For this example, we load Giovanni
# Baiocchi's Italian GDP panel (see Italy for details), then create a
# data frame in which year is an ordered factor, GDP is continuous,
# compute bandwidths using cross-validation, then create a grid of data
# on which the cumulative distribution will be evaluated for plotting
# purposes.
data("Italy")
attach(Italy)
# Compute bandwidths using cross-validation (default).
bw <- npudistbw(formula=~ordered(year)+gdp)
# At this stage you could use npudist() to do a variety of things. Here
# we compute the npudist() object and place it in Fhat.
Fhat <- npudist(bws=bw)
# Note that simply typing the name of the object returns some useful
# information. For more info, one can call summary:
summary(Fhat)
# Next, we illustrate how to create a grid of `evaluation data' and feed
# it to the perspective plotting routines in R, among others.
# Create an evaluation data matrix
year.seq <- sort(unique(year))
gdp.seq <- seq(1,36,length=50)
data.eval <- expand.grid(year=year.seq,gdp=gdp.seq)
# Generate the estimated cumulative distribution computed for the
# evaluation data
Fhat <- fitted(npudist(bws=bw, newdata=data.eval))
# Coerce the data into a matrix for plotting with persp()
F <- matrix(Fhat, length(unique(year)), 50)
# Next, create a 3D perspective plot of the CDF F, and a 2D
# contour plot.
persp(as.integer(levels(year.seq)), gdp.seq, F, col="lightblue",
      ticktype="detailed", ylab="GDP", xlab="Year",
      zlab="Cumulative Distribution",
      theta=300, phi=50)
# Sleep for 5 seconds so that we can examine the output...
Sys.sleep(5)
contour(as.integer(levels(year.seq)),
        gdp.seq,
        F,
        xlab="Year",
        ylab="GDP",
        main = "Cumulative Distribution Contour Plot",
        col=topo.colors(100))
# Sleep for 5 seconds so that we can examine the output...
Sys.sleep(5)
# Alternatively, you could use the plot() command (<ctrl>-C will
# interrupt on *NIX systems, <esc> will interrupt on MS Windows
# systems).
plot(bw)
detach(Italy)
# EXAMPLE 1 (INTERFACE=DATA FRAME): For this example, we load Giovanni
# Baiocchi's Italian GDP panel (see Italy for details), then create a
# data frame in which year is an ordered factor, GDP is continuous,
# compute bandwidths using cross-validation, then create a grid of data
# on which the cumulative distribution will be evaluated for plotting
# purposes.
data("Italy")
attach(Italy)
data <- data.frame(year=ordered(year), gdp)
# Compute bandwidths using cross-validation (default).
bw <- npudistbw(dat=data)
# At this stage you could use npudist() to do a variety of
# things. Here we compute the npudist() object and place it in Fhat.
Fhat <- npudist(bws=bw)
# Note that simply typing the name of the object returns some useful
# information. For more info, one can call summary:
summary(Fhat)
# Next, we illustrate how to create a grid of `evaluation data' and feed
# it to the perspective plotting routines in R, among others.
# Create an evaluation data matrix
year.seq <- sort(unique(year))
gdp.seq <- seq(1,36,length=50)
data.eval <- expand.grid(year=year.seq,gdp=gdp.seq)
# Generate the estimated cumulative distribution computed for the
# evaluation data
Fhat <- fitted(npudist(edat = data.eval, bws=bw))
# Coerce the data into a matrix for plotting with persp()
F <- matrix(Fhat, length(unique(year)), 50)
# Next, create a 3D perspective plot of the CDF F, and a 2D
# contour plot.
persp(as.integer(levels(year.seq)), gdp.seq, F, col="lightblue",
      ticktype="detailed", ylab="GDP", xlab="Year",
      zlab="Cumulative Distribution",
      theta=300, phi=50)
# Sleep for 5 seconds so that we can examine the output...
Sys.sleep(5)
contour(as.integer(levels(year.seq)),
        gdp.seq,
        F,
        xlab="Year",
        ylab="GDP",
        main = "Cumulative Distribution Contour Plot",
        col=topo.colors(100))
# Sleep for 5 seconds so that we can examine the output...
Sys.sleep(5)
# Alternatively, you could use the plot() command (<ctrl>-C will
# interrupt on *NIX systems, <esc> will interrupt on MS Windows
# systems).
plot(bw)
detach(Italy)
# EXAMPLE 2 (INTERFACE=FORMULA): For this example, we load the old
# faithful geyser data and compute the cumulative distribution function.
library("datasets")
data("faithful")
attach(faithful)
# Note - this may take a few minutes depending on the speed of your
# computer...
bw <- npudistbw(formula=~eruptions+waiting)
summary(bw)
# Plot the cumulative distribution function (<ctrl>-C will interrupt on
# *NIX systems, <esc> will interrupt on MS Windows systems). Note that
# we use xtrim = -0.2 to extend the plot outside the support of the data
# (i.e., extend the tails of the estimate to meet the horizontal axis).
plot(bw, xtrim=-0.2)
detach(faithful)
# EXAMPLE 2 (INTERFACE=DATA FRAME): For this example, we load the old
# faithful geyser data and compute the cumulative distribution function.
library("datasets")
data("faithful")
attach(faithful)
# Note - this may take a few minutes depending on the speed of your
# computer...
bw <- npudistbw(dat=faithful)
summary(bw)
# Plot the cumulative distribution function (<ctrl>-C will interrupt on
# *NIX systems, <esc> will interrupt on MS Windows systems). Note that
# we use xtrim = -0.2 to extend the plot outside the support of the data
# (i.e., extend the tails of the estimate to meet the horizontal axis).
plot(bw, xtrim=-0.2)
detach(faithful)
## End(Not run)
```
