npcdist  R Documentation 
npcdist computes kernel cumulative conditional distribution estimates on p+q-variate evaluation data, given a set of training data (both explanatory and dependent) and a bandwidth specification (a condbandwidth object or a bandwidth vector, bandwidth type, and kernel type) using the method of Li and Racine (2008) and Li, Lin, and Racine (2013). The data may be continuous, discrete (unordered and ordered factors), or some combination thereof.
npcdist(bws, ...)

## S3 method for class 'formula'
npcdist(bws, data = NULL, newdata = NULL, ...)

## S3 method for class 'call'
npcdist(bws, ...)

## S3 method for class 'condbandwidth'
npcdist(bws,
        txdat = stop("invoked without training data 'txdat'"),
        tydat = stop("invoked without training data 'tydat'"),
        exdat,
        eydat,
        gradients = FALSE,
        ...)

## Default S3 method:
npcdist(bws, txdat, tydat, ...)
bws 
a bandwidth specification. This can be set as a condbandwidth object or as a p+q-vector of bandwidths.
gradients 
a logical value specifying whether to return estimates of the gradients at the evaluation points. Defaults to FALSE.
... 
additional arguments supplied to specify the bandwidth type, kernel types, and so on. This is necessary if you specify bws as a p+q-vector and not a condbandwidth object.
data 
an optional data frame, list or environment (or object coercible to a data frame by as.data.frame) containing the variables in the model.
newdata 
An optional data frame in which to look for evaluation data. If omitted, the training data are used. 
txdat 
a p-variate data frame of sample realizations of explanatory data (training data). Defaults to the training data used to compute the bandwidth object.
tydat 
a q-variate data frame of sample realizations of dependent data (training data). Defaults to the training data used to compute the bandwidth object.
exdat 
a p-variate data frame of explanatory data on which cumulative conditional distributions will be evaluated. By default, evaluation takes place on the data provided by txdat.
eydat 
a q-variate data frame of dependent data on which cumulative conditional distributions will be evaluated. By default, evaluation takes place on the data provided by tydat.
npcdist implements a variety of methods for estimating multivariate conditional cumulative distributions (p+q-variate) defined over a set of possibly continuous and/or discrete (unordered, ordered) data. The approach is based on Li and Racine (2004) who employ ‘generalized product kernels’ that admit a mix of continuous and discrete data types.
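To make the idea concrete, here is a minimal base R sketch of a smoothed kernel conditional CDF estimate for one continuous explanatory variable and one continuous dependent variable. It is only an illustration under stated assumptions: the helper cond_cdf_sketch, the simulated data, and the hand-picked bandwidths hx and hy are all hypothetical, and the code is not how npcdist is implemented internally.

```r
# Hypothetical helper (not part of np): weighted average of a smooth
# indicator in y, with Gaussian kernel weights in x.
cond_cdf_sketch <- function(y0, x0, y, x, hy, hx) {
  w <- dnorm((x - x0) / hx)   # kernel weight of each training point in x
  G <- pnorm((y0 - y) / hy)   # integrated Gaussian kernel in y (smooth step)
  sum(w * G) / sum(w)         # locally weighted estimate of P(Y <= y0 | X = x0)
}

# Simulated training data (an assumption for illustration only)
set.seed(42)
x <- runif(200)
y <- 2 * x + rnorm(200, sd = 0.5)

# Estimate the conditional CDF at x0 = 0.5 for a low and a high value of y
F_lo <- cond_cdf_sketch(y0 = 0, x0 = 0.5, y = y, x = x, hy = 0.2, hx = 0.1)
F_hi <- cond_cdf_sketch(y0 = 2, x0 = 0.5, y = y, x = x, hy = 0.2, hx = 0.1)
```

Because the weights are non-negative and pnorm is increasing, the sketch always returns a value in [0, 1] that is non-decreasing in y0.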
Three classes of kernel estimators for the continuous data types are available: fixed, adaptive nearest-neighbor, and generalized nearest-neighbor. Adaptive nearest-neighbor bandwidths change with each sample realization in the set, x[i], when estimating the cumulative conditional distribution at the point x. Generalized nearest-neighbor bandwidths change with the point at which the cumulative conditional distribution is estimated, x. Fixed bandwidths are constant over the support of x.
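The difference between the three bandwidth classes can be illustrated in base R for a single continuous variable. This is a rough sketch rather than the package's own computation: the helper knn_dist, the choice k = 5, the fixed bandwidth 0.1, and the convention that a sample point does not count as its own neighbour are all assumptions made for the example.

```r
set.seed(1)
x <- sort(runif(50))            # training sample (simulated for illustration)
x_eval <- c(0.1, 0.5, 0.9)      # points at which an estimate is wanted
k <- 5

# Hypothetical helper: distance from `pt` to its k-th nearest training point.
# With self = TRUE, `pt` is itself a training point, so its zero distance
# to itself is skipped.
knn_dist <- function(pt, train, k, self = FALSE) {
  d <- sort(abs(train - pt))
  if (self) d[k + 1] else d[k]
}

# Fixed: one bandwidth, constant over the support of x
h_fixed <- rep(0.1, length(x_eval))

# Generalized nearest-neighbor: one bandwidth per evaluation point x
h_generalized <- sapply(x_eval, knn_dist, train = x, k = k)

# Adaptive nearest-neighbor: one bandwidth per sample realization x[i]
h_adaptive <- sapply(x, knn_dist, train = x, k = k, self = TRUE)
```

The lengths of the three vectors show the distinction: h_generalized has one entry per evaluation point, while h_adaptive has one entry per training observation.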
Training and evaluation input data may be a mix of continuous (default), unordered discrete (to be specified in the data frames using factor), and ordered discrete (to be specified in the data frames using ordered). Data can be entered in an arbitrary order and data types will be detected automatically by the routine (see np for details).
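As a small illustration of how the three data types are declared, the following base R snippet builds a data frame with one column of each kind. The column names and values are made up for the example.

```r
# Hypothetical data: one continuous, one unordered, one ordered column
dat <- data.frame(
  income = c(31.5, 42.0, 27.3, 55.1),                      # continuous (default)
  region = factor(c("north", "south", "south", "east")),   # unordered discrete
  year   = ordered(c(1995, 1996, 1996, 1997))              # ordered discrete
)

# Leading class of each column, which is what type detection keys on
col_types <- sapply(dat, function(col) class(col)[1])
print(col_types)
```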
A variety of kernels may be specified by the user. Kernels implemented for continuous data types include the second, fourth, sixth, and eighth order Gaussian and Epanechnikov kernels, and the uniform kernel. Unordered discrete data types use a variation on Aitchison and Aitken's (1976) kernel, while ordered data types use a variation of the Wang and van Ryzin (1981) kernel.
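For orientation, the textbook forms of the two discrete kernels can be written in a few lines of base R. Since the package uses variations on these kernels, the functions below (aa_kernel and wvr_kernel, both hypothetical names) should be read as the classic definitions and not necessarily the exact forms used by npcdist.

```r
# Classic Aitchison-Aitken kernel for an unordered variable with n_cat
# categories: weight 1 - lambda on a match, the rest spread evenly.
aa_kernel <- function(xi, x, lambda, n_cat) {
  ifelse(xi == x, 1 - lambda, lambda / (n_cat - 1))
}

# Classic Wang-van Ryzin kernel for an ordered (integer-coded) variable:
# weight decays geometrically with the distance |xi - x|.
wvr_kernel <- function(xi, x, lambda) {
  ifelse(xi == x, 1 - lambda, 0.5 * (1 - lambda) * lambda^abs(xi - x))
}

# Weights given to categories "a", "b", "c" when evaluating at "a"
aa_weights <- aa_kernel(xi = c("a", "b", "c"), x = "a", lambda = 0.2, n_cat = 3)

# Weights given to ordered values 1..5 when evaluating at 3
wvr_weights <- wvr_kernel(xi = 1:5, x = 3, lambda = 0.3)
```

In both cases lambda = 0 puts all weight on exact matches (no smoothing), and larger values of lambda share more weight with the other categories.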
npcdist returns a condistribution object. The generic accessor functions fitted, se, and gradients extract estimated values, asymptotic standard errors on estimates, and gradients, respectively, from the returned object. Furthermore, the functions predict, summary, and plot support objects of this class. The returned objects have the following components:
xbw 
bandwidth(s), scale factor(s) or nearest neighbours for the explanatory data, txdat
ybw 
bandwidth(s), scale factor(s) or nearest neighbours for the dependent data, tydat
xeval 
the evaluation points of the explanatory data 
yeval 
the evaluation points of the dependent data 
condist 
estimates of the conditional cumulative distribution at the evaluation points 
conderr 
standard errors of the cumulative conditional distribution estimates 
congrad 
if invoked with gradients = TRUE, estimates of the gradients at the evaluation points
congerr 
if invoked with gradients = TRUE, standard errors of the gradient estimates at the evaluation points
log_likelihood 
log likelihood of the cumulative conditional distribution estimate 
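The accessors can be tried on a small problem as follows. This is a hedged sketch: it assumes the np package is installed (the block is skipped otherwise), uses only the first 100 rows of the faithful data so that bandwidth selection stays quick, and picks three arbitrary evaluation points. The object names train, ex, ey, F_est, and F_se are choices made for this example.

```r
# Skip quietly if the np package is not available
if (requireNamespace("np", quietly = TRUE)) {
  library(np)
  data("faithful", package = "datasets")

  # Subsample (an assumption, purely to keep cross-validation fast)
  train <- faithful[1:100, ]
  X <- data.frame(waiting = train$waiting)
  y <- data.frame(eruptions = train$eruptions)

  # Bandwidth selection on the training data
  bw <- npcdistbw(xdat = X, ydat = y)

  # Evaluate P(eruptions <= 3 | waiting) at three waiting times
  ex <- data.frame(waiting = c(50, 70, 90))
  ey <- data.frame(eruptions = c(3, 3, 3))
  Fhat <- npcdist(bws = bw, txdat = X, tydat = y, exdat = ex, eydat = ey)

  F_est <- fitted(Fhat)   # estimated conditional CDF values
  F_se  <- se(Fhat)       # asymptotic standard errors
}
```

Passing txdat and tydat explicitly mirrors the condbandwidth method signature shown above and avoids relying on the bandwidth object to locate the training data.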
If you are using data of mixed types, then it is advisable to use the data.frame function to construct your input data and not cbind, since cbind will typically not work as intended on mixed data types and will coerce the data to the same type.
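The coercion problem is easy to reproduce in base R. In this short sketch (variable names are illustrative), cbind turns a factor into its integer codes, while data.frame keeps each column's type.

```r
x_num <- c(1.5, 2.5, 3.5)            # continuous
x_fac <- factor(c("a", "b", "a"))    # unordered discrete

bad  <- cbind(x_num, x_fac)          # numeric matrix: factor levels are lost
good <- data.frame(x_num, x_fac)     # column types are preserved

is.numeric(bad)                      # the whole matrix is numeric
is.factor(good$x_fac)                # the factor survives in the data frame
```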
Tristen Hayfield tristen.hayfield@gmail.com, Jeffrey S. Racine racinej@mcmaster.ca
Aitchison, J. and C.G.G. Aitken (1976), “Multivariate binary discrimination by the kernel method,” Biometrika, 63, 413-420.
Hall, P. and J.S. Racine and Q. Li (2004), “Cross-validation and the estimation of conditional probability densities,” Journal of the American Statistical Association, 99, 1015-1026.
Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.
Li, Q. and J.S. Racine (2008), “Nonparametric estimation of conditional CDF and quantile functions with mixed categorical and continuous data,” Journal of Business and Economic Statistics, 26, 423-434.
Li, Q. and J. Lin and J.S. Racine (2013), “Optimal bandwidth selection for nonparametric conditional distribution and quantile functions,” Journal of Business and Economic Statistics, 31, 57-65.
Pagan, A. and A. Ullah (1999), Nonparametric Econometrics, Cambridge University Press.
Scott, D.W. (1992), Multivariate Density Estimation. Theory, Practice and Visualization, New York: Wiley.
Silverman, B.W. (1986), Density Estimation, London: Chapman and Hall.
Wang, M.C. and J. van Ryzin (1981), “A class of smooth estimators for discrete distributions,” Biometrika, 68, 301-309.
npudens
## Not run: 
# EXAMPLE 1 (INTERFACE=FORMULA): For this example, we load Giovanni
# Baiocchi's Italian GDP panel (see Italy for details), and compute the
# cross-validated bandwidths (default) using a second-order Gaussian
# kernel (default). Note - this may take a minute or two depending on
# the speed of your computer.

data("Italy")
attach(Italy)

# First, compute the bandwidths.

bw <- npcdistbw(formula=gdp~ordered(year))

# Next, compute the condistribution object...

Fhat <- npcdist(bws=bw)

# The object Fhat now contains results such as the estimated cumulative
# conditional distribution function (Fhat$condist) and so on...

summary(Fhat)

# Call the plot() function to visualize the results (<ctrl>-C will
# interrupt on *NIX systems, <esc> will interrupt on MS Windows
# systems).

plot(bw)

detach(Italy)

# EXAMPLE 1 (INTERFACE=DATA FRAME): For this example, we load Giovanni
# Baiocchi's Italian GDP panel (see Italy for details), and compute the
# cross-validated bandwidths (default) using a second-order Gaussian
# kernel (default). Note - this may take a minute or two depending on
# the speed of your computer.

data("Italy")
attach(Italy)

# First, compute the bandwidths.

# Note - we cast `X' and `y' as data frames so that plot() can
# automatically grab names (this looks like overkill, but in
# multivariate settings you would do this anyway, so may as well get in
# the habit).

X <- data.frame(year=ordered(year))
y <- data.frame(gdp)

bw <- npcdistbw(xdat=X, ydat=y)

# Next, compute the condistribution object...

Fhat <- npcdist(bws=bw)

# The object Fhat now contains results such as the estimated cumulative
# conditional distribution function (Fhat$condist) and so on...

summary(Fhat)

# Call the plot() function to visualize the results (<ctrl>-C will
# interrupt on *NIX systems, <esc> will interrupt on MS Windows systems).

plot(bw)

detach(Italy)

# EXAMPLE 2 (INTERFACE=FORMULA): For this example, we load the old
# faithful geyser data from the R `datasets' library and compute the
# conditional distribution function.

library("datasets")
data("faithful")
attach(faithful)

# Note - this may take a few minutes depending on the speed of your
# computer...

bw <- npcdistbw(formula=eruptions~waiting)

summary(bw)

# Plot the conditional cumulative distribution function (<ctrl>-C will
# interrupt on *NIX systems, <esc> will interrupt on MS Windows
# systems).

plot(bw)

detach(faithful)

# EXAMPLE 2 (INTERFACE=DATA FRAME): For this example, we load the old
# faithful geyser data from the R `datasets' library and compute the
# cumulative conditional distribution function.

library("datasets")
data("faithful")
attach(faithful)

# Note - this may take a few minutes depending on the speed of your
# computer...

# Note - we cast `X' and `y' as data frames so that plot() can
# automatically grab names (this looks like overkill, but in
# multivariate settings you would do this anyway, so may as well get in
# the habit).

X <- data.frame(waiting)
y <- data.frame(eruptions)

bw <- npcdistbw(xdat=X, ydat=y)

summary(bw)

# Plot the conditional cumulative distribution function (<ctrl>-C will
# interrupt on *NIX systems, <esc> will interrupt on MS Windows systems)

plot(bw)

detach(faithful)

## End(Not run)