npplreg  R Documentation 
npplreg computes a partially linear kernel regression estimate of a
one (1) dimensional dependent variable on p+q-variate explanatory
data, using the model Y = X\beta + \Theta(Z) + \epsilon, given a set
of estimation points, training points (consisting of explanatory data
and dependent data), and a bandwidth specification, which can be a
rbandwidth object, or a bandwidth vector, bandwidth type and kernel
type.
npplreg(bws, ...)

## S3 method for class 'formula'
npplreg(bws, data = NULL, newdata = NULL, y.eval = FALSE, ...)

## S3 method for class 'call'
npplreg(bws, ...)

## S3 method for class 'plbandwidth'
npplreg(bws,
        txdat = stop("training data txdat missing"),
        tydat = stop("training data tydat missing"),
        tzdat = stop("training data tzdat missing"),
        exdat,
        eydat,
        ezdat,
        residuals = FALSE,
        ...)
bws 
a bandwidth specification. This can be set as a 
... 
additional arguments supplied to specify the regression type,
bandwidth type, kernel types, selection methods, and so on. To do
this, you may specify any of 
data 
an optional data frame, list or environment (or object
coercible to a data frame by 
newdata 
An optional data frame in which to look for evaluation data. If omitted, the training data are used. 
y.eval 
If 
txdat 
a 
tydat 
a one (1) dimensional numeric or integer vector of dependent data, each
element 
tzdat 
a 
exdat 
a 
eydat 
a one (1) dimensional numeric or integer vector of the true values
of the dependent variable. Optional, and used only to calculate the
true errors. By default,
evaluation takes place on the data provided by 
ezdat 
a 
residuals 
a logical value indicating that you want residuals computed and
returned in the resulting 
npplreg uses a combination of OLS and nonparametric regression to
estimate the parameter \beta in the model
Y = X\beta + \Theta(Z) + \epsilon.
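The combination of OLS and nonparametric regression can be sketched in base R along the lines of the double-residual approach of Robinson (1988). This is an illustration only and not the code npplreg runs: nw_smooth() is a hypothetical helper written for this sketch, the bandwidth h is picked by hand rather than by cross-validation, and there is a single continuous x and z.

```r
# Sketch of a double-residual estimator for Y = X beta + Theta(Z) + eps.
# nw_smooth() and h are illustrative assumptions, not part of np.
set.seed(42)
n <- 250
z <- rnorm(n)
x <- 0.5 * z + rnorm(n)
y <- 1 + 2 * x + sin(z) + rnorm(n, sd = 0.5)   # true beta is 2

# Nadaraya-Watson (local constant) estimate of E[v | z] at the sample
# points, using a Gaussian kernel with fixed bandwidth h.
nw_smooth <- function(v, z, h) {
  w <- dnorm(outer(z, z, "-") / h)   # n x n matrix of kernel weights
  as.vector(w %*% v / rowSums(w))
}

h <- 0.3

# Step 1: remove the part of y and of x that is explained by z.
ey <- y - nw_smooth(y, z, h)
ex <- x - nw_smooth(x, z, h)

# Step 2: OLS of the y-residuals on the x-residuals (no intercept).
beta_hat <- unname(coef(lm(ey ~ ex - 1)))

# Step 3: smooth what is left of y on z to recover Theta(z).
# Any intercept in the model is absorbed into Theta.
theta_hat <- nw_smooth(y - x * beta_hat, z, h)
fit <- x * beta_hat + theta_hat
```

Because the intercept cannot be separated from \Theta(Z), only the slope on x is compared with its true value here.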
npplreg implements a variety of methods for nonparametric regression
on multivariate (q-variate) explanatory data defined over a set of
possibly continuous and/or discrete (unordered, ordered) data. The
approach is based on Li and Racine (2003) who employ ‘generalized
product kernels’ that admit a mix of continuous and discrete data
types.
Three classes of kernel estimators for the continuous data types are
available: fixed, adaptive nearest-neighbor, and generalized
nearest-neighbor. Adaptive nearest-neighbor bandwidths change with
each sample realization in the set, x_i, when estimating the density
at the point x. Generalized nearest-neighbor bandwidths change with
the point at which the density is estimated, x. Fixed bandwidths are
constant over the support of x.
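The difference between the three bandwidth classes can be illustrated with a small base R sketch for a univariate density. It assumes the usual k-th nearest-neighbor distance as the bandwidth; the names h_fixed, h_gen and h_adapt are made up for this illustration, and the scaling np applies internally may differ.

```r
set.seed(1)
x      <- sort(runif(50))      # sample realizations x_i
x_eval <- c(0.1, 0.5, 0.9)     # points at which the density is estimated
k      <- 5                    # number of neighbors (illustrative)

# Fixed: one bandwidth over the whole support of x.
h_fixed <- 0.1

# Generalized nearest-neighbor: one bandwidth per evaluation point,
# taken here as the distance from that point to its k-th nearest x_i.
h_gen <- sapply(x_eval, function(x0) sort(abs(x - x0))[k])

# Adaptive nearest-neighbor: one bandwidth per sample realization x_i,
# taken here as the distance from x_i to its k-th nearest other x_j.
h_adapt <- sapply(seq_along(x), function(i) sort(abs(x[-i] - x[i]))[k])

# Gaussian kernel density estimates at x_eval under each scheme.
dens_fixed <- sapply(x_eval, function(x0) mean(dnorm((x0 - x) / h_fixed) / h_fixed))
dens_gen   <- mapply(function(x0, h) mean(dnorm((x0 - x) / h) / h), x_eval, h_gen)
dens_adapt <- sapply(x_eval, function(x0) mean(dnorm((x0 - x) / h_adapt) / h_adapt))
```

Note that h_gen has one entry per evaluation point while h_adapt has one entry per observation, which is exactly the distinction drawn in the text.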
Data contained in the data frame tzdat may be a mix of continuous
(default), unordered discrete (to be specified in the data frame
tzdat using factor), and ordered discrete (to be specified in the
data frame tzdat using ordered). Data can be entered in an arbitrary
order and data types will be detected automatically by the routine
(see np for details).
A variety of kernels may be specified by the user. Kernels implemented for continuous data types include the second, fourth, sixth, and eighth order Gaussian and Epanechnikov kernels, and the uniform kernel. Unordered discrete data types use a variation on Aitchison and Aitken's (1976) kernel, while ordered data types use a variation of the Wang and van Ryzin (1981) kernel.
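For orientation, the textbook second-order forms of some of these kernels can be written in a few lines of base R. The text says np uses variations on the discrete kernels, so the functions below (whose names are invented for this sketch) show the original Aitchison and Aitken (1976) and Wang and van Ryzin (1981) forms and should not be read as np's exact definitions.

```r
# Second-order continuous kernels (textbook forms).
k_gaussian     <- function(u) dnorm(u)
k_epanechnikov <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)
k_uniform      <- function(u) ifelse(abs(u) <= 1, 0.5, 0)

# Aitchison-Aitken kernel for an unordered variable with n_cat
# categories; lambda lies in [0, (n_cat - 1) / n_cat].
k_aitchison_aitken <- function(xi, x, lambda, n_cat) {
  ifelse(xi == x, 1 - lambda, lambda / (n_cat - 1))
}

# Wang-van Ryzin kernel for an ordered, integer-valued variable;
# lambda lies in [0, 1).
k_wang_van_ryzin <- function(xi, x, lambda) {
  ifelse(xi == x, 1 - lambda, 0.5 * (1 - lambda) * lambda^abs(xi - x))
}

# A product kernel weight for one continuous and one unordered
# component is the product of the component kernels.
prod_weight <- k_gaussian((0.2 - 0.5) / 0.3) *
  k_aitchison_aitken("a", "b", lambda = 0.2, n_cat = 3)
```

With lambda = 0 both discrete kernels reduce to an indicator of equality, so the estimator splits the sample by category; larger lambda borrows more information from the other categories.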
npplreg returns a plregression object. The generic accessor functions
coef, fitted, residuals, predict, and vcov extract (or estimate)
coefficients, estimated values, residuals, predictions, and
variance-covariance matrices, respectively, from the returned object.
Furthermore, the functions summary and plot support objects of this
type. The returned object has the following components:
evalx 
evaluation points 
evalz 
evaluation points 
mean 
estimation of the regression, or conditional mean, at the evaluation points 
xcoef 
coefficient(s) corresponding to the components

xcoeferr 
standard errors of the coefficients 
xcoefvcov 
covariance matrix of the coefficients 
bw 
the bandwidths, stored as a 
resid 
if 
R2 
coefficient of determination (Doksum and Samarov (1995)) 
MSE 
mean squared error 
MAE 
mean absolute error 
MAPE 
mean absolute percentage error 
CORR 
absolute value of Pearson's correlation coefficient 
SIGN 
fraction of observations where fitted and observed values agree in sign 
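The fit measures in this list can be reproduced by hand from a response vector and fitted values. The sketch below uses the usual textbook definitions and writes R2 as the squared-correlation form; fit_stats() is a hypothetical helper, and the exact formulas np uses internally are an assumption here.

```r
set.seed(42)
y   <- rnorm(100, mean = 2)          # observed values (simulated)
fit <- y + rnorm(100, sd = 0.3)      # stand-in for fitted values

# Hypothetical helper computing the summary measures listed above.
fit_stats <- function(y, fit) {
  ybar <- mean(y)
  c(R2   = sum((y - ybar) * (fit - ybar))^2 /
           (sum((y - ybar)^2) * sum((fit - ybar)^2)),
    MSE  = mean((y - fit)^2),
    MAE  = mean(abs(y - fit)),
    MAPE = mean(abs((y - fit) / y)),
    CORR = abs(cor(y, fit)),
    SIGN = mean(sign(y) == sign(fit)))
}

s <- fit_stats(y, fit)
```

MAPE divides by the observed values, so it is unstable when any y is close to zero.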
If you are using data of mixed types, then it is advisable to use the
data.frame function to construct your input data and not cbind, since
cbind will typically not work as intended on mixed data types and
will coerce the data to the same type.
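A short base R illustration of the coercion problem, using made-up variable names:

```r
x_num <- c(1.5, 2.5, 3.5)
x_cat <- factor(c("a", "b", "a"))

# cbind() builds a matrix, so every column must share one type: the
# factor is silently replaced by its integer codes.
m <- cbind(x_num, x_cat)

# With a character column, everything is coerced to character instead.
m_chr <- cbind(x_num, as.character(x_cat))

# data.frame() keeps each column's own type, so the factor survives and
# can be detected as unordered discrete data.
df <- data.frame(x_num, x_cat)

is.numeric(m)        # the factor information is gone
is.character(m_chr)  # the numeric information is gone
is.factor(df$x_cat)  # the factor is preserved
```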
Tristen Hayfield tristen.hayfield@gmail.com, Jeffrey S. Racine racinej@mcmaster.ca
Aitchison, J. and C.G.G. Aitken (1976), “Multivariate binary discrimination by the kernel method,” Biometrika, 63, 413-420.
Doksum, K. and A. Samarov (1995), “Nonparametric estimation of global functionals and a measure of the explanatory power of covariates in regression,” The Annals of Statistics, 23, 1443-1473.
Gao, Q., L. Liu and J.S. Racine (2015), “A partially linear kernel estimator for categorical data,” Econometric Reviews, 34 (6-10), 958-977.
Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.
Li, Q. and J.S. Racine (2004), “Cross-validated local linear nonparametric regression,” Statistica Sinica, 14, 485-512.
Pagan, A. and A. Ullah (1999), Nonparametric Econometrics, Cambridge University Press.
Racine, J.S. and Q. Li (2004), “Nonparametric estimation of regression functions with both categorical and continuous data,” Journal of Econometrics, 119, 99-130.
Robinson, P.M. (1988), “Root-N-consistent semiparametric regression,” Econometrica, 56, 931-954.
Wang, M.C. and J. van Ryzin (1981), “A class of smooth estimators for discrete distributions,” Biometrika, 68, 301-309.
npregbw, npreg
## Not run:
# EXAMPLE 1 (INTERFACE=FORMULA): For this example, we simulate an
# example for a partially linear model and compare the coefficient
# estimates from the partially linear model with those from a correctly
# specified parametric model...
set.seed(42)
n <- 250
x1 <- rnorm(n)
x2 <- rbinom(n, 1, .5)
z1 <- rbinom(n, 1, .5)
z2 <- rnorm(n)
y <- 1 + x1 + x2 + z1 + sin(z2) + rnorm(n)
# First, compute data-driven bandwidths. This may take a few minutes
# depending on the speed of your computer...
bw <- npplregbw(formula=y~x1+factor(x2)|factor(z1)+z2)
# Next, compute the partially linear fit
pl <- npplreg(bws=bw)
# Print a summary of the model...
summary(pl)
# Sleep for 5 seconds so that we can examine the output...
Sys.sleep(5)
# Retrieve the coefficient estimates and their standard errors...
coef(pl)
coef(pl, errors = TRUE)
# Compare the partially linear results to those from a correctly
# specified model's coefficients for x1 and x2
ols <- lm(y~x1+factor(x2)+factor(z1)+I(sin(z2)))
# The intercept is coef()[1], and those for x1 and x2 are coef()[2] and
# coef()[3]. The standard errors are the square root of the diagonal of
# the variance-covariance matrix (elements 2 and 3)
coef(ols)[2:3]
sqrt(diag(vcov(ols)))[2:3]
# Sleep for 5 seconds so that we can examine the output...
Sys.sleep(5)
# Plot the regression surfaces via plot() (i.e., plot the `partial
# regression surface plots').
plot(bw)
# Note - to plot regression surfaces with variability bounds constructed
# from bootstrapped standard errors, use the following (note that this
# may take a minute or two depending on the speed of your computer as
# the bootstrapping is done in real time, and note also that we override
# the default number of bootstrap replications (399) reducing them to 25
# in order to quickly compute standard errors in this instance - don't
# of course do this in general)
plot(bw,
     plot.errors.boot.num=25,
     plot.errors.method="bootstrap")
# EXAMPLE 1 (INTERFACE=DATA FRAME): For this example, we simulate an
# example for a partially linear model and compare the coefficient
# estimates from the partially linear model with those from a correctly
# specified parametric model...
set.seed(42)
n <- 250
x1 <- rnorm(n)
x2 <- rbinom(n, 1, .5)
z1 <- rbinom(n, 1, .5)
z2 <- rnorm(n)
y <- 1 + x1 + x2 + z1 + sin(z2) + rnorm(n)
X <- data.frame(x1, factor(x2))
Z <- data.frame(factor(z1), z2)
# First, compute data-driven bandwidths. This may take a few minutes
# depending on the speed of your computer...
bw <- npplregbw(xdat=X, zdat=Z, ydat=y)
# Next, compute the partially linear fit
pl <- npplreg(bws=bw)
# Print a summary of the model...
summary(pl)
# Sleep for 5 seconds so that we can examine the output...
Sys.sleep(5)
# Retrieve the coefficient estimates and their standard errors...
coef(pl)
coef(pl, errors = TRUE)
# Compare the partially linear results to those from a correctly
# specified model's coefficients for x1 and x2
ols <- lm(y~x1+factor(x2)+factor(z1)+I(sin(z2)))
# The intercept is coef()[1], and those for x1 and x2 are coef()[2] and
# coef()[3]. The standard errors are the square root of the diagonal of
# the variance-covariance matrix (elements 2 and 3)
coef(ols)[2:3]
sqrt(diag(vcov(ols)))[2:3]
# Sleep for 5 seconds so that we can examine the output...
Sys.sleep(5)
# Plot the regression surfaces via plot() (i.e., plot the `partial
# regression surface plots').
plot(bw)
# Note - to plot regression surfaces with variability bounds constructed
# from bootstrapped standard errors, use the following (note that this
# may take a minute or two depending on the speed of your computer as
# the bootstrapping is done in real time, and note also that we override
# the default number of bootstrap replications (399) reducing them to 25
# in order to quickly compute standard errors in this instance - don't
# of course do this in general)
plot(bw,
     plot.errors.boot.num=25,
     plot.errors.method="bootstrap")
## End(Not run)