Generalized Local Polynomial Regression
Description
npglpreg
computes a generalized local polynomial kernel
regression estimate (Hall and Racine (forthcoming)) of a one (1)
dimensional dependent variable on an r
dimensional vector of
continuous and categorical
(factor
/ordered
) predictors.
Usage
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68  npglpreg(...)
## Default S3 method:
npglpreg(tydat = NULL,
txdat = NULL,
eydat = NULL,
exdat = NULL,
bws = NULL,
degree = NULL,
leave.one.out = FALSE,
ckertype = c("gaussian", "epanechnikov", "uniform", "truncated gaussian"),
ckerorder = 2,
ukertype = c("liracine", "aitchisonaitken"),
okertype = c("liracine", "wangvanryzin"),
bwtype = c("fixed", "generalized_nn", "adaptive_nn", "auto"),
gradient.vec = NULL,
gradient.categorical = FALSE,
cv.shrink = TRUE,
cv.maxPenalty = sqrt(.Machine$double.xmax),
cv.warning = FALSE,
Bernstein = TRUE,
mpi = FALSE,
...)
## S3 method for class 'formula'
npglpreg(formula,
data = list(),
tydat = NULL,
txdat = NULL,
eydat = NULL,
exdat = NULL,
bws = NULL,
degree = NULL,
leave.one.out = FALSE,
ckertype = c("gaussian", "epanechnikov","uniform","truncated gaussian"),
ckerorder = 2,
ukertype = c("liracine", "aitchisonaitken"),
okertype = c("liracine", "wangvanryzin"),
bwtype = c("fixed", "generalized_nn", "adaptive_nn", "auto"),
cv = c("degreebandwidth", "bandwidth", "none"),
cv.func = c("cv.ls", "cv.aic"),
nmulti = 5,
random.seed = 42,
degree.max = 10,
degree.min = 0,
bandwidth.max = .Machine$double.xmax,
bandwidth.min = sqrt(.Machine$double.eps),
bandwidth.min.numeric = 1.0e02,
bandwidth.switch = 1.0e+06,
bandwidth.scale.categorical = 1.0e+04,
max.bb.eval = 10000,
min.epsilon = .Machine$double.eps,
initial.mesh.size.real = 1,
initial.mesh.size.integer = 1,
min.mesh.size.real = sqrt(.Machine$double.eps),
min.mesh.size.integer = sqrt(.Machine$double.eps),
min.poll.size.real = sqrt(.Machine$double.eps),
min.poll.size.integer = sqrt(.Machine$double.eps),
opts=list(),
restart.from.min = FALSE,
gradient.vec = NULL,
gradient.categorical = FALSE,
cv.shrink = TRUE,
cv.maxPenalty = sqrt(.Machine$double.xmax),
cv.warning = FALSE,
Bernstein = TRUE,
mpi = FALSE,
...)

Arguments
formula 
a symbolic description of the model to be fit 
data 
an optional data frame containing the variables in the model 
tydat 
a one (1) dimensional numeric or integer vector of dependent data, each
element i corresponding to each observation (row) i of

txdat 
a pvariate data frame of explanatory data (training data) used to calculate the regression estimators. Defaults to the training data used to compute the bandwidth object 
eydat 
a one (1) dimensional numeric or integer vector of the true values of the dependent variable. Optional, and used only to calculate the true errors 
exdat 
a pvariate data frame of points on which the regression will be
estimated (evaluation data). By default,
evaluation takes place on the data provided by 
bws 
a vector of bandwidths, with each element i corresponding
to the bandwidth for column i in 
degree 
integer/vector specifying the polynomial degree of the
for each dimension of the continuous 
leave.one.out 
a logical value to specify whether or not to compute the leave one
out sums. Will not work if 
ckertype 
character string used to specify the continuous kernel type.
Can be set as 
ckerorder 
numeric value specifying kernel order (one of

ukertype 
character string used to specify the unordered categorical kernel type.
Can be set as 
okertype 
character string used to specify the ordered categorical kernel type.
Can be set as 
bwtype 
character string used for the continuous variable bandwidth type,
specifying the type of bandwidth to compute and return in the

cv 
a character string (default 
cv.func 
a character string (default 
max.bb.eval 
argument passed to the NOMAD solver (see 
min.epsilon 
argument passed to the NOMAD solver (see 
initial.mesh.size.real 
argument passed to the NOMAD solver (see 
initial.mesh.size.integer 
argument passed to the NOMAD solver (see 
min.mesh.size.real 
argument passed to the NOMAD solver (see 
min.mesh.size.integer 
arguments passed to the NOMAD solver (see 
min.poll.size.real 
arguments passed to the NOMAD solver (see 
min.poll.size.integer 
arguments passed to the NOMAD solver (see 
opts 
list of optional arguments passed to the NOMAD solver
(see 
nmulti 
integer number of times to restart the process of finding extrema of
the crossvalidation function from different (random) initial
points (default 
random.seed 
when it is not missing and not equal to 0, the
initial points will be generated using this seed when using

degree.max 
the maximum degree of the polynomial for
each of the continuous predictors (default 
degree.min 
the minimum degree of the polynomial for
each of the continuous predictors (default 
bandwidth.max 
the maximum bandwidth scale (i.e. number of
scaled standard deviations) for each of the continuous predictors
(default 
bandwidth.min 
the minimum bandwidth scale for each of the
categorical predictors (default 
bandwidth.min.numeric 
the minimum bandwidth scale (i.e. number
of scaled standard deviations) for each of the continuous predictors
(default 
bandwidth.switch 
the minimum bandwidth scale (i.e. number of
scaled standard deviations) for each of the continuous predictors
(default 
bandwidth.scale.categorical 
the upper end for the rescaled
bandwidths for the categorical predictors (default

restart.from.min 
a logical value indicating to recommence

gradient.vec 
a vector corresponding to the order of the partial (or crosspartial) and which variable the partial (or crosspartial) derivative(s) are required 
gradient.categorical 
a logical value indicating whether discrete gradients (i.e. differences in the response from the base value for each categorical predictor) are to be computed 
cv.shrink 
a logical value indicating whether to use ridging
(Seifert and Gasser (2000)) for illconditioned inversion during
crossvalidation (default 
cv.maxPenalty 
a penalty applied during crossvalidation when a deleteone estimate is not finite or the polynomial basis is not of full column rank 
cv.warning 
a logical value indicating whether to issue an
immediate warning message when ill conditioned bases are encountered
during crossvalidation (default 
Bernstein 
a logical value indicating whether to use raw polynomials or Bernstein polynomials (default) (note that a Bernstein polynomial is also know as a Bezier curve which is also a Bspline with no interior knots) 
mpi 
a logical value (default 
... 
additional arguments supplied to specify the regression type, bandwidth type, kernel types, training data, and so on, detailed below 
Details
Typical usages are (see below for a list of options and also the examples at the end of this help file)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33  ## Conduct generalized local polynomial estimation
model < npglpreg(y~x1+x2)
## Conduct degree 0 local polynomial estimation
## (i.e. NadarayaWatson)
model < npglpreg(y~x1+x2,cv="bandwidth",degree=c(0,0))
## Conduct degree 1 local polynomial estimation (i.e. local linear)
model < npglpreg(y~x1+x2,cv="bandwidth",degree=c(1,1))
## Conduct degree 2 local polynomial estimation (i.e. local
## quadratic)
model < npglpreg(y~x1+x2,cv="bandwidth",degree=c(2,2))
## Plot the mean and bootstrap confidence intervals
plot(model,ci=TRUE)
## Plot the first partial derivatives and bootstrap confidence
## intervals
plot(model,deriv=1,ci=TRUE)
## Plot the first second partial derivatives and bootstrap
## confidence intervals
plot(model,deriv=2,ci=TRUE)

This function is in beta status until further notice (eventually it will be rolled into the np/npRmpi packages after the final status of snomadr/NOMAD gets sorted out).
Optimizing the crossvalidation function jointly for bandwidths
(vectors of continuous parameters) and polynomial degrees (vectors of
integer parameters) constitutes a mixedinteger optimization
problem. These problems are not only ‘hard’ from the numerical
optimization perspective, but are also computationally intensive
(contrast this to where we conduct, say, local linear regression which
sets the degree of the polynomial vector to a global value
degree=1
hence we only need to optimize with respect to the
continuous bandwidths). Because of this we must be mindful of the
presence of local optima (the objective function is nonconvex and
nondifferentiable). Restarting search from different initial starting
points is recommended (see nmulti
) and by default this is done
more than once. We encourage users to adopt ‘multistarting’ and
to investigate the impact of changing default search parameters such
as initial.mesh.size.real
, initial.mesh.size.integer
,
min.mesh.size.real
,
min.mesh.size.integer
,min.poll.size.real
, and
min.poll.size.integer
. The default values were chosen based on
extensive simulation experiments and were chosen so as to yield robust
performance while being mindful of excessive computation  of course,
no one setting can be globally optimal.
Value
npglpreg
returns a npglpreg
object. The generic
functions fitted
and residuals
extract
(or generate) estimated values and residuals. Furthermore, the
functions summary
, predict
, and
plot
(options deriv=0
, ci=FALSE
[ci=TRUE
produces pointwise bootstrap error bounds],
persp.rgl=FALSE
,
plot.behavior=c("plot","plotdata","data")
,
plot.errors.boot.num=99
,
plot.errors.type=c("quantiles","standard")
["quantiles"
produces percentiles determined by
plot.errors.quantiles
below, "standard"
produces error
bounds given by +/ 1.96 bootstrap standard deviations],
plot.errors.quantiles=c(.025,.975)
, xtrim=0.0
,
xq=0.5
) support objects of this type. The returned object has
the following components:
fitted.values 
estimates of the regression function (conditional mean) at the sample points or evaluation points 
residuals 
residuals computed at the sample points or evaluation points 
degree 
integer/vector specifying the degree of the polynomial
for each dimension of the continuous 
gradient 
the estimated gradient (vector) corresponding to the vector

gradient.categorical.mat 
the estimated gradient (matrix) for the categorical predictors 
gradient.vec 
the supplied 
bws 
vector of bandwidths 
bwtype 
the supplied 
call 
a symbolic description of the model 
r.squared 
coefficient of determination (Doksum and Samarov (1995)) 
Note
Note that the use of raw polynomials (Bernstein=FALSE
) for
approximation is appealing as they can be computed and differentiated
easily, however, they can be unstable (their inversion can be ill
conditioned) which can cause problems in some instances as the order
of the polynomial increases. This can hamper search when excessive
reliance on ridging to overcome ill conditioned inversion becomes
computationally burdensome.
npglpreg
tries to detect whether this is an issue or not when
Bernstein=FALSE
for each numeric
predictor and will
adjust the search range for snomadr
and the degree fed
to npglpreg
if appropriate.
However, if you suspect that this might be an issue for your specific
problem and you are using raw polynomials (Bernstein=FALSE
),
you are encouraged to investigate this by limiting degree.max
to value less than the default value (say 3
). Alternatively,
you might consider rescaling your numeric
predictors to lie in
[0,1] using scale
.
For a given predictor x you can readily determine if this is an issue by considering the following: Suppose x is given by
1 2 3 
so that a polynomial of order, say, 5 would be ill conditioned. This would be apparent if you considered
1 2 3 
which will throw an error when the polynomial is ill conditioned, or
1 2 3 
which will return NA
for one or more coefficients when this is
an issue.
In such cases you might consider transforming your numeric
predictors along the lines of the following:
1 2 3 4 5 
Note that now your least squares coefficients (i.e. first derivative of y with respect to x) represent the effect of a one standard deviation change in x and not a one unit change.
Alternatively, you can use Bernstein polynomials by not setting
Bernstein=FALSE
.
Author(s)
Jeffrey S. Racine racinej@mcmaster.ca and Zhenghua Nie niez@mcmaster.ca
References
Doksum, K. and A. Samarov (1995), “Nonparametric Estimation of Global Functionals and a Measure of the Explanatory Power of Covariates in Regression,” The Annals of Statistics, 23 14431473.
Hall, P. and J.S. Racine (forthcoming), “CrossValidated Generalized Local Polynomial Regression,” Journal of Econometrics.
Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.
Seifert, B. and T. Gasser (2000), “Data Adaptive Ridging in Local Polynomial Regression,” Journal of Computational and Graphical Statistics, 9(2), 338360.
See Also
npreg
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21  ## Not run:
set.seed(42)
n < 100
x1 < runif(n,2,2)
x2 < runif(n,2,2)
y < x1^3 + rnorm(n,sd=1)
## Ideally the method should choose large bandwidths for x1 and x2 and a
## generalized polynomial that is a cubic for x1 and degree 0 for x2.
model < npglpreg(y~x1+x2,nmulti=1)
summary(model)
## Plot the partial means and percentile confidence intervals
plot(model,ci=T)
## Extract the data from the plot object and plot it separately
myplot.dat < plot(model,plot.behavior="data",ci=T)
matplot(myplot.dat[[1]][,1],myplot.dat[[1]][,1],type="l")
matplot(myplot.dat[[2]][,1],myplot.dat[[2]][,1],type="l")
## End(Not run)
