Generalized Local Polynomial Regression

Share:

Description

npglpreg computes a generalized local polynomial kernel regression estimate (Hall and Racine (forthcoming)) of a one (1) dimensional dependent variable on an r-dimensional vector of continuous and categorical (factor/ordered) predictors.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
npglpreg(...)

## Default S3 method:
npglpreg(tydat = NULL,
         txdat = NULL,
         eydat = NULL,
         exdat = NULL,
         bws = NULL,
         degree = NULL,
         leave.one.out = FALSE,
         ckertype = c("gaussian", "epanechnikov", "uniform", "truncated gaussian"),
         ckerorder = 2,
         ukertype = c("liracine", "aitchisonaitken"),
         okertype = c("liracine", "wangvanryzin"),
         bwtype = c("fixed", "generalized_nn", "adaptive_nn", "auto"),
         gradient.vec = NULL,
         gradient.categorical = FALSE,
         cv.shrink = TRUE,
         cv.maxPenalty = sqrt(.Machine$double.xmax),
         cv.warning = FALSE,
         Bernstein = TRUE,
         mpi = FALSE,
         ...)

## S3 method for class 'formula'
npglpreg(formula,
         data = list(),
         tydat = NULL,
         txdat = NULL,
         eydat = NULL,
         exdat = NULL,
         bws = NULL,
         degree = NULL,
         leave.one.out = FALSE,
         ckertype = c("gaussian", "epanechnikov","uniform","truncated gaussian"),
         ckerorder = 2,
         ukertype = c("liracine", "aitchisonaitken"),
         okertype = c("liracine", "wangvanryzin"),
         bwtype = c("fixed", "generalized_nn", "adaptive_nn", "auto"),
         cv = c("degree-bandwidth", "bandwidth", "none"),
         cv.func = c("cv.ls", "cv.aic"),
         nmulti = 5,
         random.seed = 42,
         degree.max = 10,
         degree.min = 0,
         bandwidth.max = .Machine$double.xmax,
         bandwidth.min = sqrt(.Machine$double.eps),
         bandwidth.min.numeric = 1.0e-02,
         bandwidth.switch = 1.0e+06,
         bandwidth.scale.categorical = 1.0e+04,
         max.bb.eval = 10000,
         min.epsilon = .Machine$double.eps,
         initial.mesh.size.real = 1,
         initial.mesh.size.integer = 1,
         min.mesh.size.real = sqrt(.Machine$double.eps),
         min.mesh.size.integer = sqrt(.Machine$double.eps),
         min.poll.size.real = sqrt(.Machine$double.eps),
         min.poll.size.integer = sqrt(.Machine$double.eps),
         opts=list(),
         restart.from.min = FALSE,
         gradient.vec = NULL,
         gradient.categorical = FALSE,
         cv.shrink = TRUE,
         cv.maxPenalty = sqrt(.Machine$double.xmax),
         cv.warning = FALSE,
         Bernstein = TRUE,
         mpi = FALSE,
         ...)

Arguments

formula

a symbolic description of the model to be fit

data

an optional data frame containing the variables in the model

tydat

a one (1) dimensional numeric or integer vector of dependent data, each element i corresponding to each observation (row) i of txdat. Defaults to the training data used to compute the bandwidth object

txdat

a p-variate data frame of explanatory data (training data) used to calculate the regression estimators. Defaults to the training data used to compute the bandwidth object

eydat

a one (1) dimensional numeric or integer vector of the true values of the dependent variable. Optional, and used only to calculate the true errors

exdat

a p-variate data frame of points on which the regression will be estimated (evaluation data). By default, evaluation takes place on the data provided by txdat

bws

a vector of bandwidths, with each element i corresponding to the bandwidth for column i in txdat

degree

integer/vector specifying the polynomial degree of the for each dimension of the continuous x in txdat

leave.one.out

a logical value to specify whether or not to compute the leave one out sums. Will not work if exdat is specified. Defaults to FALSE

ckertype

character string used to specify the continuous kernel type. Can be set as gaussian, epanechnikov, or uniform. Defaults to gaussian.

ckerorder

numeric value specifying kernel order (one of (2,4,6,8)). Kernel order specified along with a uniform continuous kernel type will be ignored. Defaults to 2.

ukertype

character string used to specify the unordered categorical kernel type. Can be set as aitchisonaitken or liracine. Defaults to liracine

okertype

character string used to specify the ordered categorical kernel type. Can be set as wangvanryzin or liracine. Defaults to liracine

bwtype

character string used for the continuous variable bandwidth type, specifying the type of bandwidth to compute and return in the bandwidth object. If bwtype="auto", the bandwidth type type will be automatically determined by cross-validation. Defaults to fixed. Option summary:
fixed: compute fixed bandwidths
generalized_nn: compute generalized nearest neighbor bandwidths
adaptive_nn: compute adaptive nearest neighbor bandwidths

cv

a character string (default cv="nomad") indicating whether to use nonsmooth mesh adaptive direct search, or no search (i.e. use supplied values for degree and bws)

cv.func

a character string (default cv.func="cv.ls") indicating which method to use to select smoothing parameters. cv.aic specifies expected Kullback-Leibler cross-validation (Hurvich, Simonoff, and Tsai (1998)), and cv.ls specifies least-squares cross-validation

max.bb.eval

argument passed to the NOMAD solver (see snomadr for further details)

min.epsilon

argument passed to the NOMAD solver (see snomadr for further details)

initial.mesh.size.real

argument passed to the NOMAD solver (see snomadr for further details)

initial.mesh.size.integer

argument passed to the NOMAD solver (see snomadr for further details)

min.mesh.size.real

argument passed to the NOMAD solver (see snomadr for further details)

min.mesh.size.integer

arguments passed to the NOMAD solver (see snomadr for further details)

min.poll.size.real

arguments passed to the NOMAD solver (see snomadr for further details)

min.poll.size.integer

arguments passed to the NOMAD solver (see snomadr for further details)

opts

list of optional arguments passed to the NOMAD solver (see snomadr for further details)

nmulti

integer number of times to restart the process of finding extrema of the cross-validation function from different (random) initial points (default nmulti=5)

random.seed

when it is not missing and not equal to 0, the initial points will be generated using this seed when using snomadr

degree.max

the maximum degree of the polynomial for each of the continuous predictors (default degree.max=10)

degree.min

the minimum degree of the polynomial for each of the continuous predictors (default degree.min=0)

bandwidth.max

the maximum bandwidth scale (i.e. number of scaled standard deviations) for each of the continuous predictors (default bandwidth.max=.Machine$double.xmax)

bandwidth.min

the minimum bandwidth scale for each of the categorical predictors (default sqrt(.Machine$double.eps))

bandwidth.min.numeric

the minimum bandwidth scale (i.e. number of scaled standard deviations) for each of the continuous predictors (default bandwidth.min=1.0e-02)

bandwidth.switch

the minimum bandwidth scale (i.e. number of scaled standard deviations) for each of the continuous predictors (default bandwidth.switch=1.0e+06) before the local polynomial is treated as global during cross-validation at which point a global categorical kernel weighted least squares fit is used for computational efficiency

bandwidth.scale.categorical

the upper end for the rescaled bandwidths for the categorical predictors (default bandwidth.scale.categorical=1.0e+04) - the aim is to ‘even up’ the scale of the search parameters as much as possible, so when very large scale factors are selected for the continuous predictors, a larger value may improve search

restart.from.min

a logical value indicating to recommence snomadr with the optimal values found from its first invocation (typically quick but sometimes recommended in the field of optimization)

gradient.vec

a vector corresponding to the order of the partial (or cross-partial) and which variable the partial (or cross-partial) derivative(s) are required

gradient.categorical

a logical value indicating whether discrete gradients (i.e. differences in the response from the base value for each categorical predictor) are to be computed

cv.shrink

a logical value indicating whether to use ridging (Seifert and Gasser (2000)) for ill-conditioned inversion during cross-validation (default cv.shrink=TRUE) or to instead test for ill-conditioned matrices and penalize heavily when this is the case (much stronger condition imposed on cross-validation)

cv.maxPenalty

a penalty applied during cross-validation when a delete-one estimate is not finite or the polynomial basis is not of full column rank

cv.warning

a logical value indicating whether to issue an immediate warning message when ill conditioned bases are encountered during cross-validation (default cv.warning=FALSE)

Bernstein

a logical value indicating whether to use raw polynomials or Bernstein polynomials (default) (note that a Bernstein polynomial is also know as a Bezier curve which is also a B-spline with no interior knots)

mpi

a logical value (default mpi=FALSE) that, when mpi=TRUE, can call the npRmpi rather than the np package (note - code needs to mirror examples in the demo directory of the npRmpi package, you need to broadcast loading of the crs package, and need .Rprofile in your current directory)

...

additional arguments supplied to specify the regression type, bandwidth type, kernel types, training data, and so on, detailed below

Details

Typical usages are (see below for a list of options and also the examples at the end of this help file)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
    ## Conduct generalized local polynomial estimation
    
    model <- npglpreg(y~x1+x2)
    
    ## Conduct degree 0 local polynomial estimation
    ## (i.e. Nadaraya-Watson)
    
    model <- npglpreg(y~x1+x2,cv="bandwidth",degree=c(0,0))    
    
    ## Conduct degree 1 local polynomial estimation (i.e. local linear)
    
    model <- npglpreg(y~x1+x2,cv="bandwidth",degree=c(1,1))    
    
    ## Conduct degree 2 local polynomial estimation (i.e. local
    ## quadratic)
    
    model <- npglpreg(y~x1+x2,cv="bandwidth",degree=c(2,2))

    ## Plot the mean and bootstrap confidence intervals

    plot(model,ci=TRUE)

    ## Plot the first partial derivatives and bootstrap confidence
    ## intervals

    plot(model,deriv=1,ci=TRUE)

    ## Plot the first second partial derivatives and bootstrap
    ## confidence intervals

    plot(model,deriv=2,ci=TRUE)        
    
  

This function is in beta status until further notice (eventually it will be rolled into the np/npRmpi packages after the final status of snomadr/NOMAD gets sorted out).

Optimizing the cross-validation function jointly for bandwidths (vectors of continuous parameters) and polynomial degrees (vectors of integer parameters) constitutes a mixed-integer optimization problem. These problems are not only ‘hard’ from the numerical optimization perspective, but are also computationally intensive (contrast this to where we conduct, say, local linear regression which sets the degree of the polynomial vector to a global value degree=1 hence we only need to optimize with respect to the continuous bandwidths). Because of this we must be mindful of the presence of local optima (the objective function is non-convex and non-differentiable). Restarting search from different initial starting points is recommended (see nmulti) and by default this is done more than once. We encourage users to adopt ‘multistarting’ and to investigate the impact of changing default search parameters such as initial.mesh.size.real, initial.mesh.size.integer, min.mesh.size.real, min.mesh.size.integer,min.poll.size.real, and min.poll.size.integer. The default values were chosen based on extensive simulation experiments and were chosen so as to yield robust performance while being mindful of excessive computation - of course, no one setting can be globally optimal.

Value

npglpreg returns a npglpreg object. The generic functions fitted and residuals extract (or generate) estimated values and residuals. Furthermore, the functions summary, predict, and plot (options deriv=0, ci=FALSE [ci=TRUE produces pointwise bootstrap error bounds], persp.rgl=FALSE, plot.behavior=c("plot","plot-data","data"), plot.errors.boot.num=99, plot.errors.type=c("quantiles","standard") ["quantiles" produces percentiles determined by plot.errors.quantiles below, "standard" produces error bounds given by +/- 1.96 bootstrap standard deviations], plot.errors.quantiles=c(.025,.975), xtrim=0.0, xq=0.5) support objects of this type. The returned object has the following components:

fitted.values

estimates of the regression function (conditional mean) at the sample points or evaluation points

residuals

residuals computed at the sample points or evaluation points

degree

integer/vector specifying the degree of the polynomial for each dimension of the continuous x

gradient

the estimated gradient (vector) corresponding to the vector gradient.vec

gradient.categorical.mat

the estimated gradient (matrix) for the categorical predictors

gradient.vec

the supplied gradient.vec

bws

vector of bandwidths

bwtype

the supplied bwtype

call

a symbolic description of the model

r.squared

coefficient of determination (Doksum and Samarov (1995))

Note

Note that the use of raw polynomials (Bernstein=FALSE) for approximation is appealing as they can be computed and differentiated easily, however, they can be unstable (their inversion can be ill conditioned) which can cause problems in some instances as the order of the polynomial increases. This can hamper search when excessive reliance on ridging to overcome ill conditioned inversion becomes computationally burdensome.

npglpreg tries to detect whether this is an issue or not when Bernstein=FALSE for each numeric predictor and will adjust the search range for snomadr and the degree fed to npglpreg if appropriate.

However, if you suspect that this might be an issue for your specific problem and you are using raw polynomials (Bernstein=FALSE), you are encouraged to investigate this by limiting degree.max to value less than the default value (say 3). Alternatively, you might consider re-scaling your numeric predictors to lie in [0,1] using scale.

For a given predictor x you can readily determine if this is an issue by considering the following: Suppose x is given by

1
2
3
    x <- runif(100,10000,11000)
    y <- x + rnorm(100,sd=1000)
  

so that a polynomial of order, say, 5 would be ill conditioned. This would be apparent if you considered

1
2
3
    X <- poly(x,degree=5,raw=TRUE)
    solve(t(X)%*%X)
  

which will throw an error when the polynomial is ill conditioned, or

1
2
3
    X <- poly(x,degree=5,raw=TRUE)
    lm(y~X)
  

which will return NA for one or more coefficients when this is an issue.

In such cases you might consider transforming your numeric predictors along the lines of the following:

1
2
3
4
5
    x <- as.numeric(scale(x))
    X <- poly(x,degree=5,raw=TRUE)
    solve(t(X)%*%X)
    lm(y~X)    
  

Note that now your least squares coefficients (i.e. first derivative of y with respect to x) represent the effect of a one standard deviation change in x and not a one unit change.

Alternatively, you can use Bernstein polynomials by not setting Bernstein=FALSE.

Author(s)

Jeffrey S. Racine racinej@mcmaster.ca and Zhenghua Nie niez@mcmaster.ca

References

Doksum, K. and A. Samarov (1995), “Nonparametric Estimation of Global Functionals and a Measure of the Explanatory Power of Covariates in Regression,” The Annals of Statistics, 23 1443-1473.

Hall, P. and J.S. Racine (forthcoming), “Cross-Validated Generalized Local Polynomial Regression,” Journal of Econometrics.

Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.

Seifert, B. and T. Gasser (2000), “Data Adaptive Ridging in Local Polynomial Regression,” Journal of Computational and Graphical Statistics, 9(2), 338-360.

See Also

npreg

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
## Not run: 
set.seed(42)
n <- 100
x1 <- runif(n,-2,2)
x2 <- runif(n,-2,2)
y <- x1^3 + rnorm(n,sd=1)

## Ideally the method should choose large bandwidths for x1 and x2 and a
## generalized polynomial that is a cubic for x1 and degree 0 for x2.

model <- npglpreg(y~x1+x2,nmulti=1)
summary(model)

## Plot the partial means and percentile confidence intervals
plot(model,ci=T)
## Extract the data from the plot object and plot it separately
myplot.dat <- plot(model,plot.behavior="data",ci=T)
matplot(myplot.dat[[1]][,1],myplot.dat[[1]][,-1],type="l")
matplot(myplot.dat[[2]][,1],myplot.dat[[2]][,-1],type="l")

## End(Not run)