np.smoothcoef: Smooth Coefficient Kernel Regression

Description

npscoef computes a kernel regression estimate of a one-dimensional dependent variable on p-variate explanatory data using the model Y_i = W_i^\prime \gamma(Z_i) + u_i, where W_i^\prime = (1, X_i^\prime), given a set of evaluation points, training points (consisting of explanatory data and dependent data), and a bandwidth specification. A bandwidth specification can be either a scbandwidth object or a bandwidth vector together with a bandwidth type and kernel type.
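To make the model concrete, here is a minimal base-R sketch of the local-constant version of this estimator. It assumes a single continuous Z, a Gaussian kernel and a hand-picked bandwidth h: at each evaluation point z, \gamma(z) is estimated by kernel-weighted least squares of Y on W = (1, X). The helper scoef_lc is hypothetical and is not part of np; npscoef additionally handles mixed data types, other kernels and data-driven bandwidths from npscoefbw, none of which the sketch reproduces.

```r
# Hypothetical helper, NOT part of np: local-constant smooth-coefficient
# estimate with a Gaussian kernel on a single continuous Z and fixed h.
scoef_lc <- function(x, y, z, h, x.eval = x, z.eval = z) {
  W <- cbind(1, as.matrix(x))           # rows are W_i' = (1, X_i')
  W.eval <- cbind(1, as.matrix(x.eval))
  gamma <- matrix(NA_real_, length(z.eval), ncol(W))
  for (j in seq_along(z.eval)) {
    k <- dnorm((z - z.eval[j]) / h)     # kernel weights K((Z_i - z)/h)
    # Weighted least squares: solve (W'KW) gamma(z) = W'Ky
    gamma[j, ] <- solve(crossprod(W, k * W), crossprod(W, k * y))
  }
  list(mean = rowSums(W.eval * gamma),  # W' gamma(z) at each evaluation point
       beta = gamma)                    # column 1: intercept; others: slopes
}

# Usage on data simulated as in the Examples section
set.seed(42)
n <- 250
x <- runif(n)
z <- runif(n, min = -2, max = 2)
y <- x * exp(z) * (1.0 + rnorm(n, sd = 0.2))
fit <- scoef_lc(x, y, z, h = 0.3)       # h chosen by hand, not by npscoefbw
head(cbind(z, slope = fit$beta[, 2], true = exp(z)))
```

In this simulated design the true slope on x is exp(z), so the second column of fit$beta should track it roughly. That column is the sketch's analogue of the grad component described under Value.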

Usage

npscoef(bws, ...)

## S3 method for class 'formula'
npscoef(bws, 
        data = NULL, 
        newdata = NULL, 
        y.eval = FALSE, 
        ...)

## Default S3 method:
npscoef(bws, 
        txdat, 
        tydat, 
        tzdat, 
        nomad = FALSE, 
        ...)

## S3 method for class 'scbandwidth'
npscoef(bws,
        txdat = stop("training data 'txdat' missing"),
        tydat = stop("training data 'tydat' missing"),
        tzdat = NULL,
        exdat,
        eydat,
        ezdat,
        betas = FALSE,
        errors = TRUE,
        iterate = TRUE,
        leave.one.out = FALSE,
        maxiter = 100,
        residuals = FALSE,
        tol = .Machine$double.eps,
        ...)

Arguments

Data, Bandwidth Inputs And Formula Interface

These arguments identify the bandwidth specification, formula/data interface, and smooth-coefficient training data.

bws

a bandwidth specification. This can be set as a scbandwidth object returned from an invocation of npscoefbw, or as a vector of bandwidths, with each element i corresponding to the bandwidth for column i in tzdat. If specified as a vector, additional arguments must be supplied as necessary to specify the bandwidth type, kernel types, training data, and so on.

data

an optional data frame, list or environment (or object coercible to a data frame by as.data.frame) containing the variables in the model. If not found in data, the variables are taken from environment(bws), typically the environment from which npscoefbw was called.

txdat

a p-variate data frame of explanatory data (training data), which by default populates columns 2 through p+1 of W in the model equation and, in the absence of tzdat, also corresponds to Z in the model equation. Defaults to the training data used to compute the bandwidth object.

tydat

a one (1) dimensional numeric or integer vector of dependent data, each element i corresponding to each observation (row) i of txdat. Defaults to the training data used to compute the bandwidth object.

tzdat

an optionally specified q-variate data frame of explanatory data (training data), which corresponds to Z in the model equation. Defaults to the training data used to compute the bandwidth object.

Local-Polynomial Degree And Bandwidth Search

This argument controls the recommended automatic local-polynomial NOMAD route, which jointly selects continuous polynomial degree and bandwidths when these are computed inside npscoef.

nomad

logical shortcut passed through to npscoefbw when bandwidths are computed inside npscoef. When TRUE, the smooth-coefficient bandwidth route fills any missing values among regtype, search.engine, degree.select, bernstein.basis, degree.min, degree.max, degree.verify, and bwtype with the recommended automatic local-polynomial degree-and-bandwidth NOMAD preset documented in npscoefbw. Additional NOMAD tuning arguments such as nomad.nmulti may also be supplied through ...; nmulti remains the outer restart count while nomad.nmulti controls inner crs::snomadr() multistarts within each outer restart. After fitting, inspect fit$bws$nomad.shortcut on the returned object fit to see the normalized shortcut metadata.

Evaluation Data And Returned Quantities

These arguments control where the smooth-coefficient fit is evaluated and which evaluation quantities are returned.

exdat

a p-variate data frame of points on which the regression will be estimated (evaluation data). By default, evaluation takes place on the data provided by txdat.

eydat

a one (1) dimensional numeric or integer vector of the true values of the dependent variable. Optional, and used only to calculate the true errors.

ezdat

an optionally specified q-variate data frame of points on which the regression will be estimated (evaluation data), which corresponds to Z in the model equation. Defaults to the same data as txdat.

newdata

An optional data frame in which to look for evaluation data. If omitted, the training data are used.

y.eval

If newdata contains dependent data and y.eval = TRUE, np will compute goodness of fit statistics on these data and return them. Defaults to FALSE.

Fitted Quantities And Backfitting

These arguments control returned coefficient estimates, errors, residuals, and iterative backfitting.

betas

a logical value indicating whether or not estimates of the components of \gamma should be returned in the smoothcoefficient object along with the regression estimates. Defaults to FALSE.

errors

a logical value indicating whether or not asymptotic standard errors should be computed and returned in the resulting smoothcoefficient object. Defaults to TRUE.

iterate

a logical value indicating whether or not backfitted estimates should be iterated for self-consistency. Defaults to TRUE.

leave.one.out

a logical value specifying whether or not to compute leave-one-out estimates. Not available when any of exdat, eydat, or ezdat is specified. Defaults to FALSE.

maxiter

an integer specifying the maximum number of backfitting iterations performed while attempting to make the backfitted estimates converge to the desired tolerance. Defaults to 100.

residuals

a logical value indicating whether or not residuals should be computed and returned in the resulting smoothcoefficient object. Defaults to FALSE.

tol

the desired tolerance on the relative convergence of the backfitted estimates. Defaults to .Machine$double.eps.
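The leave-one-out idea behind leave.one.out can be sketched in base R by giving observation i zero kernel weight when estimating at its own Z_i. This is the standard construction, assumed here for illustration rather than taken from np's source; scoef_lc_loo, the Gaussian kernel and the fixed bandwidth are all hypothetical. Because each fit omits the observation being predicted, it is only defined at the training points, which fits with the option being unavailable when separate evaluation data are supplied.

```r
# Hypothetical helper, NOT part of np: leave-one-out local-constant
# smooth-coefficient fits at the training points (Gaussian kernel, fixed h).
scoef_lc_loo <- function(x, y, z, h) {
  W <- cbind(1, as.matrix(x))
  n <- nrow(W)
  fit <- numeric(n)
  for (i in seq_len(n)) {
    k <- dnorm((z - z[i]) / h)
    k[i] <- 0                            # observation i gets no weight in its own fit
    g <- solve(crossprod(W, k * W), crossprod(W, k * y))
    fit[i] <- sum(W[i, ] * g)            # W_i' gamma_{-i}(Z_i)
  }
  fit
}

set.seed(42)
x <- runif(100)
z <- runif(100, min = -2, max = 2)
y <- x * exp(z) + rnorm(100, sd = 0.1)
loo.fit <- scoef_lc_loo(x, y, z, h = 0.3)
mean((y - loo.fit)^2)   # leave-one-out mean squared prediction error
```

The leave-one-out squared error is the kind of out-of-sample criterion that cross-validated bandwidth selection minimises, which is why it is computed at the training points only.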

Additional Arguments

Further arguments are passed to the bandwidth-selection counterpart when bandwidths are not supplied.

...

additional arguments supplied to specify the regression type, bandwidth type, kernel types, selection methods, and so on. To do this, you may specify any of bwscaling, bwtype, ckertype, ckerorder, as described in npscoefbw.

Value

npscoef returns a smoothcoefficient object. The generic functions fitted, residuals, coef, se, and predict extract (or generate) estimated values, residuals, coefficients, bootstrapped standard errors on estimates, and predictions, respectively, from the returned object. Furthermore, the functions summary and plot support objects of this type. The returned object has the following components:

eval

evaluation points

mean

estimation of the regression function (conditional mean) at the evaluation points

merr

if errors = TRUE, standard errors of the regression estimates

beta

if betas = TRUE, estimates of the coefficients \gamma at the evaluation points

grad

estimated derivatives of the conditional mean with respect to the regressors in txdat; these correspond to the non-intercept smooth coefficient estimates at each evaluation point

gerr

if errors = TRUE, asymptotic standard errors for grad

resid

if residuals = TRUE, in-sample or out-of-sample residuals where appropriate (or possible)

R2

coefficient of determination (Doksum and Samarov (1995))

MSE

mean squared error

MAE

mean absolute error

MAPE

mean absolute percentage error

CORR

absolute value of Pearson's correlation coefficient

SIGN

fraction of observations where fitted and observed values agree in sign
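The summary statistics above can be illustrated with a small base-R helper. This is a sketch of the usual definitions, not np's own code: fit_stats is hypothetical, the R2 formula is our reading of the Doksum and Samarov (1995) measure (a squared covariance-type ratio taken about the mean of y), and the plain MAPE shown here is undefined whenever an observed value is zero, a case np may handle differently.

```r
# Hypothetical helper, NOT part of np: goodness-of-fit summaries computed
# from observed values y and fitted values yhat.
fit_stats <- function(y, yhat) {
  e <- y - yhat
  ybar <- mean(y)
  list(
    # Doksum-Samarov style R^2 (assumed form): squared cross-product about mean(y)
    R2   = sum((y - ybar) * (yhat - ybar))^2 /
           (sum((y - ybar)^2) * sum((yhat - ybar)^2)),
    MSE  = mean(e^2),
    MAE  = mean(abs(e)),
    MAPE = mean(abs(e / y)),            # undefined if any y == 0
    CORR = abs(cor(y, yhat)),           # absolute Pearson correlation
    SIGN = mean(sign(y) == sign(yhat))  # share of matching signs
  )
}

str(fit_stats(c(1, -1, 2, -2), c(2, 1, 3, -1)))
```

For the usage call the errors are -1, -2, -1 and -1, so MSE is 1.75, MAE is 1.25, and three of the four fitted values share the sign of the observation, giving SIGN = 0.75.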

Book And Method Pointers

The smooth-coefficient model lets slopes vary with conditioning variables, typically written Y = X^\prime\beta(Z) + \epsilon; here X is taken to include a leading column of ones, so it plays the role of W in the Description. The functions estimate the coefficient functions \beta_j(z) using mixed-data kernel smoothing over Z, and the reported fitted values are obtained by combining the estimated coefficient functions with the corresponding columns of X.
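In its local-constant form, the estimator amounts to kernel-weighted least squares carried out separately at each z. The display below uses the Description's notation, with \gamma(z) playing the role of \beta(z). The symbol K_h(Z_i, z) is an assumed shorthand for a generic (possibly mixed-data product) kernel weight; local-polynomial variants, such as those selected through nomad, add polynomial terms in Z that are not shown here.

```latex
\hat\gamma(z) = \Big[\sum_{i=1}^{n} W_i W_i^\prime K_h(Z_i, z)\Big]^{-1}
                \sum_{i=1}^{n} W_i Y_i K_h(Z_i, z),
\qquad
\hat m(x, z) = (1, x^\prime)\,\hat\gamma(z).
```

The non-intercept components of \hat\gamma(z) are what the Value section reports as grad, and \hat m(x, z) is the reported mean.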

For book-length derivations, see Li and Racine (2007), Chapter 9 Additive and Smooth (Varying) Coefficient Semiparametric Models, especially Sections 9.3-9.3.4, and Racine (2019), Chapter 8 Semiparametric Conditional Mean Function Estimation, especially the varying-coefficient material.

Usage Issues

If you are using data of mixed types, then it is advisable to use the data.frame function to construct your input data and not cbind, since cbind will typically not work as intended on mixed data types and will coerce the data to the same type.
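The coercion problem can be seen with base R alone. cbind returns a single matrix, so a factor column is replaced by its integer codes and a character column turns every entry into a string. data.frame, by contrast, keeps each column's type, which is what the kernel functions need in order to tell continuous variables from categorical ones. The variable names here are illustrative only.

```r
x <- c(0.5, 1.5, 2.5)
g <- factor(c("a", "b", "a"))

m  <- cbind(x, g)                       # numeric matrix: factor becomes codes 1, 2, 1
m2 <- cbind(x, h = c("lo", "hi", "lo")) # character matrix: x becomes strings
d  <- data.frame(x, g)                  # x stays numeric, g stays a factor

str(m)
str(m2)
str(d)
```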

For practitioners who want the recommended automatic local-polynomial degree-and-bandwidth NOMAD route without spelling out all LP tuning arguments, npscoef(..., nomad=TRUE) and npscoefbw(..., nomad=TRUE) expand missing settings to the same documented preset. Explicit incompatible settings fail fast rather than being silently rewritten.

For plotting options for fitted smooth-coefficient objects and their bandwidth objects, see plot.np.

Support for backfitted bandwidths is experimental and limited in functionality. The code does not support asymptotic standard errors or out-of-sample estimates with backfitting.

Author(s)

Tristen Hayfield tristen.hayfield@gmail.com, Jeffrey S. Racine racinej@mcmaster.ca

References

Aitchison, J. and C.G.G. Aitken (1976), “Multivariate binary discrimination by the kernel method,” Biometrika, 63, 413-420.

Cai, Z. (2007), “Trending time-varying coefficient time series models with serially correlated errors,” Journal of Econometrics, 136, 163-188.

Doksum, K. and A. Samarov (1995), “Nonparametric estimation of global functionals and a measure of the explanatory power of covariates in regression,” The Annals of Statistics, 23, 1443-1473.

Hastie, T. and R. Tibshirani (1993), “Varying-coefficient models,” Journal of the Royal Statistical Society, Series B, 55, 757-796.

Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.

Li, Q. and J.S. Racine (2010), “Smooth varying-coefficient estimation and inference for qualitative and quantitative data,” Econometric Theory, 26, 1-31.

Li, Q., D. Ouyang and J.S. Racine (2013), “Categorical semiparametric varying-coefficient models,” Journal of Applied Econometrics, 28, 551-589.

Pagan, A. and A. Ullah (1999), Nonparametric Econometrics, Cambridge University Press.

Wang, M.C. and J. van Ryzin (1981), “A class of smooth estimators for discrete distributions,” Biometrika, 68, 301-309.

See Also

np.kernels, np.options, plot, plot.np, bw.nrd, bw.SJ, hist, npudens, npudist, npudensbw, npscoefbw

Examples

## Not run: 
# EXAMPLE 1 (INTERFACE=FORMULA):

library(np)
set.seed(42)  # make the simulated data reproducible
n <- 250
x <- runif(n)
z <- runif(n, min=-2, max=2)
y <- x*exp(z)*(1.0+rnorm(n,sd = 0.2))
bw <- npscoefbw(y~x|z)
model <- npscoef(bw)
if (interactive()) plot(model)

# EXAMPLE 1 (INTERFACE=DATA FRAME):

set.seed(42)  # make the simulated data reproducible
n <- 250
x <- runif(n)
z <- runif(n, min=-2, max=2)
y <- x*exp(z)*(1.0+rnorm(n,sd = 0.2))
bw <- npscoefbw(xdat=x, ydat=y, zdat=z)
model <- npscoef(bw)
if (interactive()) plot(model)

## End(Not run) 

np documentation built on May 16, 2026, 1:07 a.m.