np.qregression: Kernel Quantile Regression with Mixed Data Types

npqreg R Documentation

Kernel Quantile Regression with Mixed Data Types

Description

npqreg computes a kernel quantile regression estimate of a one (1) dimensional dependent variable on p-variate explanatory data, given a set of evaluation points, training points (consisting of explanatory data and dependent data), and a bandwidth specification using the methods of Li and Racine (2008) and Li, Lin and Racine (2013). A bandwidth specification can be a condbandwidth object, or a bandwidth vector, bandwidth type and kernel type.

Usage

npqreg(bws, ...)

## S3 method for class 'formula'
npqreg(bws, data = NULL, newdata = NULL, ...)


## S3 method for class 'condbandwidth'
npqreg(bws,
       txdat = stop("training data 'txdat' missing"),
       tydat = stop("training data 'tydat' missing"),
       exdat,
       tau = 0.5,
       gradients = FALSE,
       tol = 1.490116e-04,
       small = 1.490116e-05,
       itmax = 10000,
       ...)

## Default S3 method:
npqreg(bws, txdat, tydat, nomad = FALSE, ...)

## S3 method for class 'qregression'
predict(object, se.fit = FALSE, ...)

## S3 method for class 'qregression'
plot(x, ...)

Arguments

Data, Bandwidth Inputs And Formula Interface

These arguments identify the bandwidth specification, formula/data interface, and training data.

bws

a bandwidth specification. This can be set as a condbandwidth object returned from an invocation of npcdistbw, or as a vector of bandwidths, with each element i corresponding to the bandwidth for column i in txdat. If specified as a vector, then additional arguments will need to be supplied as necessary to specify the bandwidth type, kernel types, and so on.

data

an optional data frame, list or environment (or object coercible to a data frame by as.data.frame) containing the variables in the model. If not found in data, the variables are taken from environment(bws), typically the environment from which npcdistbw was called.

txdat

a p-variate data frame of explanatory data (training data) used to calculate the regression estimators. Defaults to the training data used to compute the bandwidth object.

tydat

a one (1) dimensional numeric or integer vector of dependent data, each element i corresponding to each observation (row) i of txdat. Defaults to the training data used to compute the bandwidth object.

object

an object of class "qregression" returned by npqreg.

x

an object of class "qregression" returned by npqreg.

Local-Polynomial Degree And Bandwidth Search

This argument controls the recommended automatic local-polynomial NOMAD route, which jointly selects continuous polynomial degree and bandwidths when conditional-distribution bandwidths are computed inside npqreg.

nomad

logical shortcut passed through to npcdistbw when bandwidths are computed inside npqreg. When TRUE, the conditional-distribution bandwidth route fills any missing values among regtype, search.engine, degree.select, bernstein.basis, degree.min, degree.max, degree.verify, and bwtype with the recommended automatic local-polynomial degree-and-bandwidth NOMAD preset documented in npcdistbw. Additional NOMAD tuning arguments such as nomad.nmulti may also be supplied through ...; nmulti remains the outer restart count while nomad.nmulti controls inner crs::snomadr() multistarts within each outer restart. After fitting, inspect fit$bws$nomad.shortcut on the returned object fit to see the normalized shortcut metadata.

Evaluation Data And Returned Quantities

These arguments control where the quantile regression is evaluated and which fitted quantities are returned.

exdat

a p-variate data frame of points on which the regression will be estimated (evaluation data). By default, evaluation takes place on the data provided by txdat.

gradients

a logical value indicating that you want gradients of the conditional quantile with respect to the conditioning variables computed and returned in the resulting qregression object. Defaults to FALSE.

newdata

An optional data frame in which to look for evaluation data. If omitted, the training data are used.

se.fit

logical value. If TRUE, predict.qregression returns a list with components fit and se.fit; otherwise it returns fitted conditional quantiles.

tau

a numeric scalar or vector specifying the quantile probability or probabilities \tau. Defaults to 0.5.

Quantile Solver Controls

These arguments control the one-dimensional numerical quantile extraction step.

itmax

integer maximum number of iterations allowed in the one-dimensional quantile refinement. Defaults to 10000.

small

minimum interval width used by the one-dimensional quantile refinement. Defaults to 1.490116e-05 (approximately 1000*sqrt(.Machine$double.eps)).

tol

tolerance on the one-dimensional quantile location refinement. Defaults to 1.490116e-04 (approximately 10000*sqrt(.Machine$double.eps)).
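The two default magnitudes quoted above can be reproduced in base R. This is only a small arithmetic check of the stated multiples of sqrt(.Machine$double.eps); the variable names are illustrative and are not part of the np API.

```r
# Square root of machine epsilon (about 1.490116e-08 for IEEE-754 doubles).
root.eps <- sqrt(.Machine$double.eps)

# Multiples quoted in the descriptions of 'small' and 'tol'.
small.default <- 1000 * root.eps
tol.default <- 10000 * root.eps

# Print both to seven significant digits for comparison with the Usage block.
print(signif(c(small = small.default, tol = tol.default), 7))
```

If a tighter or looser refinement is wanted, the same arguments can be supplied explicitly in the npqreg call, as listed in the Usage section.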

Additional Arguments

Further arguments are passed to the bandwidth-selection counterpart, prediction/evaluation route, or plot route as appropriate.

...

additional arguments supplied to npcdistbw when npqreg computes bandwidths internally, or arguments needed to interpret a numeric bws vector. This is where the following are supplied: bandwidth selection controls such as bwmethod, bwtype, and bwscaling; kernel/support controls such as cxkertype, cykertype, cxkerorder, cykerorder, cxkerbound, and cykerbound; categorical kernel controls such as uxkertype, uykertype, oxkertype, and oykertype; search controls such as nmulti and scale.factor.search.lower; and local-polynomial/NOMAD controls such as regtype, degree, bernstein.basis, degree.select, and nomad.nmulti.

In predict.qregression, additional arguments are passed to npqreg for evaluation with the stored bandwidth object; common examples are newdata, native exdat, and tau.

In plot.qregression, additional arguments are passed through the package plot route; common controls include tau, gradients, output, legend, and graphics arguments. See npcdistbw and plot.np for the complete bandwidth-selection and plotting argument surfaces.

Details

Documentation guide: see np.kernels for kernels, np.options for global options, and plot and plot.np for plotting options.

Given a conditional distribution bandwidth object, npqreg estimates the conditional distribution function F(y|x) and extracts the requested conditional quantile. For 0 < \tau < 1, the conditional quantile at probability \tau is

q_\tau(x) = \inf\{y : F(y|x) \ge \tau\}.

Equivalently, q_\tau(x) is a quasi-inverse of the conditional distribution in the sense of Nelsen (2006): where \tau lies in the range of F(\cdot|x) it is an ordinary inverse, satisfying F(q_\tau(x)|x) = \tau, and where \tau lies outside that range it is the lower endpoint of the set of y at which F(y|x) reaches or exceeds \tau.

Numerically, npqreg inverts the selected conditional distribution estimator represented by bws. That estimator carries the selected bandwidth type, kernels, local-polynomial regression type, selected polynomial degree, basis, and Bernstein-basis setting inherited from npcdistbw.

If the bandwidth object was selected with nomad=TRUE, the returned conditional-distribution bandwidth object is an LP object: its regtype/regtype.engine metadata identify the selected local-polynomial route and its degree/degree.engine metadata record the selected continuous-coordinate polynomial degree. npqreg, predict, and plot reuse this stored LP metadata; plotting additional tau values does not recompute or downgrade the selected degree.

The one-dimensional inversion is carried out over the observed support of the dependent variable, using the same selected conditional CDF estimator that is later used for quantile standard errors and gradients. The arguments tol, small, and itmax control this one-dimensional refinement.
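The generalized-inverse definition can be illustrated on a CDF tabulated over a grid. The sketch below is a toy base-R illustration only: gen.inverse is a hypothetical helper, the grid and CDF values are made up, and this is not the refinement that npqreg performs internally.

```r
# Hypothetical helper: generalized inverse inf{y : F(y) >= tau} for a CDF
# that is tabulated at increasing grid points y.grid with values F.vals.
gen.inverse <- function(y.grid, F.vals, tau) {
  stopifnot(length(y.grid) == length(F.vals), !is.unsorted(y.grid))
  sapply(tau, function(p) {
    idx <- which(F.vals >= p)          # grid points where F reaches p
    if (length(idx) == 0L) NA_real_ else y.grid[min(idx)]
  })
}

# Made-up step CDF on four support points.
y.grid <- c(1, 2, 3, 4)
F.vals <- c(0.25, 0.50, 0.75, 1.00)

# tau = 0.5 is attained exactly at y = 2; tau = 0.6 falls in a jump of the
# tabulated CDF, so the lower endpoint y = 3 is returned.
q <- gen.inverse(y.grid, F.vals, tau = c(0.5, 0.6))
print(q)
```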

Let f(y|x) = \partial F(y|x)/\partial y denote the conditional density. The asymptotic standard error of the conditional quantile is computed by the first-order delta method,

se\{\hat q_\tau(x)\} = \frac{se\{\hat F(\hat q_\tau(x)|x)\}} {\hat f(\hat q_\tau(x)|x)} ,

using the selected conditional distribution standard-error machinery and the selected conditional density evaluated at the fitted quantile. This corresponds to the quantile variance expression in Li, Lin and Racine (2013).
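The delta-method expression can be motivated by a heuristic first-order expansion. The steps below are a sketch only: they assume that the estimated CDF attains \tau at the fitted quantile and that F is differentiable in y at q_\tau(x), and they omit the regularity conditions stated in the cited papers.

```latex
\begin{align*}
0 &= \hat F(\hat q_\tau(x)\mid x) - F(q_\tau(x)\mid x)
     && \text{(both sides equal } \tau\text{)} \\
  &\approx \{\hat F(q_\tau(x)\mid x) - F(q_\tau(x)\mid x)\}
     + f(q_\tau(x)\mid x)\,\{\hat q_\tau(x) - q_\tau(x)\}, \\
\hat q_\tau(x) - q_\tau(x)
  &\approx -\frac{\hat F(q_\tau(x)\mid x) - F(q_\tau(x)\mid x)}
                 {f(q_\tau(x)\mid x)}, \\
se\{\hat q_\tau(x)\}
  &\approx \frac{se\{\hat F(q_\tau(x)\mid x)\}}{f(q_\tau(x)\mid x)} .
\end{align*}
```

Replacing q_\tau(x) and f by their estimates \hat q_\tau(x) and \hat f gives the plug-in form shown above.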

If gradients=TRUE, npqreg also computes gradients of the conditional quantile with respect to the conditioning variables for which gradients are defined. Differentiating F(q_\tau(x)|x) = \tau gives

\nabla_x q_\tau(x) = -\frac{F_x(q_\tau(x)|x)}{f(q_\tau(x)|x)},

where F_x(y|x) is the derivative of the same selected conditional distribution estimator with respect to x. For regtype="lc", this uses the local-constant conditional-gradient machinery; for regtype="ll" it uses the canonical local-polynomial degree-one route; and for regtype="lp" it uses the selected or supplied degree vector. The corresponding first-order gradient standard errors are computed componentwise as

se\{\nabla_x \hat q_\tau(x)\} = \frac{se\{\hat F_x(\hat q_\tau(x)|x)\}} {\hat f(\hat q_\tau(x)|x)} .
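The gradient identity can be checked numerically on a model whose conditional quantile is known in closed form. The sketch below is a base-R illustration under an assumed toy model, Y given x distributed as Normal(x^2, 1), so that q_\tau(x) = x^2 + qnorm(\tau) and the exact gradient is 2x; F.cond and f.cond are hypothetical stand-ins for the estimated quantities and no np functions are involved.

```r
# Toy model (an assumption for illustration): Y | x ~ Normal(x^2, 1).
F.cond <- function(y, x) pnorm(y - x^2)   # conditional CDF F(y|x)
f.cond <- function(y, x) dnorm(y - x^2)   # conditional density f(y|x)

tau <- 0.75
x0 <- 1.5
q0 <- x0^2 + qnorm(tau)                   # exact conditional quantile at x0

# Central finite difference for F_x(q0 | x0), holding y fixed at q0.
h <- 1e-5
F.x <- (F.cond(q0, x0 + h) - F.cond(q0, x0 - h)) / (2 * h)

# Implicit-function gradient -F_x / f versus the closed form 2 * x0.
grad.implicit <- -F.x / f.cond(q0, x0)
grad.exact <- 2 * x0
print(c(implicit = grad.implicit, exact = grad.exact))
```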

When npqreg is called without an explicit bws object, it first computes conditional distribution bandwidths using npcdistbw and stores them in the returned object's bws component. If a scalar tau was used initially and additional quantiles are later desired as fitted objects, reuse those selected bandwidths directly, for example npqreg(bws = fit$bws, tau = c(0.25, 0.5, 0.75)). If the goal is only to inspect additional quantiles graphically, use plot(fit, tau = c(0.25, 0.5, 0.75)); this reuses the stored bandwidths and recomputes only the one-dimensional quantile extraction step for the requested tau values. Vector-tau plots are overlaid and include a legend; use legend=FALSE, legend=NULL, or a legend=list(...) control to suppress or customize it.

The predict method follows the usual S3 newdata convention. For formula fits, supply a data frame of evaluation covariates via predict(fit, newdata=...). For non-formula fits, newdata is translated to the native evaluator argument exdat when exdat is not supplied. The native exdat argument remains available for advanced workflows and takes precedence if both newdata and exdat are supplied. If tau is omitted in predict, the fitted object's stored tau value is used.

Value

npqreg returns a qregression object. The generic functions fitted (or quantile), se, predict, and gradients extract (or generate) estimated values, asymptotic standard errors on estimates, predictions, and gradients, respectively, from the returned object. predict uses the object's stored tau value by default; supply tau= to override it. Furthermore, the functions summary and plot support objects of this type. The returned object has the following components:

eval

evaluation points

quantile

estimates of the quantile regression function (conditional quantile) at the evaluation points. If tau has length greater than one this is an evaluation-by-tau matrix.

quanterr

asymptotic standard errors of the quantile regression estimates, obtained from the conditional distribution standard error and the estimated conditional density at the fitted quantile. If tau has length greater than one this is an evaluation-by-tau matrix.

quantgrad

gradients of the conditional quantile with respect to the conditioning variables at each evaluation point, when gradients=TRUE. If tau has length greater than one this is an evaluation-by-gradient-by-tau array.

quantgerr

asymptotic standard errors for gradients, when gradients=TRUE. If tau has length greater than one this is an evaluation-by-gradient-by-tau array.

tau

the quantile probability or probabilities computed
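One common use of the quantile and quanterr components is to form pointwise asymptotic intervals. The sketch below uses a hypothetical stand-in list, fit.mock, with made-up numbers that mimic the documented evaluation-by-tau layout; with a real fit, the object's own quantile and quanterr components would take its place. The normal-approximation interval is an assumption of this illustration, not something the package is stated to compute.

```r
# Hypothetical stand-in for a fitted object: 4 evaluation points, 3 values
# of tau. All numbers are made up for illustration.
tau <- c(0.25, 0.50, 0.75)
fit.mock <- list(
  tau      = tau,
  quantile = matrix(c(1.0, 1.2, 1.4, 1.6,
                      2.0, 2.2, 2.4, 2.6,
                      3.0, 3.2, 3.4, 3.6), nrow = 4),
  quanterr = matrix(0.1, nrow = 4, ncol = 3)
)

# Pointwise 95% intervals under a normal approximation (assumed here).
z <- qnorm(0.975)
lower <- fit.mock$quantile - z * fit.mock$quanterr
upper <- fit.mock$quantile + z * fit.mock$quanterr

# Label columns by tau so each column can be matched to its quantile.
colnames(lower) <- colnames(upper) <- paste0("tau=", fit.mock$tau)
print(lower)
print(upper)
```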

Book And Method Pointers

The conditional quantile target is the generalized inverse q_\tau(x)=\inf\{y:F(y\mid x)\ge \tau\} of the conditional distribution. The standard errors and gradients described above are first-order delta-method quantities evaluated using the same selected conditional CDF, conditional density, bandwidths, kernels, and local-polynomial degree carried by the condbandwidth object, whether it was supplied by the user from npcdistbw or computed inside npqreg.

For book-length derivations, see Li and Racine (2007), Chapter 6 Conditional CDF and Quantile Estimation, especially Sections 6.3-6.5, and Racine (2019), Chapter 4 Conditional Probability Density and Cumulative Distribution Functions. The quasi-inverse terminology follows Nelsen (2006).

Usage Issues

If you are using data of mixed types, then it is advisable to use the data.frame function to construct your input data and not cbind, since cbind will typically not work as intended on mixed data types and will coerce the data to the same type.
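The coercion can be seen directly in base R. The short sketch below uses made-up variables (x.num, x.fac) to contrast the two constructors; it does not call any np function.

```r
# Made-up mixed-type inputs: one continuous variable and one factor.
x.num <- c(1.5, 2.5, 3.5)
x.fac <- factor(c("a", "b", "a"))

# cbind() builds a numeric matrix, so the factor is reduced to its integer
# codes and its categorical type is lost.
bad <- cbind(x.num, x.fac)
print(bad)

# data.frame() keeps each column's own type.
good <- data.frame(x.num, x.fac)
str(good)
```

Because the kernel applied to each variable depends on its type, a factor that has been silently converted to numeric codes would be treated as continuous.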

Author(s)

Tristen Hayfield tristen.hayfield@gmail.com, Jeffrey S. Racine racinej@mcmaster.ca

References

Aitchison, J. and C.G.G. Aitken (1976), “Multivariate binary discrimination by the kernel method,” Biometrika, 63, 413-420.

Hall, P. and J.S. Racine and Q. Li (2004), “Cross-validation and the estimation of conditional probability densities,” Journal of the American Statistical Association, 99, 1015-1026.

Koenker, R. W. and G.W. Bassett (1978), “Regression quantiles,” Econometrica, 46, 33-50.

Koenker, R. (2005), Quantile Regression, Econometric Society Monograph Series, Cambridge University Press.

Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.

Li, Q. and J.S. Racine (2008), “Nonparametric estimation of conditional CDF and quantile functions with mixed categorical and continuous data,” Journal of Business and Economic Statistics, 26, 423-434.

Li, Q. and J. Lin and J.S. Racine (2013), “Optimal bandwidth selection for nonparametric conditional distribution and quantile functions,” Journal of Business and Economic Statistics, 31, 57-65.

Nelsen, R.B. (2006), An Introduction to Copulas, Second Edition, Springer.

Wang, M.C. and J. van Ryzin (1981), “A class of smooth estimators for discrete distributions,” Biometrika, 68, 301-309.

See Also

np.kernels, np.options, plot, plot.np, quantreg

Examples

## Not run: 
# EXAMPLE 1 (INTERFACE=FORMULA): For this example, we compute a
# bivariate nonparametric quantile regression estimate for Giovanni
# Baiocchi's Italian income panel (see Italy for details)

data("Italy")
with(Italy, {

# Compute conditional distribution bandwidths and extract three
# conditional quantiles using the same selected bandwidths.

model.q <- npqreg(gdp~ordered(year), tau=c(0.25, 0.50, 0.75))

# Plot the overlaid quantiles.

plot(model.q)

# If a scalar tau was used first, additional quantiles can reuse the
# selected bandwidths without recomputing cross-validation. Use npqreg()
# when the additional fitted values are needed as an object, or plot()
# when graphical inspection is all that is desired.

model.med <- npqreg(gdp~ordered(year), tau=0.50)
model.q <- npqreg(bws=model.med$bws, tau=c(0.25, 0.50, 0.75))
plot(model.med, tau=c(0.25, 0.50, 0.75))

})

# EXAMPLE 1 (INTERFACE=DATA FRAME): For this example, we compute a
# bivariate nonparametric quantile regression estimate for Giovanni
# Baiocchi's Italian income panel (see Italy for details)

data("Italy")
with(Italy, {
data <- data.frame(ordered(year), gdp)

# First, compute the conditional-distribution bandwidths using the
# npcdistbw default selection method. Note - this may take a few minutes
# depending on the speed of your computer...

bw <- npcdistbw(xdat=ordered(year), ydat=gdp)

# Note - numerical search for computing the quantiles will take a
# minute or so...

model.q <- npqreg(bws=bw, tau=c(0.25, 0.50, 0.75))

plot(model.q)

})

## End(Not run) 

np documentation built on May 16, 2026, 1:07 a.m.