nplsqreg: Location-Scale Kernel Quantile Regression with Mixed Data...

View source: R/np.lsqregression.R

nplsqreg    R Documentation

Location-Scale Kernel Quantile Regression with Mixed Data Types

Description

nplsqreg computes a location-scale kernel estimate of the conditional quantile function for a one-dimensional dependent variable and mixed continuous, unordered factor, and ordered factor explanatory data. Unlike npqreg, which obtains conditional quantiles by inverting an estimated conditional distribution, nplsqreg estimates the requested conditional quantile surface directly using a locally weighted quantile-kernel construction.

Usage

nplsqreg(bws, ...)

## S3 method for class 'formula'
nplsqreg(bws, data = NULL, newdata = NULL, tau = 0.5,
       gradients = FALSE, residuals = FALSE, subset, na.action, ...)

## S3 method for class 'lsqregressionbandwidth'
nplsqreg(bws, txdat = NULL, tydat = NULL,
       tau = bws$tau, ...)

## Default S3 method:
nplsqreg(bws,
       txdat = stop("training data 'txdat' missing"),
       tydat = stop("training data 'tydat' missing"),
       tau = 0.5,
       exdat,
       gradients = FALSE,
       residuals = FALSE,
       ...)

Arguments

Data, Bandwidth Inputs And Formula Interface

These arguments identify the bandwidth specification, formula/data interface, and training data.

bws

a formula, an lsqregressionbandwidth object returned by nplsqregbw, an rbandwidth object, a numeric bandwidth vector, or omitted for automatic bandwidth selection. To reuse the bandwidths of an existing nplsqreg fit exactly, pass the fitted object's $bws component; the $reg.bws component is internal regression state.

data

an optional data frame, list or environment containing the variables in the model. If not found in data, the variables are taken from environment(bws).

subset

an optional vector specifying a subset of observations to be used by the formula method.

na.action

a function specifying the action to take when missing values are found by the formula method.

txdat

a p-variate data frame of explanatory data used as training data. Defaults to the training data stored in bws.

tydat

a one-dimensional numeric vector of dependent data. Defaults to the training response stored in bws.

Evaluation Data And Returned Quantities

These arguments control where the quantile regression is evaluated and which fitted quantities are returned.

newdata

an optional data frame in which to look for evaluation covariates for formula fits. If omitted, the training data are used.

exdat

a p-variate data frame of evaluation points. By default, evaluation takes place on txdat. The native exdat argument takes precedence over newdata when both are supplied.

gradients

a logical value indicating whether gradients and categorical effects of the conditional quantile with respect to the conditioning variables should be computed and returned. Defaults to FALSE.

residuals

a logical value indicating whether residuals should be returned for training-data fits. Defaults to FALSE.

Quantile Index And Additional Controls

These arguments control the requested quantile probability and the bandwidth selection, prediction, or plotting route.

tau

a numeric scalar or vector specifying the quantile probability or probabilities \tau. Values must lie strictly in (0,1).

...

additional arguments supplied to nplsqregbw when bandwidths are computed internally, to npreg for the final transformed-response fit, or to plotting and prediction methods as appropriate. Common bandwidth-selection controls include regtype, bwtype, nmulti, degree, nomad, search.engine, tau.search, delta, scale, regtype.pilot, nomad.pilot, and pilot.args.

Details

The estimator follows the locally weighted quantile-kernel approach of Racine and Li (2017). Given a conditional scale pilot \hat\sigma(X_i), define

Y_i^\delta = Y_i + \hat\sigma(X_i)\Phi^{-1}(\delta),

where 0 < \delta < 1 and \Phi^{-1} is the standard normal quantile function. For a requested quantile probability \tau, nplsqregbw selects the bandwidths and \delta by leave-one-out check-loss cross-validation. With the selected bandwidths and \delta, nplsqreg then fits a kernel regression of Y_i^\delta on X_i using the ordinary mixed-data machinery in npreg. The fitted mean of the transformed response is the estimated conditional quantile.
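To see why the mean of a transformed response can track a quantile, consider the special case of a Gaussian location-scale model, Y = m(X) + \sigma(X)\epsilon with \epsilon standard normal. There the \tau-th conditional quantile is m(x) + \sigma(x)\Phi^{-1}(\tau), which coincides with E[Y^\delta | X = x] when \delta = \tau and the true scale is used. The base R sketch below checks this numerically on simulated data. It assumes a known scale function, uses hypothetical names (m, sigma, check_loss), and does not call nplsqreg or reproduce its leave-one-out cross-validation.

```r
set.seed(42)
n <- 2000
x <- runif(n)

# Hypothetical location and scale functions used only for this simulation
m     <- function(x) sin(2 * pi * x)
sigma <- function(x) 0.5 + x

y <- m(x) + sigma(x) * rnorm(n)

tau   <- 0.75
delta <- tau  # exact choice in the Gaussian location-scale case

# Transformed response Y^delta = Y + sigma(X) * qnorm(delta)
y_delta <- y + sigma(x) * qnorm(delta)

# True conditional quantile, available here because m and sigma are known
q_true <- m(x) + sigma(x) * qnorm(tau)

# The transformed response is centred on the quantile, so this is near zero
mean_gap <- mean(y_delta - q_true)

# Check loss rho_tau(u) = u * (tau - 1(u < 0))
check_loss <- function(u, tau) u * (tau - (u < 0))

# Average check loss of m(x) + sigma(x) * qnorm(d) over a grid of d values
delta_grid <- seq(0.05, 0.95, by = 0.05)
loss <- sapply(delta_grid, function(d) {
  q_hat <- m(x) + sigma(x) * qnorm(d)
  mean(check_loss(y - q_hat, tau))
})
delta_best <- delta_grid[which.min(loss)]
```

In this idealised setting the grid value minimising the check loss should land at or next to tau. With non-Gaussian errors the best \delta generally differs from \tau, which is one reason to select \delta from the data rather than fix it.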

The scale pilot is interpreted as a conditional standard deviation. The default pilot estimates the conditional mean, smooths squared residuals, floors the fitted variance before taking square roots, and then uses the resulting positive scale vector in the quantile-kernel transformation. The local-linear residual-variance pilot follows the idea of Fan and Yao (1998); regtype.pilot can be used to select the pilot regression type independently of the final quantile fit.
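The pilot steps described above (mean fit, smoothed squared residuals, variance floor, square root) can be sketched in base R. This is an illustration only: loess with degree = 1 stands in for the package's kernel regression, the span and the floor value are arbitrary assumptions, and the names fit_mean, fit_var, var_floor, and sigma_hat are hypothetical rather than taken from the package.

```r
set.seed(1)
n <- 500
x <- sort(runif(n))
y <- sin(2 * pi * x) + (0.3 + x) * rnorm(n)
dat <- data.frame(x = x, y = y)

# Step 1: estimate the conditional mean with a local-linear smoother
# (loess is a stand-in here, not the estimator used by the package)
fit_mean <- loess(y ~ x, data = dat, degree = 1, span = 0.3)
dat$res2 <- residuals(fit_mean)^2

# Step 2: smooth the squared residuals to estimate the conditional variance
fit_var <- loess(res2 ~ x, data = dat, degree = 1)
var_hat <- fitted(fit_var)

# Step 3: floor the fitted variance so the square root is defined and
# strictly positive; this floor is an arbitrary illustrative choice
var_floor <- 1e-8 * var(dat$y)
sigma_hat <- sqrt(pmax(var_hat, var_floor))
```

The floor matters because a smoother applied to squared residuals is not guaranteed to return positive values, particularly near the boundary of the data.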

If tau has length greater than one, tau.search="full" performs a separate bandwidth/\delta search for each quantile while sharing the same pilot scale. The explicit tau.search="refined" route fits the central quantile first and warm-starts the remaining quantiles, recording the search order and warm-start provenance in the returned object. The conservative default is tau.search="full".

Gradients and categorical effects are those returned by the final npreg fit on the transformed response. Thus ordered-factor effects are finite-difference contrasts and unordered-factor effects follow the corresponding mixed-data regression semantics.

Value

nplsqreg returns an object of class lsqregression. The generic functions fitted, quantile, se, predict, residuals, and gradients extract estimated conditional quantiles, asymptotic standard errors from the transformed-response regression, predictions, residuals when requested, and gradients or categorical effects. The functions summary, print, and plot support objects of this type.

Usage Issues

If you are using data of mixed types, then it is advisable to use the data.frame function to construct your input data and not cbind, since cbind will typically not work as intended on mixed data types and will coerce the data to the same type.
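A small base R illustration of the difference, using made-up variable names (x_cont, x_unord, x_ord):

```r
x_cont  <- c(1.5, 2.0, 3.25)
x_unord <- factor(c("a", "b", "a"))
x_ord   <- ordered(c("low", "high", "mid"), levels = c("low", "mid", "high"))

# cbind() builds a matrix, so every column is forced to a single type;
# the factors are reduced to their integer codes and lose their levels
m <- cbind(x_cont, x_unord, x_ord)
is.numeric(m)  # TRUE

# data.frame() keeps each column's own type
df <- data.frame(x_cont, x_unord, x_ord)
col_classes <- sapply(df, function(col) class(col)[1])
```

Here col_classes reports "numeric", "factor", and "ordered" for the three columns, whereas the matrix m no longer distinguishes between continuous, unordered, and ordered data.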

Author(s)

Tristen Hayfield tristen.hayfield@gmail.com, Jeffrey S. Racine racinej@mcmaster.ca

References

Fan, J. and Q. Yao (1998), “Efficient Estimation of Conditional Variance Functions in Stochastic Regression,” Biometrika, 85, 645-660. doi:10.1093/biomet/85.3.645

Racine, J.S. and K. Li (2017), “Nonparametric conditional quantile estimation: A locally weighted quantile kernel approach,” Journal of Econometrics, 201, 72-94. doi:10.1016/j.jeconom.2017.06.020

Racine, J.S. and I. Van Keilegom (2020), “A smooth nonparametric, multivariate, mixed-data location-scale test,” Journal of Business & Economic Statistics, 38, 784-795. doi:10.1080/07350015.2019.1574227

See Also

nplsqregbw, npreg, npqreg, np.kernels, np.options, plot.np

Examples

## Not run: 
data("Italy")

model.q <- nplsqreg(gdp ~ ordered(year), data = Italy,
                    tau = c(0.25, 0.50, 0.75))
plot(model.q)

model.med <- nplsqreg(gdp ~ ordered(year), data = Italy, tau = 0.50)
model.q2 <- nplsqreg(bws = model.med$bws, tau = 0.50)
plot(model.med, gradients = TRUE)

## End(Not run)

np documentation built on June 26, 2026, 9:06 a.m.