np.quantile: Kernel Univariate Quantile Estimation

npquantile R Documentation

Kernel Univariate Quantile Estimation

Description

npquantile computes smooth quantiles from a univariate unconditional kernel cumulative distribution estimate given data and, optionally, a bandwidth specification, i.e. a dbandwidth object, using the bandwidth selection method of Li, Li and Racine (2017).

Usage

npquantile(x = NULL,
           tau = c(0.01,0.05,0.25,0.50,0.75,0.95,0.99),
           num.eval = 10000,
           bws = NULL,
           f = 1,
           ...)

Arguments

x

a univariate vector of type numeric containing sample realizations (training data) used to estimate the cumulative distribution (must be the same training data used to compute the bandwidth object bws passed in).

tau

an optional vector containing the probabilities for quantile(s) to be estimated (must contain numbers in [0,1]). Defaults to c(0.01,0.05,0.25,0.50,0.75,0.95,0.99).

num.eval

an optional integer specifying the length of the grid on which the quasi-inverse is computed. Defaults to 10000.

bws

an optional dbandwidth specification; supplying one that has already been computed avoids unnecessary computation inside npquantile. This must be a dbandwidth object returned from an invocation of npudistbw. If not provided, npudistbw is invoked with optional arguments passed via ....

f

an optional argument fed to extendrange. Defaults to 1. See ?extendrange for details.

...

additional arguments supplied to specify the bandwidth type, kernel types, bandwidth selection methods, and so on. See ?npudistbw for details.
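As an illustration of the bws argument, the following sketch computes the bandwidth object once and reuses it for two sets of probabilities. It assumes the np package is installed (the code is skipped otherwise); the simulated data and the names bw, q_tails and q_mid are illustrative only.

```r
## Sketch only: runs when the np package is available.
if (requireNamespace("np", quietly = TRUE)) {
  library(np)
  set.seed(42)
  x <- rchisq(100, df = 10)

  ## Compute the dbandwidth object once (cross-validation by default)
  bw <- npudistbw(~x)

  ## Reuse it so that npquantile does not repeat the bandwidth search;
  ## x must be the same data used to compute bw
  q_tails <- npquantile(x, tau = c(0.01, 0.99), bws = bw)
  q_mid   <- npquantile(x, tau = c(0.25, 0.50, 0.75), bws = bw)
}
```

Reusing bw in this way is mainly of interest when the bandwidth search is the expensive step, for instance when quantiles at several sets of probabilities are needed for the same sample.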

Details

Typical usage is

    x <- rchisq(100,df=10)
    npquantile(x)
  

The quantile function q_τ is defined to be the left-continuous inverse of the distribution function F(x), i.e. q_τ = inf{x : F(x) ≥ τ}.
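To make the definition concrete, here is a minimal base R sketch that applies it to the empirical distribution function rather than to a kernel estimate. The helper ecdf_inverse is hypothetical and not part of np; it simply returns the smallest order statistic at which the ECDF reaches τ.

```r
## Hypothetical helper (not part of np): left-continuous inverse of the ECDF,
## i.e. inf{x : F_n(x) >= tau} with F_n the empirical distribution function.
ecdf_inverse <- function(x, tau) {
  xs <- sort(x)
  Fn <- ecdf(x)(xs)   # F_n evaluated at each order statistic
  ## For each tau, pick the smallest order statistic with F_n(x) >= tau
  vapply(tau, function(t) xs[which(Fn >= t)[1L]], numeric(1))
}

set.seed(42)
x <- rchisq(97, df = 10)
tau <- c(0.01, 0.25, 0.50, 0.75, 0.99)

ecdf_inverse(x, tau)
quantile(x, tau, type = 1, names = FALSE)
```

The two calls should agree here, since type=1 in quantile is documented as the inverse of the empirical distribution function.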

A traditional estimator of q_τ is the τth sample quantile. However, these estimates suffer from a lack of efficiency arising from the variability of individual order statistics; see Sheather and Marron (1990) and Hyndman and Fan (1996) for methods that interpolate/smooth the order statistics. Each of the methods discussed in the latter can be invoked through quantile via type=j, j=1,...,9.
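The nine sample quantile definitions can be compared side by side in base R. The sketch below uses simulated data and illustrative names (hf, tau); it only shows how the type argument is used and makes no claim about which definition is preferable.

```r
set.seed(42)
x <- rchisq(100, df = 10)
tau <- c(0.05, 0.50, 0.95)

## One column per Hyndman and Fan (1996) definition, one row per probability
hf <- sapply(1:9, function(j) quantile(x, tau, type = j, names = FALSE))
dimnames(hf) <- list(paste0("tau=", tau), paste0("type", 1:9))
round(hf, 3)
```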

The function npquantile implements a method for estimating smooth quantiles based on the quasi-inverse of an npudist object, where F(x) is replaced with its kernel estimator and bandwidth selection is that appropriate for such objects; see Definition 2.3.6, page 21, of Nelsen (2006) for a definition of the quasi-inverse of F(x).

To construct the quasi-inverse we create a grid of evaluation points based on the function extendrange, along with the sample quantiles themselves computed from an invocation of quantile. The coarseness of the grid defined by extendrange (which is passed the argument f, 1 by default) is controlled by num.eval.

Note that for any value of τ less/greater than the smallest/largest value of F(x) computed for the evaluation data (i.e. that outlined in the paragraph above), the quantile returned for such values is that associated with the smallest/largest value of F(x), respectively.
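The grid-based construction described above can be sketched in base R. This is not the code used inside npquantile: the helper smooth_quantile_sketch is hypothetical, it uses a Gaussian-kernel distribution estimate in place of npudist, and its default bandwidth sd(x)*n^(-1/3) is an ad hoc assumption rather than the cross-validated choice.

```r
## Hypothetical helper (not part of np): grid-based quasi-inverse of a
## Gaussian-kernel CDF estimate. The bandwidth h is an ad hoc assumption.
smooth_quantile_sketch <- function(x, tau, num.eval = 1000, f = 1,
                                   h = sd(x) * length(x)^(-1/3)) {
  ## Grid: equally spaced points over the extended range plus sample quantiles
  er <- extendrange(x, f = f)
  grid <- sort(c(seq(er[1], er[2], length.out = num.eval),
                 quantile(x, seq(0, 1, length.out = num.eval), names = FALSE)))

  ## Kernel estimate of F at each grid point (nondecreasing along the grid)
  Fhat <- vapply(grid, function(g) mean(pnorm((g - x) / h)), numeric(1))

  ## Quasi-inverse: smallest grid point with Fhat >= tau. A tau below the
  ## smallest Fhat maps to the first grid point; a tau above the largest
  ## Fhat maps to the last grid point.
  vapply(tau, function(t) {
    idx <- which(Fhat >= t)
    if (length(idx) == 0L) grid[length(grid)] else grid[idx[1L]]
  }, numeric(1))
}

set.seed(42)
x <- rchisq(100, df = 10)
smooth_quantile_sketch(x, tau = c(0.05, 0.50, 0.95))
```

A larger num.eval gives a finer grid and hence a finer resolution for the returned quantiles, at the cost of more evaluations of the distribution estimate.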

Value

npquantile returns a vector of quantiles corresponding to tau.

Usage Issues

Cross-validated bandwidth selection is used by default (npudistbw). For large datasets this can be computationally demanding. In such cases one might instead consider a rule-of-thumb bandwidth (bwmethod="normal-reference") or, alternatively, use kd-trees (options(np.tree=TRUE)) along with a bounded kernel (ckertype="epanechnikov"). Either approach will reduce the computational burden appreciably.
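Both alternatives can be sketched as follows, assuming the np package is installed (the code is skipped otherwise). The sample size, the probabilities and the names q_rot, q_tree and old are illustrative only, and timings will depend on the data and machine.

```r
## Sketch only: runs when the np package is available.
if (requireNamespace("np", quietly = TRUE)) {
  library(np)
  set.seed(42)
  x <- rchisq(500, df = 10)
  tau <- c(0.25, 0.50, 0.75)

  ## Rule-of-thumb bandwidth: skips the cross-validation search.
  ## bwmethod is passed on via '...' as described under Arguments.
  q_rot <- npquantile(x, tau, bwmethod = "normal-reference")

  ## kd-trees with a bounded kernel: cross-validation is still used
  old <- options(np.tree = TRUE)
  q_tree <- npquantile(x, tau, ckertype = "epanechnikov")
  options(old)   # restore the previous setting
}
```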

Author(s)

Tristen Hayfield tristen.hayfield@gmail.com, Jeffrey S. Racine racinej@mcmaster.ca

References

Cheng, M.-Y. and Sun, S. (2006), “Bandwidth selection for kernel quantile estimation,” Journal of the Chinese Statistical Association, 44, 271-295.

Hyndman, R.J. and Fan, Y. (1996), “Sample quantiles in statistical packages,” American Statistician, 50, 361-365.

Li, Q. and J.S. Racine (2017), “Smooth Unconditional Quantile Estimation,” Manuscript.

Li, C., H. Li and J.S. Racine (2017), “Cross-Validated Mixed Datatype Bandwidth Selection for Nonparametric Cumulative Distribution/Survivor Functions,” Econometric Reviews, 36, 970-987.

Nelsen, R.B. (2006), An Introduction to Copulas, Second Edition, Springer-Verlag.

Sheather, S. and J.S. Marron (1990), “Kernel quantile estimators,” Journal of the American Statistical Association, 85, 410-416.

Yang, S.-S. (1985), “A Smooth Nonparametric Estimator of a Quantile Function,” Journal of the American Statistical Association, 80, 1004-1011.

See Also

quantile for various types of sample quantiles; ecdf for empirical distributions of which quantile is an inverse; boxplot.stats and fivenum for computing other versions of quartiles; qlogspline for logspline density quantiles; qkde for alternative kernel quantiles, etc.

Examples

## Not run: 
## Simulate data from a chi-square distribution
df <- 50
x <- rchisq(100,df=df)

## Vector of quantiles desired
tau <- c(0.01,0.05,0.25,0.50,0.75,0.95,0.99)

## Compute kernel smoothed sample quantiles
npquantile(x,tau)

## Compute sample quantiles using the default method in R (Type 7)
quantile(x,tau)

## True quantiles based on known distribution
qchisq(tau,df=df)

## End(Not run) 

np documentation built on Oct. 19, 2022, 1:08 a.m.