kdengpdcon: Kernel Density Estimate and GPD Tail Extreme Value Mixture...
In evmix: Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation

Description Usage Arguments Details Value Acknowledgments Note Author(s) References See Also Examples

Density, cumulative distribution function, quantile function and random number generation for the extreme value mixture model with kernel density estimate for bulk distribution upto the threshold and conditional GPD above threshold with continuity at threshold. The parameters are the bandwidth lambda, threshold u GPD shape xi and tail fraction phiu.

dkdengpdcon(x, kerncentres, lambda = NULL,
  u = as.vector(quantile(kerncentres, 0.9)), xi = 0, phiu = TRUE,
  bw = NULL, kernel = "gaussian", log = FALSE)

pkdengpdcon(q, kerncentres, lambda = NULL,
  u = as.vector(quantile(kerncentres, 0.9)), xi = 0, phiu = TRUE,
  bw = NULL, kernel = "gaussian", lower.tail = TRUE)

qkdengpdcon(p, kerncentres, lambda = NULL,
  u = as.vector(quantile(kerncentres, 0.9)), xi = 0, phiu = TRUE,
  bw = NULL, kernel = "gaussian", lower.tail = TRUE)

rkdengpdcon(n = 1, kerncentres, lambda = NULL,
  u = as.vector(quantile(kerncentres, 0.9)), xi = 0, phiu = TRUE,
  bw = NULL, kernel = "gaussian")

`x`	quantiles
`kerncentres`	kernel centres (typically sample data vector or scalar)
`lambda`	bandwidth for kernel (as half-width of kernel) or `NULL`
`u`	threshold
`xi`	shape parameter
`phiu`	probability of being above threshold [0, 1] or `TRUE`
`bw`	bandwidth for kernel (as standard deviations of kernel) or `NULL`
`kernel`	kernel name (`default = "gaussian"`)
`log`	logical, if TRUE then log density
`q`	quantiles
`lower.tail`	logical, if FALSE then upper tail probabilities
`p`	cumulative probabilities
`n`	sample size (positive integer)

Extreme value mixture model combining kernel density estimate (KDE) for the bulk below the threshold and GPD for upper tail with continuity at threshold.

The user can pre-specify phiu permitting a parameterised value for the tail fraction φ_u. Alternatively, when phiu=TRUE the tail fraction is estimated as the tail fraction from the KDE bulk model.

The alternate bandwidth definitions are discussed in the kernels, with the lambda as the default. The bw specification is the same as used in the density function.

The possible kernels are also defined in kernels with the "gaussian" as the default choice.

The cumulative distribution function with tail fraction φ_u defined by the upper tail fraction of the kernel density estimate (phiu=TRUE), upto the threshold x ≤ u, given by:

F(x) = H(x)

and above the threshold x > u:

F(x) = H(u) + [1 - H(u)] G(x)

where H(x) and G(X) are the KDE and conditional GPD cumulative distribution functions respectively.

The cumulative distribution function for pre-specified φ_u, upto the threshold x ≤ u, is given by:

F(x) = (1 - φ_u) H(x)/H(u)

and above the threshold x > u:

F(x) = φ_u + [1 - φ_u] G(x)

Notice that these definitions are equivalent when φ_u = 1 - H(u).

The continuity constraint means that (1 - φ_u) h(u)/H(u) = φ_u g(u) where h(x) and g(x) are the KDE and conditional GPD density functions respectively. The resulting GPD scale parameter is then:

σ_u = φ_u H(u) / [1 - φ_u] h(u)

. In the special case of where the tail fraction is defined by the bulk model this reduces to

σ_u = [1 - H(u)] / h(u)

If no bandwidth is provided lambda=NULL and bw=NULL then the normal reference rule is used, using the bw.nrd0 function, which is consistent with the density function. At least two kernel centres must be provided as the variance needs to be estimated.

See gpd for details of GPD upper tail component and dkden for details of KDE bulk component.

dkdengpdcon gives the density, pkdengpdcon gives the cumulative distribution function, qkdengpdcon gives the quantile function and rkdengpdcon gives a random sample.

Based on code by Anna MacDonald produced for MATLAB.

Unlike most of the other extreme value mixture model functions the kdengpdcon functions have not been vectorised as this is not appropriate. The main inputs (x, p or q) must be either a scalar or a vector, which also define the output length. The kerncentres can also be a scalar or vector.

The kernel centres kerncentres can either be a single datapoint or a vector of data. The kernel centres (kerncentres) and locations to evaluate density (x) and cumulative distribution function (q) would usually be different.

Default values are provided for all inputs, except for the fundamentals kerncentres, x, q and p. The default sample size for rkdengpdcon is 1.

Missing (NA) and Not-a-Number (NaN) values in x, p and q are passed through as is and infinite values are set to NA. None of these are not permitted for the parameters or kernel centres.

Due to symmetry, the lower tail can be described by GPD by negating the quantiles.

Error checking of the inputs (e.g. invalid probabilities) is carried out and will either stop or give warning message as appropriate.

Yang Hu and Carl Scarrott carl.scarrott@canterbury.ac.nz.

http://en.wikipedia.org/wiki/Kernel_density_estimation

http://en.wikipedia.org/wiki/Generalized_Pareto_distribution

Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT - Statistical Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf

Bowman, A.W. (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika 71(2), 353-360.

Duin, R.P.W. (1976). On the choice of smoothing parameters for Parzen estimators of probability density functions. IEEE Transactions on Computers C25(11), 1175-1179.

MacDonald, A., Scarrott, C.J., Lee, D., Darlow, B., Reale, M. and Russell, G. (2011). A flexible extreme value mixture model. Computational Statistics and Data Analysis 55(6), 2137-2157.

Wand, M. and Jones, M.C. (1995). Kernel Smoothing. Chapman && Hall.

kernels, kfun, density, bw.nrd0 and dkde in ks package.

Other kden: bckden, fbckden, fgkgcon, fgkg, fkdengpdcon, fkdengpd, fkden, kdengpd, kden

Other kdengpd: bckdengpd, fbckdengpd, fgkg, fkdengpdcon, fkdengpd, fkden, gkg, kdengpd, kden

Other kdengpdcon: bckdengpdcon, fbckdengpdcon, fgkgcon, fkdengpdcon, fkdengpd, gkgcon, kdengpd

Other gkgcon: fgkgcon, fgkg, fkdengpdcon, gkgcon, gkg

Other bckdengpdcon: bckdengpdcon, bckdengpd, bckden, fbckdengpdcon, fbckdengpd, fbckden, fkdengpdcon, gkgcon

Other fkdengpdcon: fkdengpdcon

## Not run: 
set.seed(1)
par(mfrow = c(2, 2))

kerncentres=rnorm(500, 0, 1)
xx = seq(-4, 4, 0.01)
hist(kerncentres, breaks = 100, freq = FALSE)
lines(xx, dkdengpdcon(xx, kerncentres, u = 1.2, xi = 0.1))

plot(xx, pkdengpdcon(xx, kerncentres), type = "l")
lines(xx, pkdengpdcon(xx, kerncentres, xi = 0.3), col = "red")
lines(xx, pkdengpdcon(xx, kerncentres, xi = -0.3), col = "blue")
legend("topleft", paste("xi =",c(0, 0.3, -0.3)),
      col=c("black", "red", "blue"), lty = 1, cex = 0.5)

x = rkdengpdcon(1000, kerncentres, phiu = 0.2, u = 1, xi = 0.2)
xx = seq(-4, 6, 0.01)
hist(x, breaks = 100, freq = FALSE, xlim = c(-4, 6))
lines(xx, dkdengpdcon(xx, kerncentres, phiu = 0.2, u = 1, xi = -0.1))

plot(xx, dkdengpdcon(xx, kerncentres, xi=0, u = 1, phiu = 0.2), type = "l")
lines(xx, dkdengpdcon(xx, kerncentres, xi=0.2, u = 1, phiu = 0.2), col = "red")
lines(xx, dkdengpdcon(xx, kerncentres, xi=-0.2, u = 1, phiu = 0.2), col = "blue")
legend("topleft", c("xi = 0", "xi = 0.2", "xi = -0.2"),
      col=c("black", "red", "blue"), lty = 1)

## End(Not run)