fgpd: MLE Fitting of Generalised Pareto Distribution (GPD)
In evmix: Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation

Description Usage Arguments Details Value Acknowledgments Note Author(s) References See Also Examples

Maximum likelihood estimation for fitting the GPD with parameters scale sigmau and shape xi to the threshold exceedances, conditional on being above a threshold u. Unconditional likelihood fitting also provided when the probability phiu of being above the threshold u is given.

fgpd(x, u = 0, phiu = NULL, pvector = NULL, std.err = TRUE,
  method = "BFGS", control = list(maxit = 10000), finitelik = TRUE,
  ...)

lgpd(x, u = 0, sigmau = 1, xi = 0, phiu = 1, log = TRUE)

nlgpd(pvector, x, u = 0, phiu = 1, finitelik = FALSE)

`x`	vector of sample data
`u`	scalar threshold
`phiu`	probability of being above threshold [0, 1] or `NULL`, see Details
`pvector`	vector of initial values of GPD parameters (`sigmau`, `xi`) or `NULL`
`std.err`	logical, should standard errors be calculated
`method`	optimisation method (see `optim`)
`control`	optimisation control list (see `optim`)
`finitelik`	logical, should log-likelihood return finite value for invalid parameters
`...`	optional inputs passed to `optim`
`sigmau`	scalar scale parameter (positive)
`xi`	scalar shape parameter
`log`	logical, if `TRUE` then log-likelihood rather than likelihood is output

The GPD is fitted to the exceedances of the threshold u using maximum likelihood estimation. The estimated parameters, variance-covariance matrix and their standard errors are automatically output.

The log-likelihood and negative log-likelihood are also provided for wider usage, e.g. constructing your own extreme value mixture model or profile likelihood functions. The parameter vector pvector must be specified in the negative log-likelihood nlgpd.

Log-likelihood calculations are carried out in lgpd, which takes parameters as inputs in the same form as distribution functions. The negative log-likelihood is a wrapper for lgpd, designed towards making it useable for optimisation (e.g. parameters are given a vector as first input).

The default value for the tail fraction phiu in the fitting function fgpd is NULL, in which case the MLE is calculated using the sample proportion of exceedances. In this case the standard error for phiu is estimated and output as se.phiu, otherwise it is set to NA. Consistent with the evd library the missing values (NA and NaN) are assumed to be below the threshold in calculating the tail fraction.

Otherwise, in the fitting function fgpd the tail fraction phiu can be specified as any value over (0, 1], i.e. excludes φ_u=0, leading to the unconditional log-likelihood being used for estimation. In this case the standard error will be output as NA.

In the log-likelihood functions lgpd and nlgpd the tail fraction phiu cannot be NULL but can be over the range [0, 1], i.e. which includes φ_u=0.

The value of phiu does not effect the GPD parameter estimates, only the value of the likelihood, as:

L(σ_u, ξ; u, φ_u) = (φ_u ^ {n_u}) L(σ_u, ξ; u, φ_u=1)

where the GPD has scale σ_u and shape ξ, the threshold is u and nu is the number of exceedances. A non-unit value for phiu simply scales the likelihood and shifts the log-likelihood, thus the GPD parameter estimates are invariant to phiu.

The default optimisation algorithm is "BFGS", which requires a finite negative log-likelihood function evaluation finitelik=TRUE. For invalid parameters, a zero likelihood is replaced with exp(-1e6). The "BFGS" optimisation algorithms require finite values for likelihood, so any user input for finitelik will be overridden and set to finitelik=TRUE if either of these optimisation methods is chosen.

It will display a warning for non-zero convergence result comes from optim function call.

If the hessian is of reduced rank then the variance covariance (from inverse hessian) and standard error of parameters cannot be calculated, then by default std.err=TRUE and the function will stop. If you want the parameter estimates even if the hessian is of reduced rank (e.g. in a simulation study) then set std.err=FALSE.

lgpd gives (log-)likelihood and nlgpd gives the negative log-likelihood. fgpd returns a simple list with the following elements

`call`:	`optim` call
`x`:	data vector `x`
`init`:	`pvector`
`optim`:	complete `optim` output
`mle`:	vector of MLE of parameters
`cov`:	variance-covariance matrix of MLE of parameters
`se`:	vector of standard errors of MLE of parameters
`rate`:	`phiu` to be consistent with `evd`
`nllh`:	minimum negative log-likelihood
`n`:	total sample size
`u`:	threshold
`sigmau`:	MLE of GPD scale
`xi`:	MLE of GPD shape
`phiu`:	MLE of tail fraction
`se.phiu`:	standard error of MLE of tail fraction (parameterised approach using sample proportion)

The output list has some duplicate entries and repeats some of the inputs to both provide similar items to those from fpot and increase usability.

Based on the gpd.fit and fpot functions in the ismev and evd packages for which their author's contributions are gratefully acknowledged. They are designed to have similar syntax and functionality to simplify the transition for users of these packages.

Unlike all the distribution functions for the GPD, the MLE fitting only permits single scalar values for each parameter, phiu and threshold u.

When pvector=NULL then the initial values are calculated, type fgpd to see the default formulae used. The GPD fitting is not very sensitive to the initial values, so you will rarely have to give alternatives. Avoid setting the starting value for the shape parameter to xi=0 as depending on the optimisation method it may be get stuck.

Default values for the threshold u=0 and tail fraction phiu=NULL are given in the fitting fpgd, in which case the MLE assumes that excesses over the threshold are given, rather than exceedances.

The usual default of phiu=1 is given in the likelihood functions lpgd and nlpgd.

The lgpd also has the usual defaults for the other parameters, but nlgpd has no defaults.

Infinite sample values are dropped in fitting function fpgd, but missing values are used to estimate phiu as described above. But in likelihood functions lpgd and nlpgd both infinite and missing values are ignored.

Error checking of the inputs is carried out and will either stop or give warning message as appropriate.

Yang Hu and Carl Scarrott carl.scarrott@canterbury.ac.nz

http://en.wikipedia.org/wiki/Generalized_Pareto_distribution

Hu Y. and Scarrott, C.J. (2018). evmix: An R Package for Extreme Value Mixture Modeling, Threshold Estimation and Boundary Corrected Kernel Density Estimation. Journal of Statistical Software 84(5), 1-27. doi: 10.18637/jss.v084.i05.

dgpd, fpot and fitdistr

Other gpd: gpd

Other fgpd: gpd

set.seed(1)
par(mfrow = c(2, 1))

# GPD is conditional model for threshold exceedances
# so tail fraction phiu not relevant when only have exceedances
x = rgpd(1000, u = 10, sigmau = 5, xi = 0.2)
xx = seq(0, 100, 0.1)
hist(x, breaks = 100, freq = FALSE, xlim = c(0, 100))
lines(xx, dgpd(xx, u = 10, sigmau = 5, xi = 0.2))
fit = fgpd(x, u = 10)
lines(xx, dgpd(xx, u = fit$u, sigmau = fit$sigmau, xi = fit$xi), col="red")

# but tail fraction phiu is needed for conditional modelling of population tail
x = rnorm(10000)
xx = seq(-4, 4, 0.01)
hist(x, breaks = 200, freq = FALSE, xlim = c(0, 4))
lines(xx, dnorm(xx), lwd = 2)
fit = fgpd(x, u = 1)
lines(xx, dgpd(xx, u = fit$u, sigmau = fit$sigmau, xi = fit$xi, phiu = fit$phiu),
  col = "red", lwd = 2)
legend("topright", c("True Density","Fitted Density"), col=c("black", "red"), lty = 1)