fhpd: MLE Fitting of Hybrid Pareto Extreme Value Mixture Model

Description Usage Arguments Details Value Note Author(s) References See Also Examples

View source: R/fhpd.r

Description

Maximum likelihood estimation for fitting the hybrid Pareto extreme value mixture model

Usage

1
2
3
4
5
6
fhpd(x, pvector = NULL, std.err = TRUE, method = "BFGS",
  control = list(maxit = 10000), finitelik = TRUE, ...)

lhpd(x, nmean = 0, nsd = 1, xi = 0, log = TRUE)

nlhpd(pvector, x, finitelik = FALSE)

Arguments

x

vector of sample data

pvector

vector of initial values of parameters (nmean, nsd, xi) or NULL

std.err

logical, should standard errors be calculated

method

optimisation method (see optim)

control

optimisation control list (see optim)

finitelik

logical, should log-likelihood return finite value for invalid parameters

...

optional inputs passed to optim

nmean

scalar normal mean

nsd

scalar normal standard deviation (positive)

xi

scalar shape parameter

log

logical, if TRUE then log-likelihood rather than likelihood is output

Details

The hybrid Pareto model is fitted to the entire dataset using maximum likelihood estimation. The estimated parameters, variance-covariance matrix and their standard errors are automatically output.

The log-likelihood and negative log-likelihood are also provided for wider usage, e.g. constructing profile likelihood functions. The parameter vector pvector must be specified in the negative log-likelihood nlhpd.

Log-likelihood calculations are carried out in lhpd, which takes parameters as inputs in the same form as distribution functions. The negative log-likelihood is a wrapper for lhpd, designed towards making it useable for optimisation (e.g. parameters are given a vector as first input).

Missing values (NA and NaN) are assumed to be invalid data so are ignored, which is inconsistent with the evd library which assumes the missing values are below the threshold.

The function lhpd carries out the calculations for the log-likelihood directly, which can be exponentiated to give actual likelihood using (log=FALSE).

The default optimisation algorithm is "BFGS", which requires a finite negative log-likelihood function evaluation finitelik=TRUE. For invalid parameters, a zero likelihood is replaced with exp(-1e6). The "BFGS" optimisation algorithms require finite values for likelihood, so any user input for finitelik will be overridden and set to finitelik=TRUE if either of these optimisation methods is chosen.

It will display a warning for non-zero convergence result comes from optim function call.

If the hessian is of reduced rank then the variance covariance (from inverse hessian) and standard error of parameters cannot be calculated, then by default std.err=TRUE and the function will stop. If you want the parameter estimates even if the hessian is of reduced rank (e.g. in a simulation study) then set std.err=FALSE.

Value

lhpd gives (log-)likelihood and nlhpd gives the negative log-likelihood. fhpd returns a simple list with the following elements

call: optim call
x: data vector x
init: pvector
optim: complete optim output
mle: vector of MLE of parameters
cov: variance-covariance matrix of MLE of parameters
se: vector of standard errors of MLE of parameters
rate: phiu to be consistent with evd
nllh: minimum negative log-likelihood
n: total sample size
nmean: MLE of normal mean
nsd: MLE of normal standard deviation
u: threshold (implicit from other parameters)
sigmau: MLE of GPD scale
xi: MLE of GPD shape
phiu: MLE of tail fraction (implied by 1/(1+pnorm(u,nmean,nsd)))

The output list has some duplicate entries and repeats some of the inputs to both provide similar items to those from fpot and to make it as useable as possible.

Note

Unlike most of the distribution functions for the extreme value mixture models, the MLE fitting only permits single scalar values for each parameter. Only the data is a vector.

When pvector=NULL then the initial values are calculated, type fhpd to see the default formulae used. The mixture model fitting can be ***extremely*** sensitive to the initial values, so you if you get a poor fit then try some alternatives. Avoid setting the starting value for the shape parameter to xi=0 as depending on the optimisation method it may be get stuck.

A default value for the tail fraction phiu=TRUE is given. The lhpd also has the usual defaults for the other parameters, but nlhpd has no defaults.

Invalid parameter ranges will give 0 for likelihood, log(0)=-Inf for log-likelihood and -log(0)=Inf for negative log-likelihood.

Infinite and missing sample values are dropped.

Error checking of the inputs is carried out and will either stop or give warning message as appropriate.

Author(s)

Yang Hu and Carl Scarrott carl.scarrott@canterbury.ac.nz

References

http://en.wikipedia.org/wiki/Normal_distribution

http://en.wikipedia.org/wiki/Generalized_Pareto_distribution

Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT - Statistical Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf

Carreau, J. and Y. Bengio (2008). A hybrid Pareto model for asymmetric fat-tailed data: the univariate case. Extremes 12 (1), 53-76.

See Also

fgpd and gpd

The condmixt package written by one of the original authors of the hybrid Pareto model (Carreau and Bengio, 2008) also has similar functions for the likelihood of the hybrid Pareto (hpareto.negloglike) and fitting (hpareto.fit).

Other hpd: fhpdcon, hpdcon, hpd

Other hpdcon: fhpdcon, hpdcon, hpd

Other normgpd: fgng, fitmnormgpd, flognormgpd, fnormgpdcon, fnormgpd, gngcon, gng, hpdcon, hpd, itmnormgpd, lognormgpdcon, lognormgpd, normgpdcon, normgpd

Other fhpd: hpd

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
## Not run: 
set.seed(1)
par(mfrow = c(1, 1))

x = rnorm(1000)
xx = seq(-4, 4, 0.01)
y = dnorm(xx)

# Hybrid Pareto provides reasonable fit for some asymmetric heavy upper tailed distributions
# but not for cases such as the normal distribution
fit = fhpd(x, std.err = FALSE)
hist(x, breaks = 100, freq = FALSE, xlim = c(-4, 4))
lines(xx, y)
with(fit, lines(xx, dhpd(xx, nmean, nsd, xi), col="red"))
abline(v = fit$u)

# Notice that if tail fraction is included a better fit is obtained
fit2 = fnormgpdcon(x, std.err = FALSE)
with(fit2, lines(xx, dnormgpdcon(xx, nmean, nsd, u, xi), col="blue"))
abline(v = fit2$u)
legend("topright", c("Standard Normal", "Hybrid Pareto", "Normal+GPD Continuous"),
  col=c("black", "red", "blue"), lty = 1)

## End(Not run)
 

evmix documentation built on Sept. 3, 2019, 5:07 p.m.