Description Usage Arguments Details Value Acknowledgments Note Author(s) References See Also Examples
Maximum likelihood estimation for fitting the extreme value mixture model with normal for bulk distribution upto the threshold and conditional GPD above threshold. With options for profile likelihood estimation for threshold and fixed threshold approach.
1 2 3 4 5 6 7 8 9 10 11 12 13  fnormgpd(x, phiu = TRUE, useq = NULL, fixedu = FALSE,
pvector = NULL, std.err = TRUE, method = "BFGS",
control = list(maxit = 10000), finitelik = TRUE, ...)
lnormgpd(x, nmean = 0, nsd = 1, u = qnorm(0.9, nmean, nsd),
sigmau = nsd, xi = 0, phiu = TRUE, log = TRUE)
nlnormgpd(pvector, x, phiu = TRUE, finitelik = FALSE)
proflunormgpd(u, pvector = NULL, x, phiu = TRUE, method = "BFGS",
control = list(maxit = 10000), finitelik = TRUE, ...)
nlunormgpd(pvector, u, x, phiu = TRUE, finitelik = FALSE)

x 
vector of sample data 
phiu 
probability of being above threshold (0, 1) or logical, see Details in
help for 
useq 
vector of thresholds (or scalar) to be considered in profile likelihood or

fixedu 
logical, should threshold be fixed (at either scalar value in 
pvector 
vector of initial values of parameters or 
std.err 
logical, should standard errors be calculated 
method 
optimisation method (see 
control 
optimisation control list (see 
finitelik 
logical, should loglikelihood return finite value for invalid parameters 
... 
optional inputs passed to 
nmean 
scalar normal mean 
nsd 
scalar normal standard deviation (positive) 
u 
scalar threshold value 
sigmau 
scalar scale parameter (positive) 
xi 
scalar shape parameter 
log 
logical, if 
The extreme value mixture model with normal bulk and GPD tail is fitted to the entire dataset using maximum likelihood estimation. The estimated parameters, variancecovariance matrix and their standard errors are automatically output.
The optimisation of the likelihood for these mixture models can be very sensitive to
the initial parameter vector (particularly the threshold), as often there are numerous
local modes where multiple thresholds give similar fits. This is an inherent feature
of such models. Options are provided by the arguments pvector
,
useq
and fixedu
to implement various commonly used likelihood inference
approaches for such models:
(default) pvector=NULL
, useq=NULL
and fixedu=FALSE
 to set initial value for threshold at 90% quantile along with usual defaults for
other parameters as defined in Notes below. Standard likelihood optimisation is used;
pvector=c(nmean, nsd, u, sigmau, xi)
 where initial values of all
5 parameters are manually set. Standard likelihood optimisation is used;
useq
as vector  to specify a sequence of thresholds at which to evaluate
profile likelihood and extract threshold which gives maximum profile likelihood; or
useq
as scalar  to specify a single value for threshold to be considered.
In options (3) and (4) the threshold can be treated as:
initial value for maximum likelihood estimation when fixedu=FALSE
, using
either profile likelihood estimate (3) or prechosen threshold (4); or
a fixed threshold with MLE for other parameters when fixedu=TRUE
, using
either profile likelihood estimate (3) or prechosen threshold (4).
The latter approach can be used to implement the traditional fixed threshold modelling
approach with threshold prechosen using, for example, graphical diagnostics. Further,
in either such case (3) or (4) the pvector
could be:
NULL
for usual defaults for other four parameters, defined in Notes below; or
vector of initial values for remaining 4 parameters
(nmean
, nsd
, sigmau
, xi
).
If the threshold is treated as fixed, then the likelihood is separable between the bulk and tail components. However, in practice we have found blackbox optimisation of the combined likelihood works sufficiently well, so is used herein.
The following functions are provided:
fnormgpd
 maximum likelihood fitting with all the above options;
lnormgpd
 loglikelihood;
nlnormgpd
 negative loglikelihood;
proflunormgpd
 profile likelihood for given threshold; and
nlunormgpd
 negative loglikelihood (threshold specified separately).
The loglikelihood functions are provided for wider usage, e.g. constructing profile likelihood functions.
Defaults values for the parameter vector pvector
are given in the fitting
fnormgpd
and profile likelihood functions
proflunormgpd
. The parameter vector pvector
must be specified in the negative loglikelihood functions
nlnormgpd
and nlunormgpd
.
The threshold u
must also be specified in the profile likelihood function
proflunormgpd
and nlunormgpd
.
Loglikelihood calculations are carried out in lnormgpd
,
which takes parameters as inputs in the same form as distribution functions. The negative
loglikelihood functions nlnormgpd
and
nlunormgpd
are wrappers for likelihood function
lnormgpd
designed towards optimisation,
i.e. nlnormgpd
has vector of all 5 parameters as
first input and nlunormgpd
has threshold as second input
and vector of remaining 4 parameters as first input. The profile likelihood
function proflunormgpd
has threshold u
as the first
input, to permit use of sapply
function to evaluate profile
likelihood over vector of potential thresholds.
The tail fraction phiu
is treated separately to the other parameters,
to allow for all it's representations. In the fitting
fnormgpd
and profile likelihood function
proflunormgpd
it is logical:
default value phiu=TRUE
 tail fraction specified by
normal survivor function phiu = 1  pnorm(u, nmean, nsd)
and standard error is
output as NA
; and
phiu=FALSE
 treated as extra parameter estimated using the MLE which is
the sample proportion above the threshold and standard error is output.
In the likelihood functions lnormgpd
,
nlnormgpd
and nlunormgpd
it can be logical or numeric:
logical  same as for fitting functions with default value phiu=TRUE
.
numeric  any value over range (0, 1). Notice that the tail fraction probability cannot be 0 or 1 otherwise there would be no contribution from either tail or bulk components respectively.
Missing values (NA
and NaN
) are assumed to be invalid data so are ignored,
which is inconsistent with the evd
library which assumes the
missing values are below the threshold.
The function lnormgpd
carries out the calculations
for the loglikelihood directly, which can be exponentiated to give actual
likelihood using (log=FALSE
).
The default optimisation algorithm is "BFGS", which requires a finite negative
loglikelihood function evaluation finitelik=TRUE
. For invalid
parameters, a zero likelihood is replaced with exp(1e6)
. The "BFGS"
optimisation algorithms require finite values for likelihood, so any user
input for finitelik
will be overridden and set to finitelik=TRUE
if either of these optimisation methods is chosen.
It will display a warning for nonzero convergence result comes from
optim
function call or for common indicators of lack
of convergence (e.g. any estimated parameters same as initial values).
If the hessian is of reduced rank then the variance covariance (from inverse hessian)
and standard error of parameters cannot be calculated, then by default
std.err=TRUE
and the function will stop. If you want the parameter estimates
even if the hessian is of reduced rank (e.g. in a simulation study) then
set std.err=FALSE
.
Loglikelihood is given by lnormgpd
and it's
wrappers for negative loglikelihood from nlnormgpd
and nlunormgpd
. Profile likelihood for single
threshold given by proflunormgpd
. Fitting function
fnormgpd
returns a simple list with the
following elements
call :  optim call 
x :  data vector x 
init :  pvector 
fixedu :  fixed threshold, logical 
useq :  threshold vector for profile likelihood or scalar for fixed threshold 
nllhuseq :  profile negative loglikelihood at each threshold in useq 
optim :  complete optim output 
mle :  vector of MLE of parameters 
cov :  variancecovariance matrix of MLE of parameters 
se :  vector of standard errors of MLE of parameters 
rate :  phiu to be consistent with evd 
nllh :  minimum negative loglikelihood 
n :  total sample size 
nmean :  MLE of normal mean 
nsd :  MLE of normal standard deviation 
u :  threshold (fixed or MLE) 
sigmau :  MLE of GPD scale 
xi :  MLE of GPD shape 
phiu :  MLE of tail fraction (bulk model or parameterised approach) 
se.phiu :  standard error of MLE of tail fraction 
The output list has some duplicate entries and repeats some of the inputs to both
provide similar items to those from fpot
and increase usability.
These functions are deliberately similar
in syntax and functionality to the commonly used functions in the
ismev
and evd
packages
for which their author's contributions are gratefully acknowledged.
Anna MacDonald and Xin Zhao laid some of the groundwork with programs they wrote for MATLAB.
Clement Lee and Emma Eastoe suggested providing inbuilt profile likelihood estimation for threshold and fixed threshold approach.
Unlike most of the distribution functions for the extreme value mixture models,
the MLE fitting only permits single scalar values for each parameter and
phiu
.
When pvector=NULL
then the initial values are:
MLE of normal parameters assuming entire population is normal; and
threshold 90% quantile (not relevant for profile likelihood or fixed threshold approaches);
MLE of GPD parameters above threshold.
Avoid setting the starting value for the shape parameter to
xi=0
as depending on the optimisation method it may be get stuck.
A default value for the tail fraction phiu=TRUE
is given.
The lnormgpd
also has the usual defaults for
the other parameters, but nlnormgpd
and
nlunormgpd
has no defaults.
If the hessian is of reduced rank then the variance covariance (from inverse hessian)
and standard error of parameters cannot be calculated, then by default
std.err=TRUE
and the function will stop. If you want the parameter estimates
even if the hessian is of reduced rank (e.g. in a simulation study) then
set std.err=FALSE
.
Invalid parameter ranges will give 0
for likelihood, log(0)=Inf
for
loglikelihood and log(0)=Inf
for negative loglikelihood.
Due to symmetry, the lower tail can be described by GPD by negating the data/quantiles.
Infinite and missing sample values are dropped.
Error checking of the inputs is carried out and will either stop or give warning message as appropriate.
Yang Hu and Carl Scarrott carl.scarrott@canterbury.ac.nz
http://www.math.canterbury.ac.nz/~c.scarrott/evmix
http://en.wikipedia.org/wiki/Normal_distribution
http://en.wikipedia.org/wiki/Generalized_Pareto_distribution
Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT  Statistical Journal 10(1), 3359. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf
Hu, Y. (2013). Extreme value mixture modelling: An R package and simulation study. MSc (Hons) thesis, University of Canterbury, New Zealand. http://ir.canterbury.ac.nz/simplesearch?query=extreme&submit=Go
Hu Y. and Scarrott, C.J. (2018). evmix: An R Package for Extreme Value Mixture Modeling, Threshold Estimation and Boundary Corrected Kernel Density Estimation. Journal of Statistical Software 84(5), 127. doi: 10.18637/jss.v084.i05.
Behrens, C.N., Lopes, H.F. and Gamerman, D. (2004). Bayesian analysis of extreme events with threshold estimation. Statistical Modelling. 4(3), 227244.
Other normgpd: fgng
, fhpd
,
fitmnormgpd
, flognormgpd
,
fnormgpdcon
, gngcon
,
gng
, hpdcon
,
hpd
, itmnormgpd
,
lognormgpdcon
, lognormgpd
,
normgpdcon
, normgpd
Other normgpdcon: fgngcon
,
fhpdcon
, flognormgpdcon
,
fnormgpdcon
, gngcon
,
gng
, hpdcon
,
hpd
, normgpdcon
,
normgpd
Other gng: fgngcon
, fgng
,
fitmgng
, gngcon
,
gng
, itmgng
,
normgpd
Other fnormgpd: normgpd
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40  ## Not run:
set.seed(1)
par(mfrow = c(2, 1))
x = rnorm(1000)
xx = seq(4, 4, 0.01)
y = dnorm(xx)
# Bulk model based tail fraction
fit = fnormgpd(x)
hist(x, breaks = 100, freq = FALSE, xlim = c(4, 4))
lines(xx, y)
with(fit, lines(xx, dnormgpd(xx, nmean, nsd, u, sigmau, xi), col="red"))
abline(v = fit$u, col = "red")
# Parameterised tail fraction
fit2 = fnormgpd(x, phiu = FALSE)
with(fit2, lines(xx, dnormgpd(xx, nmean, nsd, u, sigmau, xi, phiu), col="blue"))
abline(v = fit2$u, col = "blue")
legend("topleft", c("True Density","Bulk Tail Fraction","Parameterised Tail Fraction"),
col=c("black", "red", "blue"), lty = 1)
# Profile likelihood for initial value of threshold and fixed threshold approach
fitu = fnormgpd(x, useq = seq(0, 3, length = 20))
fitfix = fnormgpd(x, useq = seq(0, 3, length = 20), fixedu = TRUE)
hist(x, breaks = 100, freq = FALSE, xlim = c(4, 4))
lines(xx, y)
with(fit, lines(xx, dnormgpd(xx, nmean, nsd, u, sigmau, xi), col="red"))
abline(v = fit$u, col = "red")
with(fitu, lines(xx, dnormgpd(xx, nmean, nsd, u, sigmau, xi), col="purple"))
abline(v = fitu$u, col = "purple")
with(fitfix, lines(xx, dnormgpd(xx, nmean, nsd, u, sigmau, xi), col="darkgreen"))
abline(v = fitfix$u, col = "darkgreen")
legend("topleft", c("True Density","Default initial value (90% quantile)",
"Prof. lik. for initial value", "Prof. lik. for fixed threshold"),
col=c("black", "red", "purple", "darkgreen"), lty = 1)
## End(Not run)

Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.