Description Usage Arguments Details Value Acknowledgments Note Author(s) References See Also Examples
Maximum likelihood estimation for fitting the extreme value mixture model with mixture of gammas for bulk distribution upto the threshold and conditional GPD above threshold. With options for profile likelihood estimation for threshold and fixed threshold approach.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | fmgammagpd(x, M, phiu = TRUE, useq = NULL, fixedu = FALSE,
pvector = NULL, std.err = TRUE, method = "BFGS",
control = list(maxit = 10000), finitelik = TRUE, ...)
lmgammagpd(x, mgshape, mgscale, mgweight, u, sigmau, xi, phiu = TRUE,
log = TRUE)
nlmgammagpd(pvector, x, M, phiu = TRUE, finitelik = FALSE)
nlumgammagpd(pvector, u, x, M, phiu = TRUE, finitelik = FALSE)
nlEMmgammagpd(pvector, tau, mgweight, x, M, phiu = TRUE,
finitelik = FALSE)
proflumgammagpd(u, pvector, x, M, phiu = TRUE, method = "BFGS",
control = list(maxit = 10000), finitelik = TRUE, ...)
nluEMmgammagpd(pvector, u, tau, mgweight, x, M, phiu = TRUE,
finitelik = FALSE)
|
x |
vector of sample data |
M |
number of gamma components in mixture |
phiu |
probability of being above threshold (0, 1) or logical, see Details in
help for |
useq |
vector of thresholds (or scalar) to be considered in profile likelihood or
|
fixedu |
logical, should threshold be fixed (at either scalar value in |
pvector |
vector of initial values of parameters or |
std.err |
logical, should standard errors be calculated |
method |
optimisation method (see |
control |
optimisation control list (see |
finitelik |
logical, should log-likelihood return finite value for invalid parameters |
... |
optional inputs passed to |
mgshape |
mgamma shape (positive) as vector of length |
mgscale |
mgamma scale (positive) as vector of length |
mgweight |
mgamma weights (positive) as vector of length |
u |
scalar threshold value |
sigmau |
scalar scale parameter (positive) |
xi |
scalar shape parameter |
log |
logical, if |
tau |
matrix of posterior probability of being in each component
( |
The extreme value mixture model with weighted mixture of gammas bulk and GPD tail is fitted to the entire dataset using maximum likelihood estimation using the EM algorithm. The estimated parameters, variance-covariance matrix and their standard errors are automatically output.
See help for fnormgpd
for details, type help fnormgpd
.
Only the different features are outlined below for brevity.
The expectation step estimates the expected probability of being in each component conditional on gamma component parameters. The maximisation step optimizes the negative log-likelihood conditional on posterior probabilities of each observation being in each component.
The optimisation of the likelihood for these mixture models can be very sensitive to
the initial parameter vector, as often there are numerous local modes. This is an
inherent feature of such models and the EM algorithm. The EM algorithm is guaranteed
to reach the maximum of the local mode. Multiple initial values should be considered
to find the global maximum. If the pvector
is input as NULL
then
random component probabilities are simulated as the initial values, so multiple such runs
should be run to check the sensitivity to initial values. Alternatives to black-box
likelihood optimisers (e.g. simulated annealing), or moving to computational Bayesian
inference, are also worth considering.
The log-likelihood functions are provided for wider usage, e.g. constructing profile
likelihood functions. The parameter vector pvector
must be specified in the
negative log-likelihood functions nlmgammagpd
and
nlEMmgammagpd
.
Log-likelihood calculations are carried out in lmgammagpd
,
which takes parameters as inputs in the same form as the distribution functions. The
negative log-likelihood function nlmgammagpd
is a wrapper
for lmgammagpd
designed towards making it useable for optimisation,
i.e. nlmgammagpd
has complete parameter vector as first input.
Though it is not directly used for optimisation here, as the EM algorithm due to mixture of
gammas for the bulk component of this model
The EM algorithm for the mixture of gammas utilises the
negative log-likelihood function nlEMmgammagpd
which takes the posterior probabilities tau and component probabilities
mgweight
as secondary inputs.
The profile likelihood for the threshold proflumgammagpd
also implements the EM algorithm for the mixture of gammas, utilising the negative
log-likelihood function nluEMmgammagpd
which takes
the threshold, posterior probabilities tau and component probabilities
mgweight
as secondary inputs.
Missing values (NA
and NaN
) are assumed to be invalid data so are ignored.
Suppose there are M gamma components with (scalar) shape and scale parameters and weight for each component. Only M-1 are to be provided in the initial parameter vector, as the Mth components weight is uniquely determined from the others.
The initial parameter vector pvector
always has the M gamma component
shape parameters followed by the corresponding M gamma scale parameters. However,
subsets of the other parameters are needed depending on which function is being used:
fmgammagpd - c(mgshape, mgscale, mgweight[1:(M-1)], u, sigmau, xi)
nlmgammagpd - c(mgshape, mgscale, mgweight[1:(M-1)], u, sigmau, xi)
nlumgammagpd and proflumgammagpd - c(mgshape, mgscale, mgweight[1:(M-1)], sigmau, xi)
nlEMmgammagpd - c(mgshape, mgscale, u, sigmau, xi)
nluEMmgammagpd - c(mgshape, mgscale, sigmau, xi)
Notice that when the component probability weights are included only the first M-1 are specified, as the remaining one can be uniquely determined from these. Where some parameters are left out, they are always taken as secondary inputs to the functions.
For identifiability purposes the mean of each gamma component must be in ascending in order. If the initial parameter vector does not satisfy this constraint then an error is given.
Non-positive data are ignored as likelihood is infinite, except for gshape=1
.
Log-likelihood is given by lmgammagpd
and it's
wrappers for negative log-likelihood from nlmgammagpd
and nlumgammagpd
. The conditional negative log-likelihoods
using the posterior probabilities are nlEMmgammagpd
and nluEMmgammagpd
. Profile likelihood for single
threshold given by proflumgammagpd
using EM algorithm. Fitting function
fmgammagpd
using EM algorithm returns a simple list with the
following elements
call : | optim call |
x : | data vector x |
init : | pvector |
fixedu : | fixed threshold, logical |
useq : | threshold vector for profile likelihood or scalar for fixed threshold |
nllhuseq : | profile negative log-likelihood at each threshold in useq |
optim : | complete optim output |
mle : | vector of MLE of parameters |
cov : | variance-covariance matrix of MLE of parameters |
se : | vector of standard errors of MLE of parameters |
rate : | phiu to be consistent with evd |
nllh : | minimum negative log-likelihood |
n : | total sample size |
M : | number of gamma components |
mgshape : | MLE of gamma shapes |
mgscale : | MLE of gamma scales |
mgweight : | MLE of gamma weights |
u : | threshold (fixed or MLE) |
sigmau : | MLE of GPD scale |
xi : | MLE of GPD shape |
phiu : | MLE of tail fraction (bulk model or parameterised approach) |
se.phiu : | standard error of MLE of tail fraction |
EMresults : | EM results giving complete negative log-likelihood, estimated parameters and conditional "maximisation step" negative log-likelihood and convergence result |
posterior : | posterior probabilites |
Thanks to Daniela Laas, University of St Gallen, Switzerland for reporting various bugs in these functions.
See Acknowledgments in
fnormgpd
, type help fnormgpd
.
In the fitting and profile likelihood functions, when pvector=NULL
then the
default initial values are obtained under the following scheme:
number of sample from each component is simulated from symmetric multinomial distribution;
sample data is then sorted and split into groups of this size (works well when components have modes which are well separated);
for data within each component approximate MLE's for the gamma shape and scale parameters are estimated;
threshold is specified as sample 90% quantile; and
MLE of GPD parameters above threshold.
The other likelihood functions lmgammagpd
,
nlmgammagpd
, nlumgammagpd
and
nlEMmgammagpd
and nluEMmgammagpd
have no defaults.
Carl Scarrott carl.scarrott@canterbury.ac.nz
http://www.math.canterbury.ac.nz/~c.scarrott/evmix
http://en.wikipedia.org/wiki/Gamma_distribution
http://en.wikipedia.org/wiki/Mixture_model
http://en.wikipedia.org/wiki/Generalized_Pareto_distribution
McLachlan, G.J. and Peel, D. (2000). Finite Mixture Models. Wiley.
Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT - Statistical Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf
Hu, Y. (2013). Extreme value mixture modelling: An R package and simulation study. MSc (Hons) thesis, University of Canterbury, New Zealand. http://ir.canterbury.ac.nz/simple-search?query=extreme&submit=Go
do Nascimento, F.F., Gamerman, D. and Lopes, H.F. (2011). A semiparametric Bayesian approach to extreme value estimation. Statistical Computing, 22(2), 661-675.
Other gammagpd: fgammagpdcon
,
fgammagpd
, fmgamma
,
gammagpdcon
, gammagpd
,
mgammagpd
Other mgamma: fmgammagpdcon
,
fmgamma
, mgammagpdcon
,
mgammagpd
, mgamma
Other mgammagpd: fgammagpd
,
fmgammagpdcon
, fmgamma
,
gammagpd
, mgammagpdcon
,
mgammagpd
, mgamma
Other mgammagpdcon: fgammagpdcon
,
fmgammagpdcon
, fmgamma
,
gammagpdcon
, mgammagpdcon
,
mgammagpd
, mgamma
Other fmgammagpd: mgammagpd
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 | ## Not run:
set.seed(1)
par(mfrow = c(2, 1))
n=1000
x = c(rgamma(n*0.25, shape = 1, scale = 1), rgamma(n*0.75, shape = 6, scale = 2))
xx = seq(-1, 40, 0.01)
y = (0.25*dgamma(xx, shape = 1, scale = 1) + 0.75 * dgamma(xx, shape = 6, scale = 2))
# Bulk model based tail fraction
# very sensitive to initial values, so best to provide sensible ones
fit.noinit = fmgammagpd(x, M = 2)
fit.withinit = fmgammagpd(x, M = 2, pvector = c(1, 6, 1, 2, 0.5, 15, 4, 0.1))
hist(x, breaks = 100, freq = FALSE, xlim = c(-1, 40))
lines(xx, y)
with(fit.noinit, lines(xx, dmgammagpd(xx, mgshape, mgscale, mgweight, u, sigmau, xi),
col="red"))
abline(v = fit.noinit$u, col = "red")
with(fit.withinit, lines(xx, dmgammagpd(xx, mgshape, mgscale, mgweight, u, sigmau, xi),
col="green"))
abline(v = fit.withinit$u, col = "green")
# Parameterised tail fraction
fit2 = fmgammagpd(x, M = 2, phiu = FALSE, pvector = c(1, 6, 1, 2, 0.5, 15, 4, 0.1))
with(fit2, lines(xx, dmgammagpd(xx, mgshape, mgscale, mgweight, u, sigmau, xi, phiu), col="blue"))
abline(v = fit2$u, col = "blue")
legend("topright", c("True Density","Default pvector", "Sensible pvector",
"Parameterised Tail Fraction"), col=c("black", "red", "green", "blue"), lty = 1)
# Fixed threshold approach
fitfix = fmgammagpd(x, M = 2, useq = 15, fixedu = TRUE,
pvector = c(1, 6, 1, 2, 0.5, 4, 0.1))
hist(x, breaks = 100, freq = FALSE, xlim = c(-1, 40))
lines(xx, y)
with(fit.withinit, lines(xx, dmgammagpd(xx, mgshape, mgscale, mgweight, u, sigmau, xi), col="red"))
abline(v = fit.withinit$u, col = "red")
with(fitfix, lines(xx, dmgammagpd(xx,mgshape, mgscale, mgweight, u, sigmau, xi), col="darkgreen"))
abline(v = fitfix$u, col = "darkgreen")
legend("topright", c("True Density", "Default initial value (90% quantile)",
"Fixed threshold approach"), col=c("black", "red", "darkgreen"), lty = 1)
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.