fit.gpd | R Documentation |
This function fits the Generalized Pareto distribution (GP) to the supplied data, which have to be threshold exceedances with the corresponding threshold already subtracted. The determination of the starting point for the optimization and the calculation of the return level and the all the corresponding estimates of the fitting errors will be done internally.
fit.gpd(x, initial = NULL, threshold = NULL, likelihood.function = likelihood, gradient.function = likelihood.gradient, error.estimation = c("MLE", "MC", "bootstrap", "none"), monte.carlo.sample.size = 100, bootstrap.sample.size = 100, return.period = 100, total.length = NULL, extreme.type = c("min", "max"), thresholding = FALSE, decluster = NULL, cluster.distance = NULL, silent = FALSE, debug = FALSE, mc.cores = NULL, ...)
x |
Either an object of class xts or a list of
those. The time series can be provided in two different formats:
as a series of threshold exceedances (possibly calculated using
the |
initial |
Initial values for the GP parameters. Has to be
provided as 2x1 vector. If NULL the parameters are estimated
with the function |
threshold |
Optional threshold used to extract the exceedances from the provided series x. If present it will be added to the return level to produce a value which fits to underlying time series. If the user wants the exceedances to be calculated within this function, this argument is a mandatory one. Default = NULL. |
likelihood.function |
Function which is going to be
optimized. Default: |
gradient.function |
If NULL a finite difference method is
invoked. To use the derived formula from the GP likelihood
gradient provide |
error.estimation |
Method for calculating the standard errors of the fitted results. The errors of the GP parameters will be calculated as the square roots of the diagonal elements of the inverse of the hessian matrix. The latter will be evaluated at the maximum likelihood estimates (MLE) of the GP parameters. MLE: The standard error of the return level is calculated using the Delta method and the maximum likelihood estimates of the GP parameters. Note: For positive shape parameters bigger than 0.3 this approach tends to highly overestimates the errors of the return levels. MC: Alternative one can use a Monte Carlo method for which monte.carlo.sample.size samples of the same size as x will be drawn from a GP distribution constituted by the obtained MLE of the GP parameters of x. The standard error is then calculated via the square of the variance of all fitted GP parameters and calculated return levels. Note: In its essence this approach is not an estimation of the error involved in fitting the time series to a GP distribution. It is rather the mean error of fitting a GP distribution with the same length and parameters as estimated ones. bootstrap: Using this option the provided time series x will be sampled with replacement bootstrap.sample.size times and with the same length as the original time series. The standard errors of the GP parameters and return levels of all those sampled series is calculated and returned as an estimate of the fitting error. Note: Since the data is (hopefully) GP-distributed, such a sampling has to be treated with a lot of care. Sometimes the inversion of the hessian fails (since the are some NaN in the hessian) when calculating the error estimates using the maximum likelihood approach (MLE) (which is also the reason why the ismev package occasionally does not work). In such cases the Monte Carlo (MC) method is used as a fallback. none skips the calculation of the error. Default = "MLE". |
monte.carlo.sample.size |
Number of samples used to obtain the Monte Carlo estimate of the standard error of the fitting. Default = 100. |
bootstrap.sample.size |
Number of samples with replacements to drawn from the original series x in order to determine the standard errors for the GP parameters and return levels. Default = 100. |
return.period |
Quantiles at which the return level is going to be evaluated. Class numeric. Default = 100. |
total.length |
Uses the maximum likelihood estimator to calculate the probability of a measurement to be an exceedance. Else an estimate based on the mean number of exceedances in the available years (time stamps of the class xts time series) will be used. Default = NULL. |
extreme.type |
String specifying whether the exceedances over a very high ("max") or low ("min") threshold should be fitted. If the low one is chosen, the input x has to be either a time series of all points below the threshold with the threshold subtracted or the raw series and fit.gpd function will handle the extraction of the minima internally. For the minima the resulting extremes are multiplied by -1 and fitted using a default GP distribution. The resulting scale and shape parameter are handed back without any additional changes and the return levels are multiplied by -1. Default = "max". |
thresholding |
Logical value. If TRUE, the numerical value
provided with threshold will be used in the
|
decluster |
Logical flag indicating whether or not to decluster the obtained exceedances over the threshold. Default = TRUE. |
cluster.distance |
Numerical value specifying how many points have to be below the threshold for the next point to be considered the starting point of a new cluster. Only supply a value when you really know what you are doing! Default = NULL. |
silent |
Determines whether or not warning messages shall be displayed and results shall be reported. Default = FALSE. |
debug |
If TRUE the routine will display debugging information featuring intermediate steps of the constrained optimization. Default = FALSE. |
mc.cores |
A numerical input specifying the number of cores
to use for the multi core application of the function (see
|
... |
Additional arguments for the |
The optimization is performed by the augmented Lagrangian
method using the auglag
function of the
alabama package. Within this framework the log-likelihood
function of the GP gets augmented with N+2 constraints, where N
is the number of points in the time series. N+1 of those
constraints ensure the log-likelihood (containing two
logarithms) to be always defined. The remaining constraints
ensures for the shape parameter to be always bigger than -1 for
the maximum likelihood to be defined in the first place. The
penalty in the log-likelihood function is the sum of all squared
constrain violations plus an additional term linear in the
constraint violation to ensure well-conditioning. Using this
penalty term the problem becomes unconstrained again and can be
solved using optim
. After each of those
inner routines the weighting parameter of the penalty is being
increased until some convergence conditions are fulfilled.
Since it usually takes just four to five outer iterations this
functions needs only double the time a pure call to the
optim
function would need.
The total.length argument refers to the length of the original time series before the thresholding was applied. If present it will be used to calculate the maximum likelihood estimate of the probability of an observation to be a threshold exceedance (necessary to determine the estimation errors for the calculated return levels). Else an estimator based on mean number of exceedances per year will be used.
If the user instead wants to fit just the exponential distribution and not the entire GP distribution, the shape parameter of the initial has to be set to 0. But in practice this is strongly discouraged since it will yield inferior results.
I found the Nelder-Mead method to be more robust to starting points more far away from the global optimum. This also holds for the inner routine of the augmented Lagrangian method. Since other routines, like CG and BFGS only cause problems in the extreme value analysis, there won't be an option to choose them in this package.
This function can also be applied to a list of xts class objects.
Output of the optim function with class c( "list",
"climex.fit.gpd" )
par = MLE of the GP parameters
value = Value of the negative log-likelihood evaluated at the MLE
counts = Number of evaluations of the likelihood function and its gradient during optimization (inner routine)
outer.iteration = Number of updates of the penalty and the Lagrangian parameter to fine-tune the impact of the constraints on the optimization (outer routine)
return.level = Estimate of the return levels at the provided return periods
se = Standard error of the GP parameters and the return levels
x = Threshold exceedances
threshold = Value which had to be exceeded
control = Parameter and options used during optimization
If, on the other hand, a list of xts class object was provided, a list of objects structured as describe above is returned.
Philipp Mueller
Other optimization: fit.gev.default
,
fit.gev.list
, fit.gev.xts
,
fit.gev
, fit.gpd.default
,
fit.gpd.list
, fit.gpd.xts
,
likelihood.augmented
,
likelihood.gradient.augmented
,
likelihood.gradient
,
likelihood.initials
,
likelihood
potsdam.anomalies <- anomalies( temp.potsdam ) potsdam.extremes <- threshold( potsdam.anomalies, threshold = 10, decluster = TRUE ) fit.gpd( potsdam.extremes )
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.