fit.gpd: Improved maximum-likelihood fit of the generalized Pareto...
In theGreatWhiteShark/climex: Estimating Extreme Events in Climatic Time Series (using Data of the DWD)

fit.gpd

R Documentation

Improved maximum-likelihood fit of the generalized Pareto (GP) distribution

Description

This function fits the Generalized Pareto distribution (GP) to the supplied data, which have to be threshold exceedances with the corresponding threshold already subtracted. The determination of the starting point for the optimization and the calculation of the return level and the all the corresponding estimates of the fitting errors will be done internally.

Usage

fit.gpd(x, initial = NULL, threshold = NULL,
  likelihood.function = likelihood,
  gradient.function = likelihood.gradient, error.estimation = c("MLE",
  "MC", "bootstrap", "none"), monte.carlo.sample.size = 100,
  bootstrap.sample.size = 100, return.period = 100,
  total.length = NULL, extreme.type = c("min", "max"),
  thresholding = FALSE, decluster = NULL, cluster.distance = NULL,
  silent = FALSE, debug = FALSE, mc.cores = NULL, ...)

Arguments

`x`	Either an object of class xts or a list of those. The time series can be provided in two different formats: as a series of threshold exceedances (possibly calculated using the `threshold` function) and the thresholding argument set to FALSE or as the raw time series with thresholding set to TRUE. In the later case the exceedances will be extracted inside the fit.gpd function.
`initial`	Initial values for the GP parameters. Has to be provided as 2x1 vector. If NULL the parameters are estimated with the function `likelihood.initials`. If the shape parameter is set to 0 the exponential distribution instead of the GP one is fitted. But this its strongly discouraged to do so! Default = NULL
`threshold`	Optional threshold used to extract the exceedances from the provided series x. If present it will be added to the return level to produce a value which fits to underlying time series. If the user wants the exceedances to be calculated within this function, this argument is a mandatory one. Default = NULL.
`likelihood.function`	Function which is going to be optimized. Default: `likelihood`
`gradient.function`	If NULL a finite difference method is invoked. To use the derived formula from the GP likelihood gradient provide `likelihood.gradient`. Default = `likelihood.gradient`.
`error.estimation`	Method for calculating the standard errors of the fitted results. The errors of the GP parameters will be calculated as the square roots of the diagonal elements of the inverse of the hessian matrix. The latter will be evaluated at the maximum likelihood estimates (MLE) of the GP parameters. MLE: The standard error of the return level is calculated using the Delta method and the maximum likelihood estimates of the GP parameters. Note: For positive shape parameters bigger than 0.3 this approach tends to highly overestimates the errors of the return levels. MC: Alternative one can use a Monte Carlo method for which monte.carlo.sample.size samples of the same size as x will be drawn from a GP distribution constituted by the obtained MLE of the GP parameters of x. The standard error is then calculated via the square of the variance of all fitted GP parameters and calculated return levels. Note: In its essence this approach is not an estimation of the error involved in fitting the time series to a GP distribution. It is rather the mean error of fitting a GP distribution with the same length and parameters as estimated ones. bootstrap: Using this option the provided time series x will be sampled with replacement bootstrap.sample.size times and with the same length as the original time series. The standard errors of the GP parameters and return levels of all those sampled series is calculated and returned as an estimate of the fitting error. Note: Since the data is (hopefully) GP-distributed, such a sampling has to be treated with a lot of care. Sometimes the inversion of the hessian fails (since the are some NaN in the hessian) when calculating the error estimates using the maximum likelihood approach (MLE) (which is also the reason why the ismev package occasionally does not work). In such cases the Monte Carlo (MC) method is used as a fallback. none skips the calculation of the error. Default = "MLE".
`monte.carlo.sample.size`	Number of samples used to obtain the Monte Carlo estimate of the standard error of the fitting. Default = 100.
`bootstrap.sample.size`	Number of samples with replacements to drawn from the original series x in order to determine the standard errors for the GP parameters and return levels. Default = 100.
`return.period`	Quantiles at which the return level is going to be evaluated. Class numeric. Default = 100.
`total.length`	Uses the maximum likelihood estimator to calculate the probability of a measurement to be an exceedance. Else an estimate based on the mean number of exceedances in the available years (time stamps of the class xts time series) will be used. Default = NULL.
`extreme.type`	String specifying whether the exceedances over a very high ("max") or low ("min") threshold should be fitted. If the low one is chosen, the input x has to be either a time series of all points below the threshold with the threshold subtracted or the raw series and fit.gpd function will handle the extraction of the minima internally. For the minima the resulting extremes are multiplied by -1 and fitted using a default GP distribution. The resulting scale and shape parameter are handed back without any additional changes and the return levels are multiplied by -1. Default = "max".
`thresholding`	Logical value. If TRUE, the numerical value provided with threshold will be used in the `threshold` function to extract only the exceedances of the input time series x. If FALSE, the function assumes the input series x to consist only of exceedances over the provided threshold. If the cluster.distance or decluster is provided with a non NULL value, the value of thresholding is ignored and the `threshold` function is applied anyway. Default = FALSE.
`decluster`	Logical flag indicating whether or not to decluster the obtained exceedances over the threshold. Default = TRUE.
`cluster.distance`	Numerical value specifying how many points have to be below the threshold for the next point to be considered the starting point of a new cluster. Only supply a value when you really know what you are doing! Default = NULL.
`silent`	Determines whether or not warning messages shall be displayed and results shall be reported. Default = FALSE.
`debug`	If TRUE the routine will display debugging information featuring intermediate steps of the constrained optimization. Default = FALSE.
`mc.cores`	A numerical input specifying the number of cores to use for the multi core application of the function (see `detectCores`). This functionality is only available if the input is a list of different objects. If NULL, the function will be calculated classically with only one core. Default = NULL.
`...`	Additional arguments for the `optim` function.

Details

The optimization is performed by the augmented Lagrangian method using the auglag function of the alabama package. Within this framework the log-likelihood function of the GP gets augmented with N+2 constraints, where N is the number of points in the time series. N+1 of those constraints ensure the log-likelihood (containing two logarithms) to be always defined. The remaining constraints ensures for the shape parameter to be always bigger than -1 for the maximum likelihood to be defined in the first place. The penalty in the log-likelihood function is the sum of all squared constrain violations plus an additional term linear in the constraint violation to ensure well-conditioning. Using this penalty term the problem becomes unconstrained again and can be solved using optim. After each of those inner routines the weighting parameter of the penalty is being increased until some convergence conditions are fulfilled.

Since it usually takes just four to five outer iterations this functions needs only double the time a pure call to the optim function would need.

The total.length argument refers to the length of the original time series before the thresholding was applied. If present it will be used to calculate the maximum likelihood estimate of the probability of an observation to be a threshold exceedance (necessary to determine the estimation errors for the calculated return levels). Else an estimator based on mean number of exceedances per year will be used.

If the user instead wants to fit just the exponential distribution and not the entire GP distribution, the shape parameter of the initial has to be set to 0. But in practice this is strongly discouraged since it will yield inferior results.

I found the Nelder-Mead method to be more robust to starting points more far away from the global optimum. This also holds for the inner routine of the augmented Lagrangian method. Since other routines, like CG and BFGS only cause problems in the extreme value analysis, there won't be an option to choose them in this package.

This function can also be applied to a list of xts class objects.

Value

Output of the optim function with class c( "list", "climex.fit.gpd" )

par = MLE of the GP parameters
value = Value of the negative log-likelihood evaluated at the MLE
counts = Number of evaluations of the likelihood function and its gradient during optimization (inner routine)
outer.iteration = Number of updates of the penalty and the Lagrangian parameter to fine-tune the impact of the constraints on the optimization (outer routine)
return.level = Estimate of the return levels at the provided return periods
se = Standard error of the GP parameters and the return levels
x = Threshold exceedances
threshold = Value which had to be exceeded
control = Parameter and options used during optimization

If, on the other hand, a list of xts class object was provided, a list of objects structured as describe above is returned.

Author(s)

Philipp Mueller

Examples

  potsdam.anomalies <- anomalies( temp.potsdam )
  potsdam.extremes <- threshold( potsdam.anomalies,
                                threshold = 10,
                                decluster = TRUE )
  fit.gpd( potsdam.extremes )

theGreatWhiteShark/climex documentation built on July 13, 2022, 9:11 a.m.