fpot: Peaks Over Threshold Modelling using the Generalized Pareto...
In evd: Functions for Extreme Value Distributions

fpot	R Documentation

Peaks Over Threshold Modelling using the Generalized Pareto or Point Process Representation

Description

Maximum-likelihood fitting for peaks over threshold modelling, using the Generalized Pareto or Point Process representation, allowing any of the parameters to be held fixed if desired.

Usage

fpot(x, threshold, model = c("gpd", "pp"), start, npp = length(x),
    cmax = FALSE, r = 1, ulow = -Inf, rlow = 1, mper = NULL, ...,
    std.err = TRUE, corr = FALSE, method = "BFGS", warn.inf = TRUE)

Arguments

`x`	A numeric vector. If this contains missing values, those values are treated as if they fell below the threshold.
`threshold`	The threshold.
`model`	The model; either `"gpd"` (the default) or `"pp"`, for the Generalized Pareto or Point Process representations respectively.
`start`	A named list giving the initial values for the parameters over which the likelihood is to be maximized. If `start` is omitted the routine attempts to find good starting values using moment estimators.
`npp`	The data should contain `npp` observations per “period”, where the return level plot produced by `plot.pot` will represent return periods in units of “periods”. By default `npp = length(x)`, so that the “period” is the period of time over which the entire data set is collected. It may often be useful to change this default so that more sensible units are used. For example, if yearly periodic units are required, use `npp = 365.25` for daily data and `npp = 52.18` for weekly data. The argument only makes a difference to the actual fit if `mper` is not `NULL` or if `model = "pp"` (see Details).
`cmax`	Logical; if `FALSE` (the default), the model is fitted using all exceedences over the threshold. If `TRUE`, the model is fitted using cluster maxima, using clusters of exceedences derived from `clusters`.
`r`, `ulow`, `rlow`	Arguments used for the identification of clusters of exceedences (see `clusters`). Ignored if `cmax` is `FALSE` (the default).
`mper`	Controls the parameterization of the generalized Pareto model. Should be either `NULL` (the default), or a positive number (see Details). If `mper` is not `NULL` and `model = "pp"`, an error is returned.
`...`	Additional parameters, either for the model or for the optimization function `optim`. If parameters of the model are included they will be held fixed at the values given (see Examples).
`std.err`	Logical; if `TRUE` (the default), the standard errors are returned.
`corr`	Logical; if `TRUE`, the correlation matrix is returned.
`method`	The optimization method (see `optim` for details).
`warn.inf`	Logical; if `TRUE` (the default), a warning is given if the negative log-likelihood is infinite when evaluated at the starting values.

Details

The exeedances over the threshold threshold (if cmax is FALSE) or the maxima of the clusters of exeedances (if cmax is TRUE) are (if model = "gpd") fitted to a generalized Pareto distribution (GPD) with location threshold. If model = "pp" the exceedances are fitted to a non-homogeneous Poisson process (Coles, 2001).

If mper is NULL (the default), the parameters of the model (if model = "gpd") are scale and shape, for the scale and shape parameters of the GPD. If model = "pp" the parameters are loc, scale and shape. Under model = "pp" the parameters can be interpreted as parameters of the Generalized Extreme Value distribution, fitted to the maxima of npp random variables. In this case, the value of npp should be reasonably large.

For both characterizations, the shape parameters are equivalent. The scale parameter under the generalized Pareto characterization is equal to b + s(u - a), where a, b and s are the location, scale and shape parameters under the Point Process characterization, and where u is the threshold.

If \code{mper} = m is a positive value, then the generalized Pareto model is reparameterized so that the parameters are rlevel and shape, where rlevel is the m “period” return level, where “period” is defined via the argument npp.

The m “period” return level is defined as follows. Let G be the fitted generalized Pareto distribution function, with location \code{threshold} = u, so that 1 - G(z) is the fitted probability of an exceedance over z > u given an exceedance over u. The fitted probability of an exceedance over z > u is therefore p(1 - G(z)), where p is the estimated probabilty of exceeding u, which is given by the empirical proportion of exceedances. The m “period” return level z_m satisfies p(1 - G(z_m)) = 1/(mN), where N is the number of points per period (multiplied by the estimate of the extremal index, if cluster maxima are fitted). In other words, z_m is the quantile of the fitted model that corresponds to the upper tail probability 1/(mN). If mper is infinite, then z_m is the upper end point, given by threshold minus \code{scale}/\code{shape}, and the shape parameter is then restricted to be negative.

Value

Returns an object of class c("pot","uvevd","pot").

The generic accessor functions fitted (or fitted.values), std.errors, deviance, logLik and AIC extract various features of the returned object.

The function profile can be used to obtain deviance profiles for the model parameters. In particular, profiles of the m period return level z_m can be calculated and plotted when \code{mper} = m. The function anova compares nested models. The function plot produces diagnostic plots.

An object of class c("pot","uvevd","evd") is a list containing the following components

`estimate`	A vector containing the maximum likelihood estimates.
`std.err`	A vector containing the standard errors.
`fixed`	A vector containing the parameters of the model that have been held fixed.
`param`	A vector containing all parameters (optimized and fixed).
`deviance`	The deviance at the maximum likelihood estimates.
`corr`	The correlation matrix.
`var.cov`	The variance covariance matrix.
`convergence`, `counts`, `message`	Components taken from the list returned by `optim`.
`threshold`, `r`, `ulow`, `rlow`, `npp`	The arguments of the same name.
`nhigh`	The number of exceedences (if `cmax` is `FALSE`) or the number of clusters of exceedences (if `cmax` is `TRUE`).
`nat`, `pat`	The number and proportion of exceedences.
`extind`	The estimate of the extremal index (i.e. `nhigh` divided by `nat`). If `cmax` is `FALSE`, this is `NULL`.
`data`	The data passed to the argument `x`.
`exceedances`	The exceedences, or the maxima of the clusters of exceedences.
`mper`	The argument `mper`.
`scale`	The scale parameter for the fitted generalized Pareto distribution. If `mper` is `NULL` and `model = "gpd"` (the defaults), this will also be an element of `param`.
`call`	The call of the current function.

Warning

The standard errors and the correlation matrix in the returned object are taken from the observed information, calculated by a numerical approximation. They must be interpreted with caution when the shape parameter is less than -0.5, because the usual asymptotic properties of maximum likelihood estimators do not then hold (Smith, 1985).

References

Smith, R. L. (1985) Maximum likelihood estimation in a class of non-regular cases. Biometrika, 72, 67–90.

Examples

uvdata <- rgpd(100, loc = 0, scale = 1.1, shape = 0.2)
M1 <- fpot(uvdata, 1)
M2 <- fpot(uvdata, 1, shape = 0)
anova(M1, M2)
par(mfrow = c(2,2))
plot(M1)
## Not run: M1P <- profile(M1)
## Not run: plot(M1P)

M1 <- fpot(uvdata, 1, mper = 10)
M2 <- fpot(uvdata, 1, mper = 100)
## Not run: M1P <- profile(M1, which = "rlevel", conf=0.975, mesh=0.1)
## Not run: M2P <- profile(M2, which = "rlevel", conf=0.975, mesh=0.1)
## Not run: plot(M1P)
## Not run: plot(M2P)

evd documentation built on Sept. 21, 2024, 9:06 a.m.