# gpd: Generalized Pareto Distribution Regression Family Function In VGAM: Vector Generalized Linear and Additive Models

## Description

Maximum likelihood estimation of the 2-parameter generalized Pareto distribution (GPD).

## Usage

 ```1 2 3 4``` ```gpd(threshold = 0, lscale = "loglink", lshape = logofflink(offset = 0.5), percentiles = c(90, 95), iscale = NULL, ishape = NULL, tolshape0 = 0.001, type.fitted = c("percentiles", "mean"), imethod = 1, zero = "shape") ```

## Arguments

 `threshold` Numeric, values are recycled if necessary. The threshold value(s), called mu below. `lscale` Parameter link function for the scale parameter sigma. See `Links` for more choices. `lshape` Parameter link function for the shape parameter xi. See `Links` for more choices. The default constrains the parameter to be greater than -0.5 because if xi <= -0.5 then Fisher scoring does not work. See the Details section below for more information. For the shape parameter, the default `logofflink` link has an offset called A below; and then the second linear/additive predictor is log(xi+A) which means that xi > -A. The working weight matrices are positive definite if A = 0.5.
 `percentiles` Numeric vector of percentiles used for the fitted values. Values should be between 0 and 100. See the example below for illustration. This argument is ignored if `type.fitted = "mean"`. `type.fitted` See `CommonVGAMffArguments` for information. The default is to use the `percentiles` argument. If `"mean"` is chosen, then the mean mu + sigma / (1-xi) is returned as the fitted values, and these are only defined for xi<1. `iscale, ishape` Numeric. Optional initial values for sigma and xi. The default is to use `imethod` and compute a value internally for each parameter. Values of `ishape` should be between -0.5 and 1. Values of `iscale` should be positive.
 `tolshape0` Passed into `dgpd` when computing the log-likelihood.
 `imethod` Method of initialization, either 1 or 2. The first is the method of moments, and the second is a variant of this. If neither work, try assigning values to arguments `ishape` and/or `iscale`. `zero` Can be an integer-valued vector specifying which linear/additive predictors are modelled as intercepts only. For one response, the value should be from the set {1,2} corresponding respectively to sigma and xi. It is often a good idea for the sigma parameter only to be modelled through a linear combination of the explanatory variables because the shape parameter is probably best left as an intercept only: `zero = 2`. Setting `zero = NULL` means both parameters are modelled with explanatory variables. See `CommonVGAMffArguments` for more details.

## Details

The distribution function of the GPD can be written

G(y) = 1 - [1 + xi (y-mu)/ sigma ]_{+}^{- 1/ xi}

where mu is the location parameter (known, with value `threshold`), sigma > 0 is the scale parameter, xi is the shape parameter, and h_+ = max(h,0). The function 1-G is known as the survivor function. The limit xi --> 0 gives the shifted exponential as a special case:

G(y) = 1 - exp[-(y-mu)/ sigma].

The support is y>mu for xi>0, and mu < y <mu-sigma / xi for xi<0.

Smith (1985) showed that if xi <= -0.5 then this is known as the nonregular case and problems/difficulties can arise both theoretically and numerically. For the (regular) case xi > -0.5 the classical asymptotic theory of maximum likelihood estimators is applicable; this is the default.

Although for xi < -0.5 the usual asymptotic properties do not apply, the maximum likelihood estimator generally exists and is superefficient for -1 < xi < -0.5, so it is “better” than normal. When xi < -1 the maximum likelihood estimator generally does not exist as it effectively becomes a two parameter problem.

The mean of Y does not exist unless xi < 1, and the variance does not exist unless xi < 0.5. So if you want to fit a model with finite variance use `lshape = "extlogitlink"`.

## Value

An object of class `"vglmff"` (see `vglmff-class`). The object is used by modelling functions such as `vglm` and `vgam`. However, for this VGAM family function, `vglm` is probably preferred over `vgam` when there is smoothing.

## Warning

Fitting the GPD by maximum likelihood estimation can be numerically fraught. If 1 + xi*(y-mu)/sigma <= 0 then some crude evasive action is taken but the estimation process can still fail. This is particularly the case if `vgam` with `s` is used. Then smoothing is best done with `vglm` with regression splines (`bs` or `ns`) because `vglm` implements half-stepsizing whereas `vgam` doesn't. Half-stepsizing helps handle the problem of straying outside the parameter space.

## Note

The response in the formula of `vglm` and `vgam` is y. Internally, y-mu is computed. This VGAM family function can handle a multiple responses, which is inputted as a matrix. The response stored on the object is the original uncentred data.

With functions `rgpd`, `dgpd`, etc., the argument `location` matches with the argument `threshold` here.

T. W. Yee

## References

Yee, T. W. and Stephenson, A. G. (2007). Vector generalized linear and additive extreme value models. Extremes, 10, 1–19.

Coles, S. (2001). An Introduction to Statistical Modeling of Extreme Values. London: Springer-Verlag.

Smith, R. L. (1985). Maximum likelihood estimation in a class of nonregular cases. Biometrika, 72, 67–90.

`rgpd`, `meplot`, `gev`, `paretoff`, `vglm`, `vgam`, `s`.
 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43``` ```# Simulated data from an exponential distribution (xi = 0) Threshold <- 0.5 gdata <- data.frame(y1 = Threshold + rexp(n = 3000, rate = 2)) fit <- vglm(y1 ~ 1, gpd(threshold = Threshold), data = gdata, trace = TRUE) head(fitted(fit)) summary(depvar(fit)) # The original uncentred data coef(fit, matrix = TRUE) # xi should be close to 0 Coef(fit) summary(fit) head(fit@extra\$threshold) # Note the threshold is stored here # Check the 90 percentile ii <- depvar(fit) < fitted(fit)[1, "90%"] 100 * table(ii) / sum(table(ii)) # Should be 90% # Check the 95 percentile ii <- depvar(fit) < fitted(fit)[1, "95%"] 100 * table(ii) / sum(table(ii)) # Should be 95% ## Not run: plot(depvar(fit), col = "blue", las = 1, main = "Fitted 90% and 95% quantiles") matlines(1:length(depvar(fit)), fitted(fit), lty = 2:3, lwd = 2) ## End(Not run) # Another example gdata <- data.frame(x2 = runif(nn <- 2000)) Threshold <- 0; xi <- exp(-0.8) - 0.5 gdata <- transform(gdata, y2 = rgpd(nn, scale = exp(1 + 0.1*x2), shape = xi)) fit <- vglm(y2 ~ x2, gpd(Threshold), data = gdata, trace = TRUE) coef(fit, matrix = TRUE) ## Not run: # Nonparametric fits # Not so recommended: fit1 <- vgam(y2 ~ s(x2), gpd(Threshold), data = gdata, trace = TRUE) par(mfrow = c(2, 1)) plot(fit1, se = TRUE, scol = "blue") # More recommended: fit2 <- vglm(y2 ~ sm.bs(x2), gpd(Threshold), data = gdata, trace = TRUE) plot(as(fit2, "vgam"), se = TRUE, scol = "blue") ## End(Not run) ```