View source: R/family.extremes.R
gpd | R Documentation |
Maximum likelihood estimation of the 2-parameter generalized Pareto distribution (GPD).
gpd(threshold = 0, lscale = "loglink", lshape = logofflink(offset = 0.5),
percentiles = c(90, 95), iscale = NULL, ishape = NULL,
tolshape0 = 0.001, type.fitted = c("percentiles", "mean"),
imethod = 1, zero = "shape")
threshold |
Numeric, values are recycled if necessary.
The threshold value(s), called |
lscale |
Parameter link function for the scale parameter |
lshape |
Parameter link function for the shape parameter For the shape parameter,
the default |
percentiles |
Numeric vector of percentiles used
for the fitted values. Values should be between 0 and 100.
See the example below for illustration.
This argument is ignored if |
type.fitted |
See |
iscale , ishape |
Numeric. Optional initial values for |
tolshape0 |
Passed into |
imethod |
Method of initialization, either 1 or 2. The first is the method of
moments, and the second is a variant of this. If neither work, try
assigning values to arguments |
zero |
Can be an integer-valued vector specifying which
linear/additive predictors are modelled as intercepts only.
For one response, the value should be from the set {1,2}
corresponding respectively to |
The distribution function of the GPD can be written
G(y) = 1 - [1 + \xi (y-\mu) / \sigma ]_{+}^{- 1/ \xi}
where
\mu
is the location parameter
(known, with value threshold
),
\sigma > 0
is the scale parameter,
\xi
is the shape parameter, and
h_+ = \max(h,0)
.
The function 1-G
is known as the survivor function.
The limit \xi \rightarrow 0
gives the shifted exponential as a special case:
G(y) = 1 - \exp[-(y-\mu)/ \sigma].
The support is y>\mu
for \xi>0
,
and
\mu < y <\mu-\sigma / \xi
for \xi<0
.
Smith (1985) showed that if \xi <= -0.5
then
this is known as the nonregular case and problems/difficulties
can arise both theoretically and numerically. For the (regular)
case \xi > -0.5
the classical asymptotic
theory of maximum likelihood estimators is applicable; this is
the default.
Although for \xi < -0.5
the usual asymptotic properties
do not apply, the maximum likelihood estimator generally exists and
is superefficient for -1 < \xi < -0.5
, so it is
“better” than normal.
When \xi < -1
the maximum
likelihood estimator generally does not exist as it effectively becomes
a two parameter problem.
The mean of Y
does not exist unless \xi < 1
, and
the variance does not exist unless \xi < 0.5
. So if
you want to fit a model with finite variance use lshape = "extlogitlink"
.
An object of class "vglmff"
(see vglmff-class
).
The object is used by modelling functions such as vglm
and vgam
.
However, for this VGAM family function, vglm
is probably preferred over vgam
when there is smoothing.
Fitting the GPD by maximum likelihood estimation can be numerically
fraught. If 1 + \xi (y-\mu)/ \sigma \leq 0
then some crude evasive action is taken but the estimation process
can still fail. This is particularly the case if vgam
with s
is used. Then smoothing is best done with
vglm
with regression splines (bs
or ns
) because vglm
implements
half-stepsizing whereas vgam
doesn't. Half-stepsizing
helps handle the problem of straying outside the parameter space.
The response in the formula of vglm
and vgam
is y
.
Internally, y-\mu
is computed.
This VGAM family function can handle a multiple
responses, which is inputted as a matrix.
The response stored on the object is the original uncentred data.
With functions rgpd
, dgpd
, etc., the
argument location
matches with the argument threshold
here.
T. W. Yee
Yee, T. W. and Stephenson, A. G. (2007). Vector generalized linear and additive extreme value models. Extremes, 10, 1–19.
Coles, S. (2001). An Introduction to Statistical Modeling of Extreme Values. London: Springer-Verlag.
Smith, R. L. (1985). Maximum likelihood estimation in a class of nonregular cases. Biometrika, 72, 67–90.
rgpd
,
meplot
,
gev
,
paretoff
,
vglm
,
vgam
,
s
.
# Simulated data from an exponential distribution (xi = 0)
Threshold <- 0.5
gdata <- data.frame(y1 = Threshold + rexp(n = 3000, rate = 2))
fit <- vglm(y1 ~ 1, gpd(threshold = Threshold), data = gdata, trace = TRUE)
head(fitted(fit))
summary(depvar(fit)) # The original uncentred data
coef(fit, matrix = TRUE) # xi should be close to 0
Coef(fit)
summary(fit)
head(fit@extra$threshold) # Note the threshold is stored here
# Check the 90 percentile
ii <- depvar(fit) < fitted(fit)[1, "90%"]
100 * table(ii) / sum(table(ii)) # Should be 90%
# Check the 95 percentile
ii <- depvar(fit) < fitted(fit)[1, "95%"]
100 * table(ii) / sum(table(ii)) # Should be 95%
## Not run: plot(depvar(fit), col = "blue", las = 1,
main = "Fitted 90% and 95% quantiles")
matlines(1:length(depvar(fit)), fitted(fit), lty = 2:3, lwd = 2)
## End(Not run)
# Another example
gdata <- data.frame(x2 = runif(nn <- 2000))
Threshold <- 0; xi <- exp(-0.8) - 0.5
gdata <- transform(gdata, y2 = rgpd(nn, scale = exp(1 + 0.1*x2), shape = xi))
fit <- vglm(y2 ~ x2, gpd(Threshold), data = gdata, trace = TRUE)
coef(fit, matrix = TRUE)
## Not run: # Nonparametric fits
# Not so recommended:
fit1 <- vgam(y2 ~ s(x2), gpd(Threshold), data = gdata, trace = TRUE)
par(mfrow = c(2, 1))
plot(fit1, se = TRUE, scol = "blue")
# More recommended:
fit2 <- vglm(y2 ~ sm.bs(x2), gpd(Threshold), data = gdata, trace = TRUE)
plot(as(fit2, "vgam"), se = TRUE, scol = "blue")
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.