game | R Documentation |
gamGPDfit()
fits the parameters of a generalized Pareto
distribution (GPD) depending on covariates in a non- or semiparametric
way.
gamGPDboot()
fits and bootstraps the parameters of a GPD
distribution depending on covariates in a non- or semiparametric
way. Applies the post-blackend bootstrap of Chavez-Demoulin and
Davison (2005).
gamGPDfit(x, threshold, nextremes = NULL, datvar, etaFrhs, nuFrhs,
init = fit.GPD(x[,datvar], threshold = threshold,
type = "pwm", verbose = FALSE)$par.ests,
niter = 32, include.updates = FALSE, eps.eta = 1e-05, eps.nu = 1e-05,
progress = TRUE, adjust = TRUE, verbose = FALSE, ...)
gamGPDboot(x, B, threshold, nextremes = NULL, datvar, etaFrhs, nuFrhs,
init = fit.GPD(x[,datvar], threshold = threshold,
type = "pwm", verbose = FALSE)$par.ests,
niter = 32, include.updates = FALSE, eps.eta = 1e-5, eps.nu = 1e-5,
boot.progress = TRUE, progress = FALSE, adjust = TRUE, verbose = FALSE,
debug = FALSE, ...)
x |
data.frame containing the losses (in some component; can be
specified with the argument |
B |
number of bootstrap replications. |
threshold |
threshold of the peaks-over-threshold (POT) method. |
nextremes |
number of excesses. This can be used to determine |
datvar |
name of the data column in |
etaFrhs |
right-hand side of the formula for |
nuFrhs |
right-hand side of the formula for |
init |
bivariate vector containing initial values
for |
niter |
maximal number of iterations in the backfitting algorithm. |
include.updates |
|
eps.eta |
epsilon for stop criterion for |
eps.nu |
epsilon for stop criterion for |
boot.progress |
|
progress |
|
adjust |
|
verbose |
|
debug |
|
... |
additional arguments passed to |
gamGPDfit()
fits the parameters \xi
and
\beta
of the generalized Pareto distribution
\mathrm{GPD}(\xi,\beta)
depending on covariates in
a non- or semiparametric way. The distribution function is given by
G_{\xi,\beta}(x)=1-(1+\xi x/\beta)^{-1/\xi},\quad
x\ge0,
for \xi>0
(which is what we assume) and
\beta>0
. Note that \beta
is also denoted by
\sigma
in this package. Estimation of \xi
and \beta
by gamGPDfit()
is done via penalized maximum
likelihood estimation, where the estimators are computed with a
backfitting algorithm. In order to guarantee convergence of this
algorithm, reparameterizations of \xi
and \beta
in terms of the parameters \eta
and \nu
are done
via
\xi=\exp(\eta)-1
and
\beta=\exp(\nu)/(1+\xi)=\exp(\nu-\eta).
The parameters \eta
and \nu
(and thus
\xi
and \beta
) are allowed to depend on
covariates (including time) in a non- or semiparametric way, for example:
\eta=\eta(\bm{x},t)=\bm{x}^{\top}\bm{\alpha}_{\eta}+h_{\eta}(t),
\nu=\nu(\bm{x},t)=\bm{x}^{\top}\bm{\alpha}_{\nu}+h_{\nu}(t),
where \bm{x}
denotes the vector of covariates,
\bm{\alpha}_{\eta}
, \bm{\alpha}_{\nu}
are parameter vectors and h_{\eta}
, h_{\nu}
are
regression splines. For more details, see the references and the source
code.
gamGPDboot()
first fits the GPD parameters via
gamGPDfit()
. It then conducts the post-blackend bootstrap of
Chavez-Demoulin and Davison (2005). To this end, it computes the
residuals, resamples them (B
times), reconstructs the
corresponding excesses, and refits the GPD parameters via
gamGPDfit()
again.
Note that if gam()
fails in gamGPDfit()
or the
fitting or one of the bootstrap replications in gamGPDboot()
,
then the output object contains (an) empty (sub)list(s). These
failures typically happen for too small sample sizes.
gamGPDfit()
returns either an empty list (list()
; in
case at least one of the two gam()
calls in the internal
function gamGPDfitUp()
fails) or a list with the components
xi
:estimated parameters \xi
;
beta
:estimated parameters \beta
;
eta
:estimated parameters \eta
;
nu
:estimated parameters \nu
;
se.eta
:standard error for \eta
((possibly
adjusted) second-order derivative of the reparameterized
log-likelihood with respect to \eta
) multiplied by -1;
se.nu
:standard error for \nu
((possibly
adjusted) second-order derivative of the reparameterized
log-likelihood with respect to \nu
) multiplied by -1;
eta.covar
:(unique) covariates for \eta
;
nu.covar
:(unique) covariates for \nu
;
covar
:available covariate combinations used for
fitting \beta
(\eta
, \nu
);
y
:vector of excesses (exceedances minus threshold);
res
:residuals;
MRD
:mean relative distances between for all
iterations, calculated between old parameters (\eta, \eta)
(from the last iteration) and new parameters (currently
estimated ones);
logL
:log-likelihood at the estimated parameters;
etaObj
:R object of type gamObject
for estimated
\eta
(returned by mgcv::gam()
);
nuObj
:R object of type gamObject
for estimated
\nu
(returned by mgcv::gam()
);
etaUpdates
:if include.updates
is
TRUE
, updates for \eta
for each
iteration. This is a list of R objects of type gamObject
which contains etaObj
as last element;
nuUpdates
:if include.updates
is
TRUE
, updates for \nu
for each
iteration. This is a list of R objects of type gamObject
which contains nuObj
as last element;
gamGPDboot()
returns a list of length B+1
where
the first component contains the results of
the initial fit via gamGPDfit()
and the other B
components contain the results for each replication of the
post-blackend bootstrap. Components for which gam()
fails (e.g., due to too few data) are given as empty lists (list()
).
Marius Hofert, Valerie Chavez-Demoulin.
Chavez-Demoulin, V., and Davison, A. C. (2005). Generalized additive models for sample extremes. Applied Statistics 54(1), 207–222.
Chavez-Demoulin, V., Embrechts, P., and Hofert, M. (2014). An extreme value approach for modeling Operational Risk losses depending on covariates. Journal of Risk and Insurance; accepted.
## generate an example data set
years <- 2003:2012 # years
nyears <- length(years)
n <- 250 # sample size for each (different) xi
u <- 200 # threshold
rGPD <- function(n, xi, beta) ((1-runif(n))^(-xi)-1)*beta/xi # sampling GPD
set.seed(17) # setting seed
xi.true.A <- seq(0.4, 0.8, length=nyears) # true xi for group "A"
## generate losses for group "A"
lossA <- unlist(lapply(1:nyears,
function(y) u + rGPD(n, xi=xi.true.A[y], beta=1)))
xi.true.B <- xi.true.A^2 # true xi for group "B"
## generate losses for group "B"
lossB <- unlist(lapply(1:nyears,
function(y) u + rGPD(n, xi=xi.true.B[y], beta=1)))
## build data frame
time <- rep(rep(years, each=n), 2) # "2" stands for the two groups
covar <- rep(c("A","B"), each=n*nyears)
value <- c(lossA, lossB)
x <- data.frame(covar=covar, time=time, value=value)
## fit
eps <- 1e-3 # to decrease the run time for this example
require(mgcv) # due to s()
fit <- gamGPDfit(x, threshold=u, datvar="value", etaFrhs=~covar+s(time)-1,
nuFrhs=~covar+s(time)-1, eps.eta=eps, eps.nu=eps)
## note: choosing s(..., bs="cr") will fit cubic splines
## grab the fitted values per group and year
xi.fit <- expm1(fitted(fit$etaObj))
xi.fit. <- xi.fit[1+(0:(2*nyears-1))*n] # fit for each group and year
xi.fit.A <- xi.fit.[1:nyears] # fit for "A" and each year
xi.fit.B <- xi.fit.[(nyears+1):(2*nyears)] # fit for "B" and each year
## plot the fitted values of xi and the true ones we simulated from
par(mfrow=c(1,2))
plot(years, xi.true.A, type="l", ylim=range(xi.true.A, xi.fit.A),
main="Group A", xlab="Year", ylab=expression(xi))
points(years, xi.fit.A, type="l", col="red")
legend("topleft", inset=0.04, lty=1, col=c("black", "red"),
legend=c("true", "fitted"), bty="n")
plot(years, xi.true.B, type="l", ylim=range(xi.true.B, xi.fit.B),
main="Group B", xlab="Year", ylab=expression(xi))
points(years, xi.fit.B, type="l", col="blue")
legend("topleft", inset=0.04, lty=1, col=c("black", "blue"),
legend=c("true", "fitted"), bty="n")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.