```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```
This vignette provides complementary information to the R documentation for the `ReturnCurves` package. It summarises the key methodologies implemented in the package and is heavily based on the works of @MurphyBarltropetal2023 and @MurphyBarltropetal2024; for full details we refer the user to these articles.

The `ReturnCurves` package aims at estimating the $p$-probability return curve [@MurphyBarltropetal2023], for small $p>0,$ while implementing pointwise and smooth approaches to estimate the so-called angular dependence function first introduced by @WadsworthTawn2013.
```r
library(ReturnCurves)
```

To illustrate the functionality of the package, we use the data set `airdata`, which contains air pollution data collected from Marylebone, London (UK). The data set contains $1427$ daily measurements of air pollutant concentrations of NOx and PM10.

```r
data(airdata)
```
The estimation of the angular dependence function and/or of the return curve is implemented for a bivariate vector $(X_E,Y_E)$ marginally distributed as standard exponential, i.e., $X_E, \, Y_E\sim \text{Exp}(1).$ Thus, the original data $(X, Y)$ needs to be marginally transformed, which is achieved via the probability integral transform. We follow the procedure of @ColesTawn1991, where the empirical cumulative distribution function $\tilde{F}$ is fitted below a threshold $u$ and a generalised Pareto distribution (GPD) is fitted above, giving the following estimate of the marginal cumulative distribution function (cdf) of $X$ or $Y:$ \begin{equation} \label{eq:pit} \hat{F}(z) = \begin{cases} 1-\left(1-\tilde{F}(u)\right)\left[1+\hat\xi\frac{z-u}{\hat\sigma}\right]_+^{-1/\hat\xi}, & \text{if } z>u, \\ \tilde{F}(z), & \text{if } z \leq u, \end{cases} \end{equation} where $\hat\sigma$ and $\hat\xi$ are the estimated scale and shape parameters of the GPD. Exponential margins are obtained by applying $-\log(1-\hat F(\cdot))$ to each margin, where $\hat F(\cdot)$ is estimated separately for each margin.
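To make equation \eqref{eq:pit} concrete, the following minimal sketch evaluates the semi-parametric cdf estimate for a single margin; it is purely illustrative, assumes $\hat\xi \neq 0$ and user-supplied GPD estimates, and is independent of the package's own implementation, which is handled by `margtransf` below.

```r
# Minimal sketch of equation (pit) for a single margin; purely
# illustrative, margtransf() implements this internally.
# x: data vector; u: threshold; sig, xi: GPD estimates (xi != 0 assumed)
semipar_cdf <- function(z, x, u, sig, xi) {
  Ftilde <- ecdf(x)                                    # empirical cdf below u
  gpd_tail <- 1 - (1 - Ftilde(u)) *
    pmax(1 + xi * (z - u) / sig, 0)^(-1 / xi)          # GPD-based tail above u
  ifelse(z > u, gpd_tail, Ftilde(z))
}

# standard exponential margins via -log(1 - F_hat(z))
to_exp <- function(z, x, u, sig, xi) -log(1 - semipar_cdf(z, x, u, sig, xi))
```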
Within the package, this transformation is performed by the function `margtransf`, which takes as inputs a matrix containing the original data, a vector of the marginal quantiles used to fit the GPD, and a boolean value `constrainedshape`, which decides whether $\xi > -1$ if set to `TRUE` (default), or $\xi \in \mathbb{R}$ if set to `FALSE`.
Function `margtransf` returns an object of S4 class `margtransf.class` with 6 attributes:

- `data`: matrix with the data on the original margins,
- `qmarg`: vector of marginal quantiles used to fit the GPD,
- `constrainedshape`: whether $\xi>-1$ (`TRUE`) or $\xi\in\mathbb{R}$ (`FALSE`),
- `parameters`: matrix containing estimates of parameters $(\hat \sigma, \hat\xi),$
- `thresh`: vector containing the threshold $u$ above which the GPD is fitted,
- `dataexp`: matrix with the data on standard exponential margins.

```r
# qmarg and constrainedshape set to the default values
expdata <- margtransf(data = airdata, qmarg = rep(0.95, 2), constrainedshape = TRUE)

# attributes of the S4 object
str(expdata)

# head of the data on standard exponential margins
head(expdata@dataexp)
```
It is possible to plot an S4 object of `margtransf.class` with `plot`. By setting the argument `which = "hist"`, histograms of each variable on original and standard exponential margins are shown:

```r
plot(expdata, which = "hist")
```

To visualise the time series of each variable on original and standard exponential margins, we set `which = "ts"`:

```r
plot(expdata, which = "ts")
```

The joint distribution on original and standard exponential margins can be accessed with `which = "joint"`:

```r
plot(expdata, which = "joint")
```

Finally, it is possible to plot all of these together by setting `which = "all"`, which is the default for this argument.

```r
plot(expdata, which = "all") # or just plot(expdata)
```
When transforming the data onto standard exponential margins as in equation \eqref{eq:pit}, it is assumed that the data follow a GPD above the threshold $u$. It is possible to assess whether this is a reasonable assumption by checking if there is agreement between model and empirical GPD quantiles. This is done via QQ plots in the `ReturnCurves` package by plotting the points $\left(F_{GPD}^{-1}\left(\frac{i}{n_{exc} + 1}\right) + u, \, X^{GPD}_{(i)} + u\right),$ where $X^{GPD}_{(i)}$ denotes the $i$-th ordered increasing statistic $(i = 1, \ldots, n_{exc})$ of the exceedances, i.e., $X^{GPD}=(X-u \mid X >u),$ $n_{exc}$ denotes the sample size of these exceedances, and $F_{GPD}^{-1}$ denotes the inverse of the cumulative distribution function of a GPD. Finally, the uncertainty of the empirical quantiles is quantified using a bootstrap approach. If temporal dependence is present in the data, then a block bootstrap approach is required, i.e., `blocksize > 1`.
These QQ plots are produced by the function `marggpd`, which takes as inputs an S4 object of class `margtransf.class`, the size of the blocks for the bootstrap procedure and the corresponding number of samples, and the significance level $\alpha$ for the tolerance intervals. It returns an S4 object of class `marggpd.class` with an extra attribute `marggpd` containing a list with:

- `model`: a list containing the model quantiles for each variable,
- `empirical`: a list containing the empirical quantiles for each variable,
- `lower`: a list containing the lower bounds of the tolerance intervals for each variable,
- `upper`: a list containing the upper bounds of the tolerance intervals for each variable.

```r
# nboot and alpha are set to the default values
# blocksize is set to 10 to account for temporal dependence
uncgpd <- marggpd(margdata = expdata, blocksize = 10, nboot = 250, alpha = 0.05)

# attributes of the S4 object
str(uncgpd)

# head of the list elements of slot marggpd for variable X
head(uncgpd@marggpd$model[[1]])
head(uncgpd@marggpd$empirical[[1]])
head(uncgpd@marggpd$lower[[1]])
head(uncgpd@marggpd$upper[[1]])
```
It is possible to plot an S4 object of `marggpd.class` with `plot`, where the QQ plots with the model and empirical quantiles for each variable are shown. The points should lie close to the line $y=x;$ for a good fit and agreement between these quantiles, the line $y=x$ should mainly lie within the $(1-\alpha)\%$ tolerance intervals.

```r
plot(uncgpd)
```
In bivariate extremes, interest may lie in studying regions where both variables are extreme or where only one is extreme. For this, methods that aim at characterising the joint tail behaviour in both scenarios, such as the one introduced by @WadsworthTawn2013, are required. Given standard exponentially distributed variables $X_E$ and $Y_E$ and a function $\mathcal{L}(\cdot; \omega)$ that is slowly varying at infinity, the joint tail behaviour of $(X_E,Y_E)$ is captured through $\lambda(\omega)$ via the assumption \begin{equation} \text{Pr}(X_E > \omega u,\, Y_E > (1-\omega) u) = \mathcal{L}(e^u; \omega)e^{-\lambda(\omega)u} \quad \text{as } u \to \infty, \end{equation} which can be rewritten as \begin{equation}\label{eq:wt} \text{Pr}\left(\min\left\{\frac{X_E}{\omega}, \, \frac{Y_E}{1-\omega}\right\} > u\right) = \mathcal{L}(e^u; \omega)e^{-\lambda(\omega)u} \quad \text{as } u \to \infty, \end{equation} where $\omega\in[0,1]$ and $\lambda(\omega)\geq \max\{\omega, 1-\omega\}$ is called the angular dependence function (ADF). In the case of asymptotic dependence (see, for instance, @Colesetal1999), $\lambda(\omega)=\max\{\omega, 1-\omega\}$ for all $\omega\in[0,1].$
Lastly, defining a min-projection variable at $\omega,$ $T_\omega = \min\left\{\frac{X_E}{\omega}, \, \frac{Y_E}{1-\omega}\right\},$ equation \eqref{eq:wt} implies that \begin{equation}\label{eq:minproj} \text{Pr}(T_\omega>u+t\mid T_\omega>u) = \frac{\mathcal{L}(e^{u+t}; \omega)}{\mathcal{L}(e^u; \omega)}e^{-\lambda(\omega)t} \to e^{-\lambda(\omega)t} \quad \text{as } u \to \infty, \end{equation} for any $\omega\in[0,1]$ and $t>0.$ In other words, for all $\omega\in[0,1]$ and, as $u_\omega\to \infty,$ $T_\omega^1 := (T_\omega-u_\omega\mid T_\omega>u_\omega)\sim \text{Exp}(\lambda(\omega)).$ Estimation of the ADF can be done in different ways; @MurphyBarltropetal2024 present a few.
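The exponential limit suggests a simple pointwise recipe for estimating $\lambda(\omega)$: project the data onto a ray, take threshold exceedances, and fit an exponential rate. The sketch below illustrates this at a single ray using the transformed `airdata`; it is only an illustration of the idea, not the package's implementation, which is provided by `adf_est` below.

```r
# Pointwise sketch of lambda(omega) at a single ray, using the data on
# standard exponential margins from margtransf(); adf_est() below
# implements the proper (Hill and composite likelihood) estimators
xe <- expdata@dataexp[, 1]
ye <- expdata@dataexp[, 2]
omega <- 0.5
t_omega <- pmin(xe / omega, ye / (1 - omega))  # min-projection T_omega
u_omega <- quantile(t_omega, 0.95)             # large threshold at this ray
t1 <- t_omega[t_omega > u_omega] - u_omega     # exceedances T^1_omega
lambda_hat <- 1 / mean(t1)                     # rate MLE for Exp(lambda(omega))
```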
For the `ReturnCurves` package, two approaches are implemented: a pointwise estimator using the Hill estimator [@Hill1975], $\hat{\lambda}_H,$ and a smoother estimator based on Bernstein-Bézier polynomials estimated via composite likelihood methods, $\hat{\lambda}_{CL}.$ For the latter, @MurphyBarltropetal2024 propose using a family of Bernstein-Bézier polynomials to improve the estimation of the ADF. Given $k\in \mathbb{N},$ it is assumed that $\lambda(\omega)=\lambda(\omega;\boldsymbol{\beta})$ can be represented by the following family of functions: \begin{align}\label{eq:bbp} \mathcal{B}_k^*=\left\{(1-\omega)^k+\sum_{i = 1}^{k-1}\beta_i {k \choose i} \omega^i(1-\omega)^{k-i}+\omega^k=:f(\omega)\mid \omega\in[0,1],\right. \nonumber \\ \left.\phantom{\sum_{i = 1}^{k-1}{k \choose i}}\boldsymbol{\beta}\in[0, \infty)^{k-1} \text{ such that } f(\omega)\geq\max\{\omega, 1-\omega\}\right\}. \end{align}
As $T_\omega^1$ is exponentially distributed when $u_\omega\to\infty,$ the parameter vector $\boldsymbol{\beta}$ can be estimated using a composite likelihood function defined as \begin{equation} \label{eq:clf} \mathcal{L}_C(\boldsymbol{\beta}) = \left[\prod_{\omega \in \Omega}\lambda(\omega;\boldsymbol{\beta})^{\mid \boldsymbol{t}_\omega^1\mid}\right]\exp\left\{-\sum_{\omega \in \Omega}\sum_{t_\omega^1\in \boldsymbol{t}_\omega^1}\lambda(\omega;\boldsymbol{\beta})t_\omega^1\right\}, \end{equation} where $\mid \boldsymbol{t}_\omega^1\mid$ represents the cardinality of the set $\boldsymbol{t}_\omega^1:=\{t_\omega-u_\omega\mid t_\omega\in \boldsymbol{t}_\omega, \, t_\omega>u_\omega\}$ for some large values $u_\omega,$ and $\Omega$ is a finite subset spanning the interval $[0,1].$ The estimator of the ADF through composite likelihood methods is given by $\lambda(\cdot;\boldsymbol{\hat\beta}_{CL}),$ where $\boldsymbol{\hat\beta}_{CL}$ maximises equation \eqref{eq:clf}.
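For illustration, the sketch below writes the negative log composite likelihood of equation \eqref{eq:clf} for the Bernstein-Bézier family in equation \eqref{eq:bbp}; the objects `Omega` and `exc_list` are hypothetical placeholders for the grid of rays and the per-ray exceedance sets $\boldsymbol{t}_\omega^1,$ and `adf_est` with `method = "cl"` performs this optimisation internally.

```r
# lambda(omega; beta) from the Bernstein-Bezier family B*_k in (bbp);
# the end coefficients are fixed at 1 and beta has length k - 1
lambda_bbp <- function(w, beta) {
  k <- length(beta) + 1
  coefs <- c(1, beta, 1)
  sum(coefs * choose(k, 0:k) * w^(0:k) * (1 - w)^(k - (0:k)))
}

# negative log composite likelihood in (clf); Omega is a vector of rays
# and exc_list a list of exceedance vectors t^1_omega (placeholders here)
neg_log_cl <- function(beta, Omega, exc_list) {
  lam <- vapply(Omega, lambda_bbp, numeric(1), beta = beta)
  -sum(lengths(exc_list) * log(lam)) +
    sum(lam * vapply(exc_list, sum, numeric(1)))
}
# beta could then be estimated with, e.g., optim(), subject to beta >= 0
```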
Finally, @MurphyBarltropetal2024 showed that incorporating knowledge of the conditional extremes [@HeffernanTawn2004] parameters $\alpha_{y\mid x}$ and $\alpha_{x\mid y}$ improves the estimation of the ADF. In particular, the authors show that, in order to satisfy theoretical properties of $\lambda(\omega),$ $\lambda(\omega)=\max\{\omega, 1-\omega\}$ for all $\omega\in[0, \alpha_{x\mid y}^1]\cup[\alpha_{y\mid x}^1, 1],$ with $\alpha_{x\mid y}^1=\alpha_{x\mid y}/(1 + \alpha_{x\mid y})$ and $\alpha_{y\mid x}^1=1/(1 + \alpha_{y\mid x}).$ Thus, after estimating the conditional extremes parameters $\alpha_{y\mid x}$ and $\alpha_{x\mid y}$ through maximum likelihood estimation, we can set $\lambda(\omega)=\max\{\omega, 1-\omega\}$ for $\omega \in [0, \hat\alpha_{x\mid y}^1)\cup(\hat\alpha_{y\mid x}^1, 1].$ Then, for the Hill estimator, $\lambda(\omega)=\hat\lambda_H$ for $\omega \in \left[\hat\alpha_{x\mid y}^1, \hat\alpha_{y\mid x}^1\right].$ For the composite likelihood estimator, a rescaling of equation \eqref{eq:bbp} is needed to ensure continuity at $\hat\alpha_{x\mid y}^1$ and $\hat\alpha_{y\mid x}^1,$ as defined below: \begin{align} \mathcal{B}_k^1=\left\{(1-\hat\alpha_{x\mid y}^1)\left(1-\frac{v-\hat\alpha_{x\mid y}^1}{\hat\alpha_{y\mid x}^1-\hat\alpha_{x\mid y}^1}\right)^k+\sum_{i = 1}^{k-1}\beta_i {k \choose i} \left(\frac{v-\hat\alpha_{x\mid y}^1}{\hat\alpha_{y\mid x}^1-\hat\alpha_{x\mid y}^1}\right)^i\left(1-\frac{v-\hat\alpha_{x\mid y}^1}{\hat\alpha_{y\mid x}^1-\hat\alpha_{x\mid y}^1}\right)^{k-i}+\right. \\ \left.\hat\alpha_{y\mid x}^1\left(\frac{v-\hat\alpha_{x\mid y}^1}{\hat\alpha_{y\mid x}^1-\hat\alpha_{x\mid y}^1}\right)^k=:f(v)\mid v\in\left[\hat\alpha_{x\mid y}^1, \hat\alpha_{y\mid x}^1\right],\,\boldsymbol{\beta}\in[0, \infty)^{k-1} \text{ such that } f(v)\geq\max\{v, 1-v\}\right\}. \end{align} $\lambda(\omega)=\lambda(\omega;\boldsymbol{\beta})$ is assumed to be represented by an element of $\mathcal{B}_k^1$ on $\left[\hat\alpha_{x\mid y}^1, \hat\alpha_{y\mid x}^1\right].$ Finally, the estimators are post-processed in order to satisfy the theoretical conditions on $\lambda$ identified in @MurphyBarltropetal2024.
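The mapping from the conditional extremes parameters to these interval endpoints is a simple reparameterisation, illustrated below with arbitrary parameter values rather than estimates from `airdata`.

```r
# Interval on which the ADF is freely estimated, for arbitrary
# illustrative values of the conditional extremes parameters
alpha_xy <- 0.3; alpha_yx <- 0.4
c(lower = alpha_xy / (1 + alpha_xy),  # alpha^1_{x|y}
  upper = 1 / (1 + alpha_yx))         # alpha^1_{y|x}
```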
Estimation of the ADF can be done using the function `adf_est`, which takes as inputs:

- `margdata`: an S4 object of class `margtransf.class` representing the marginal transformation of the data,
- `w`: a vector of rays $\omega$ in $[0,1],$
- `method`: a string indicating which estimator to use, $\lambda_H$ or $\lambda_{CL},$
- `constrained`: a boolean value which decides whether to incorporate the conditional extremes parameters $\alpha_{y\mid x}$ and $\alpha_{x\mid y}$ in the estimation.

Additional arguments can be set away from their default values; these include the marginal quantiles for the min-projection variable $T^1$ at ray $\omega,$ the marginal quantiles to fit the conditional extremes method if `constrained = TRUE`, and, if `method = "cl"`, the polynomial degree $k,$ the initial values for $\boldsymbol{\beta}$ for the composite maximum likelihood procedure, and the convergence tolerance. Convergence is declared when the difference of log-likelihood values between iterations does not exceed the value of `tol`. This repeated optimisation helps to avoid convergence to local maxima, although it does not guarantee finding the global maximum.
Function `adf_est` returns an object of S4 class `adf_est.class` with 11 attributes, where the first 9 are the inputs of the function and the last 2 are vectors:

- `interval`: contains the maximum likelihood estimates from the conditional extremes model, $\hat\alpha^1_{x\mid y}$ and $\hat\alpha^1_{y\mid x},$ if `constrained = TRUE`. Otherwise, it contains the values $0$ and $1;$ these have no meaningful interpretation as the estimation is performed on an unconstrained interval.
- `adf`: contains the estimates of $\lambda(\omega).$

```r
# Estimation using Hill estimator without conditional extremes parameters
whill <- seq(0, 1, by = 0.001)
## q and constrained are set to the default values here
lambdah <- adf_est(margdata = expdata, w = whill, method = "hill", q = 0.95,
                   constrained = FALSE)

# Estimation using Hill estimator with conditional extremes parameters
## q and qalphas are set to the default values
lambdah2 <- adf_est(margdata = expdata, w = whill, method = "hill", q = 0.95,
                    qalphas = rep(0.95, 2), constrained = TRUE)

# Estimation using CL method without conditional extremes parameters
## w, q and constrained are set to the default values here
lambdacl <- adf_est(margdata = expdata, w = seq(0, 1, by = 0.01), method = "cl",
                    q = 0.95, constrained = FALSE)

# Estimation using CL method with conditional extremes parameters
## w, q and qalphas are set to the default values
lambdacl2 <- adf_est(margdata = expdata, w = seq(0, 1, by = 0.01), method = "cl",
                     q = 0.95, qalphas = rep(0.95, 2), constrained = TRUE)

# attributes of the S4 object
str(lambdah)

# head of the vector with adf estimates for the first estimator
head(lambdah@adf)
```
It is possible to plot an S4 object of `adf_est.class` with `plot`, where a comparison of the estimated ADF and its lower bound, $\max\{\omega, 1-\omega\},$ is shown.

```r
# plot of the ADF estimation based on the unconstrained Hill estimator
plot(lambdah)
```
After estimation of the ADF, it is important to assess its goodness-of-fit. Noting that $T_\omega^1=(T_\omega-u_\omega\mid T_\omega>u_\omega)\sim \text{Exp}(\lambda(\omega)) \Leftrightarrow \lambda(\omega)T_\omega^1\sim \text{Exp}(1),$ we can investigate whether there is agreement between model and empirical exponential quantiles. This is done in the `ReturnCurves` package through QQ plots by plotting the points $\left(F_E^{-1}(i/(n+1)),\, T^1_{(i)}\right),$ where $F_E^{-1}$ denotes the inverse of the cumulative distribution function of a standard exponential distribution and $T_{(i)}^{1}$ is the $i$-th ordered increasing statistic, $i=1, \ldots, n.$ The uncertainty of the empirical quantiles is quantified using a bootstrap approach. If temporal dependence is present in the data, a block bootstrap approach should be used, i.e., `blocksize` $>1.$
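For intuition, the sketch below reuses `t1` and `lambda_hat` from the earlier min-projection sketch to produce these exponential QQ points at $\omega = 0.5$; the package's `adf_gof`, described next, additionally provides bootstrap-based tolerance intervals.

```r
# Exponential QQ check at omega = 0.5, reusing t1 and lambda_hat from
# the earlier min-projection sketch; adf_gof() adds tolerance intervals
scaled <- sort(lambda_hat * t1)          # should be approximately Exp(1)
n1 <- length(scaled)
model_q <- qexp((1:n1) / (n1 + 1))       # standard exponential quantiles
plot(model_q, scaled, xlab = "Model quantiles", ylab = "Empirical quantiles")
abline(a = 0, b = 1)
```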
The assessment of the goodness-of-fit of $\lambda(\omega)$ can be done using the function `adf_gof`, which takes as inputs an S4 object of class `adf_est.class`, a ray $\omega$ to be considered, the size of the blocks for the bootstrap procedure and the corresponding number of samples, and the significance level $\alpha$ for the tolerance intervals. In turn, it returns an S4 object of class `adf_gof.class` with an extra attribute `gof` containing a list with the model and empirical quantiles, and the lower and upper bounds of the tolerance interval.

We note that this function is implemented to evaluate the fit at a single ray $\omega;$ therefore, we recommend repeating the procedure for a few rays to obtain a better representation. In addition, if the ray provided by the user was not used for the estimation of the ADF, then the closest $\omega$ in the grid is used instead.
```r
# Goodness of fit of the adf for two rays w
rays <- c(0.25, 0.75)

## nboot and alpha are set to the default values
## blocksize is set to 10 to account for temporal dependence
gofh <- sapply(rays, adf_gof, adf = lambdah, blocksize = 10, nboot = 250,
               alpha = 0.05)

# attributes of the S4 object
str(gofh[[1]])

# head of the list elements of slot gof
head(gofh[[1]]@gof$model)
head(gofh[[1]]@gof$empirical)
head(gofh[[1]]@gof$lower)
head(gofh[[1]]@gof$upper)
```
As before, it is possible to plot an S4 object of `adf_gof.class` with `plot`, where the QQ plot with the model and empirical quantiles is shown. The points should lie close to the line $y=x;$ for a good fit and agreement between these quantiles, the line $y=x$ should mainly lie within the $(1-\alpha)\%$ tolerance intervals.

```r
library(gridExtra)
grid.arrange(plot(gofh[[1]]), plot(gofh[[2]]), ncol = 2)
```
Given a probability $p$ and the joint survivor function $\text{Pr}(X>x, Y>y)$ of the bivariate vector $(X,Y),$ the $p$-probability return curve is defined as \begin{equation}\label{eq:rc} \text{RC}(p):=\left\{(x, y) \in \mathbb{R}^2 : \text{Pr}(X>x, Y>y) = p\right\}. \end{equation} The interest lies in values of $p$ close to $0,$ as these are the ones characterising rare joint exceedance events. Given any point $(x,y) \in \text{RC}(p),$ the event $\{X >x, Y>y\}$ is expected to happen once in each return period $1/p,$ on average. This is equivalent to having an expected number of $np$ points in the region $(x, \infty)\times (y, \infty)$ for a sample of size $n$ from $(X,Y).$
Since the probability $p$ is close to $0,$ methods that can accurately capture the behaviour of the joint tail are necessary in order to realistically extrapolate and estimate $\text{RC}(p)$ for values of $p$ outside of the observation period. @MurphyBarltropetal2023 consider a couple of methods to achieve this, one of which uses the ADF $\lambda(\omega)$ given in equation \eqref{eq:wt} to characterise the joint tail behaviour.
Estimation of $\text{RC}(p)$ is done with standard exponentially distributed variables; therefore, the first step is to transform the original data onto standard exponential margins using equation \eqref{eq:pit} and then, after estimation of $\text{RC}(p),$ back-transform them onto the original margins. Estimates of $\text{RC}(p)$ are obtained through estimates of $t$ and $u$ from equation \eqref{eq:minproj}, and rays $\omega.$ In particular, the value of $t>0$ can be obtained by first estimating $u$ as the $(1-p^*)$-th quantile of $T_\omega,$ where $p^*>p$ is a small probability, and then ensuring that $\text{Pr}(T_\omega > t + u)=p.$ Since $u$ is estimated as the $(1-p^*)$-th quantile of $T_\omega,$ we have that $\text{Pr}(T_\omega > u) = p^*;$ thus, \begin{equation} p = \text{Pr}(T_\omega > t + u) = \text{Pr}(T_\omega > u) \text{Pr}(T_\omega > t + u \mid T_\omega > u) = p^*e^{-\hat{\lambda}(\omega)t}, \end{equation} which leads to $t=-\log(p/p^*)/\hat{\lambda}(\omega).$ Finally, the estimates of the return curve $\hat{\text{RC}}(p)$ can be obtained by setting $(x, y):=\left(\omega(t+u), (1-\omega)(t+u)\right).$
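This extrapolation step can be illustrated in a few lines at a single ray, reusing `xe`, `ye` and `lambda_hat` from the earlier min-projection sketch; the result is one point of the curve on standard exponential margins, whereas `rc_est` below traverses all rays and back-transforms the curve onto the original margins.

```r
# One point of RC(p) at omega = 0.5 on exponential margins, reusing
# objects from the earlier sketch; rc_est() implements the full curve
p <- 10 / nrow(airdata)                    # survival probability of interest
pstar <- 0.05                              # p* > p, here 1 - 0.95
omega <- 0.5
t_omega <- pmin(xe / omega, ye / (1 - omega))
u <- unname(quantile(t_omega, 1 - pstar))  # (1 - p*)-th quantile of T_omega
t <- -log(p / pstar) / lambda_hat          # from p = p* exp(-lambda t)
c(x = omega * (t + u), y = (1 - omega) * (t + u))
```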
In the `ReturnCurves` package, estimation of the return curve is done through the function `rc_est`, which shares the same inputs as the function `adf_est` with an additional argument `p` representing the curve survival probability. This probability value should be smaller than $1-q,$ where $q$ is the quantile for the min-projection variables $T^1_\omega,$ and, when applicable, smaller than $1-q_\alpha,$ where $q_\alpha$ are the quantiles used in the conditional extremes method.
Function `rc_est` returns an S4 object of class `rc_est.class` with 14 attributes, with a list and a matrix in the last 2 slots:

- `interval`: vector with the maximum likelihood estimates from the conditional extremes model, $\hat\alpha^1_{x\mid y}$ and $\hat\alpha^1_{y\mid x},$ if `constrained = TRUE`. Otherwise, it contains the values $0$ and $1;$ these have no meaningful interpretation as the estimation is performed on an unconstrained interval.
- `rc`: matrix with the estimates of the return curve on the original margins.

```r
n <- dim(airdata)[1]
prob <- 10 / n

# Estimation using Hill estimator without conditional extremes parameters
whill <- seq(0, 1, by = 0.001)
## q and constrained are set to the default values here
rch <- rc_est(margdata = expdata, w = whill, p = prob, method = "hill",
              q = 0.95, constrained = FALSE)

# Estimation using Hill estimator with conditional extremes parameters
## q and qalphas are set to the default values
rch2 <- rc_est(margdata = expdata, w = whill, p = prob, method = "hill",
               q = 0.95, qalphas = rep(0.95, 2), constrained = TRUE)

# Estimation using CL method without conditional extremes parameters
## w, q and constrained are set to the default values here
rccl <- rc_est(margdata = expdata, w = seq(0, 1, by = 0.01), p = prob,
               method = "cl", q = 0.95, constrained = FALSE)

# Estimation using CL method with conditional extremes parameters
## w, q and qalphas are set to the default values
rccl2 <- rc_est(margdata = expdata, w = seq(0, 1, by = 0.01), p = prob,
                method = "cl", q = 0.95, qalphas = rep(0.95, 2),
                constrained = TRUE)

# attributes of the S4 object
str(rch)

# head of the matrix with return curve estimates for the first estimator
head(rch@rc)
```
It is possible to plot an S4 object of `rc_est.class` with `plot`, where the original data are plotted together with the estimated return curve $\hat{\text{RC}}(p).$

```r
# plot of the return curve estimate based on the unconstrained Hill estimator
plot(rch)
```
@MurphyBarltropetal2023 propose a procedure to assess the uncertainty of the return curve estimates. For large positive $m\in\mathbb{N},$ let \begin{equation}\label{eq:angles} \boldsymbol{\Theta}:=\left\{\frac{\pi(m+1-j)}{2(m+1)} \mid 1 \leq j\leq m\right\} \end{equation} define a set of angles. For each $\theta \in \boldsymbol{\Theta},$ the line $L_\theta :=\left\{(x,y) \in \mathbb{R}^2_+ \mid y = x\tan(\theta)\right\}$ intersects the estimated $\hat{\text{RC}}(p)$ exactly once, i.e., $\{(\hat{x}_\theta, \hat{y}_\theta)\}:= \hat{\text{RC}}(p) \cap L_\theta,$ where $(\hat{x}_\theta, \hat{y}_\theta) \in \hat{\text{RC}}(p).$ Moreover, let $\hat{d}_\theta := \left(\hat{x}_\theta^2 + \hat{y}_\theta^2\right)^{1/2}$ denote the $L_2$-norm of the point estimate. Uncertainty in the return curve estimates is quantified using the distribution of $\hat{d}_\theta$ at each angle $\theta \in \boldsymbol{\Theta}$ as follows: for $k = 1, \ldots,$ `nboot`:
1. Bootstrap the original data set; when temporal dependence is present, a block bootstrap should be used.
2. For each $\theta \in \boldsymbol{\Theta},$ obtain $\hat{d}_{\theta,k}$ for the corresponding return curve estimate.
Finally, given $\theta \in \boldsymbol{\Theta},$ empirical estimates of the mean, median and $(1-\alpha)\%$ confidence intervals for $\hat{d}_\theta$ can be obtained using the sample of $\hat{d}_{\theta,k}.$ These are available through function `rc_unc`, which takes as inputs:

- `retcurve`: an S4 object of class `rc_est.class` containing the return curve estimates,
- `blocksize`: size of blocks for the block bootstrap procedure; if no temporal dependence is present, then set `blocksize = 1` (default),
- `nboot`: number of bootstrap samples to be taken,
- `nangles`: number of angles $m$,
- `alpha`: significance level to compute the $(1-\alpha)\%$ confidence intervals.

Function `rc_unc` returns an S4 object of class `rc_unc.class` with 6 attributes, where the last slot `unc` contains a list with:

- `median`: a vector containing the empirical estimates of the median return curve,
- `mean`: a vector containing the empirical estimates of the mean return curve,
- `lower`: a vector containing the lower bound of the confidence interval,
- `upper`: a vector containing the upper bound of the confidence interval.

For simplicity, just the uncertainty of the return curve obtained using the unconstrained Hill estimator is computed here.

```r
# nangles and alpha set to default
# nboot set to 50 for simplicity
# blocksize is set to 10 to account for temporal dependence
rch_unc <- rc_unc(rch, blocksize = 10, nboot = 50, nangles = 150, alpha = 0.05)

# attributes of the S4 object
str(rch_unc)

# head of the list elements of slot unc
head(rch_unc@unc$median)
head(rch_unc@unc$mean)
head(rch_unc@unc$lower)
head(rch_unc@unc$upper)
```
It is possible to plot an instance of the S4 class `rc_unc.class` with the function `plot`, which takes the S4 object and an extra argument `which` as inputs. If `which = "rc"` (default), the estimated return curve is plotted; setting `which = "median"` shows the empirical median estimates of the return curve, while setting `which = "mean"` shows the empirical mean estimates. All plots show the uncertainty associated with the estimated return curve in dashed lines. Finally, setting `which = "all"` plots the estimated return curve, the empirical median and mean estimates, and the associated uncertainty.

```r
library(gridExtra)
grid.arrange(plot(rch_unc, which = "rc"), plot(rch_unc, which = "median"),
             plot(rch_unc, which = "mean"), plot(rch_unc, which = "all"),
             nrow = 2)
```
It is important to assess the goodness-of-fit of the return curve estimates, given that the true return curve is unknown in reality. This is implemented in the `ReturnCurves` package based on the approach proposed by @MurphyBarltropetal2023.
Given the return curve $\text{RC}(p),$ the probability of lying in a survival region $(x, \infty)\times(y,\infty)$ is $p.$ Given the same set of angles $\boldsymbol{\Theta}$ as in equation \eqref{eq:angles}, for each $\theta_j\in\boldsymbol{\Theta},$ the empirical probability $\hat p_j$ of lying in $(\hat{x}_{\theta_j}, \infty)\times (\hat{y}_{\theta_j}, \infty),$ where $(\hat{x}_{\theta_j}, \hat{y}_{\theta_j})$ is the corresponding point in $\hat{\text{RC}}(p),$ is given by the proportion of points in that region. The goodness-of-fit of the estimated return curve is then assessed via a bootstrap procedure; for each angle $\theta_j\in\boldsymbol{\Theta},$ the original data set is bootstrapped and empirical probability estimates $\hat p_j$ are obtained. When temporal dependence is present in the data, a block bootstrap approach should be taken and the size of the blocks must be defined. We note that, for each $j,$ `nboot` empirical probabilities are estimated, and so the median and the $(1-\alpha)\%$ pointwise confidence intervals for the probabilities can be obtained by taking the $50\%,$ $(\alpha/2)\%$ and $(1-\alpha/2)\%$ quantiles of the set of empirical probabilities for each $j,$ respectively.
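For intuition, the sketch below computes a single empirical probability for one, arbitrarily indexed, point of the earlier estimate `rch`; the function `rc_gof`, described next, repeats this over all angles with bootstrapping.

```r
# Empirical probability of exceeding one return-curve point on the
# original margins; the row index is an arbitrary illustrative choice
pt <- rch@rc[75, ]
mean(airdata[, 1] > pt[1] & airdata[, 2] > pt[2])  # should be close to prob
```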
The goodness-of-fit for an estimated return curve is implemented through the function `rc_gof`. This shares the same input arguments as the `rc_unc` function and returns an S4 object of class `rc_gof.class` with 5 attributes, with the last slot `gof` containing a list with:

- `median`: a vector with the median of the empirical probabilities,
- `lower`: a vector with the lower bound of the confidence interval,
- `upper`: a vector with the upper bound of the confidence interval.

For simplicity, just the goodness-of-fit of the return curve obtained using the unconstrained Hill estimator is computed here.
```r
# nboot, nangles and alpha set to default
# blocksize is set to 10 to account for temporal dependence
rch_gof <- rc_gof(rch, blocksize = 10, nboot = 250, nangles = 150, alpha = 0.05)

# attributes of the S4 object
str(rch_gof)

# head of the list elements of slot gof
head(rch_gof@gof$median)
head(rch_gof@gof$lower)
head(rch_gof@gof$upper)
```
It is possible to plot an instance of the S4 class `rc_gof.class` with the function `plot`, where a comparison between the true probability $p$ (in red) and the empirical median estimates (in black) is shown. Ideally, $p$ should be contained in the confidence region, shaded in grey. Finally, in practice, the value of $p$ should be within the range of the data and not too extreme, given the nature of empirical probabilities.

```r
plot(rch_gof)
```