Return Curves Estimation

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Introduction

This vignette provides complementary information to the R Documentation for the ReturnCurves package. It summarises the key methodologies implemented in the package and is heavily based on the works of @MurphyBarltropetal2023 and @MurphyBarltropetal2024; for full details we refer the user to these articles.

The ReturnCurves package aims at estimating the $p$-probability return curve [@MurphyBarltropetal2023], for small $p>0,$ while implementing pointwise and smooth approaches to estimate the so called angular dependence function first introduced by @WadsworthTawn2013.

library(ReturnCurves)

To illustrate the functionality of the package, we use the data set airdata which contained air pollution data collected from Marylebone, London (UK). The data set contains $1427$ daily measurements of air pollutant concentrations of NOx and PM10.

data(airdata)

Marginal transformation

The estimation of the angular dependence function and/or of the return curve is implemented for a bivariate vector $(X_E,Y_E)$ marginally distributed as standard exponential, i.e, $X_E, \, Y_E\sim \text{Exp}(1).$ Thus, the original data $(X, Y)$ needs to be marginally transformed, which is achieved via the probability integral transform. We follow the procedure of @ColesTawn1991 where the empirical cumulative distribution function $\tilde{F}$ is fitted below a threshold $u$, and a generalised Pareto distribution (GPD) is fitted above, giving the following estimate of the marginal cumulative distribution function (cdf) of $X$ or $Y:$ \begin{equation} \label{eq:pit} \hat{F}(z) = \begin{cases} 1-\left(1-\tilde{F}(u)\right)\left[1+\hat\xi\frac{z-u}{\hat\sigma}\right]_+^{-1/\hat\xi}, & \text{if } z>u, \ \tilde{F}(z), & \text{if } z \leq u, \end{cases} \end{equation} where $\hat\sigma$ and $\hat\xi$ are the estimated scale and shape parameters of the GPD. Exponential margins are obtained by applying $-\log(1-\hat F(\cdot))$ to each margin, where $\hat F(\cdot)$ is estimated separately for each margin.

This is done with the function margtransf which takes as inputs a matrix containing the original data, a vector of the marginal quantiles used to fit the GPD and a boolean value constrainedshape which decides whether $\xi> -1$ if set to TRUE (Default), or $\xi \in \mathbb{R}$ if set to FALSE.

Function margtransf returns an object of S4 class margtrasnf.class with 6 attributes:

# qmarg and constrainedshape set to the default values
expdata <- margtransf(data = airdata, qmarg = rep(0.95, 2), constrainedshape = T)

# attributes of the S4 object
str(expdata)

# head of the data on standard exponential margins
head(expdata@dataexp)

It is possible to plot an S4 object of margtrasnf.class with plot. By setting argument which = "hist", histograms of each variable on original and standard exponential margins can be seen:

plot(expdata, which = "hist")

To visualise the time series of each variable on original and standard exponential margins, we need to set which = "ts":

plot(expdata, which = "ts")

The joint distribution on original and standard exponential margins can be accessed with which = "joint":

plot(expdata, which = "joint")

Finally, it is possible to plot all these together by setting which = "all", which is the default for this argument.

plot(expdata, which = "all") # or just plot(expdata)

Assessing the marginal tail fits

When transforming the data onto standard exponential variables as in equation \eqref{eq:pit}, it is assumed that above a threshold $u$ data follows a GPD. It is possible to assess if this is a reasonable assumption through checking if there is an agreement between model and empirical GPD quantiles. This is done via QQ plots in the ReturnCurves package by plotting points $\left(F_{GPD}^{-1}\left(\frac{i}{n_{exc} + 1}\right) + u, X^{GPD}{(i)} + u\right),$ where $X^{GPD}{(i)}$ denotes the $i$-th ordered increasing statistic $(i = 1, \ldots, n)$ of the exceedances, i.e., $X^{GPD}=(X-u \mid X >u),$ $n_{exc}$ denotes the sample size of these exceedances, and $F_{GPD}^{-1}$ denotes the inverse of the cumulative distribution function of a GPD. Finally, the uncertainty on the empirical quantiles is quantified using a bootstrap approach. If temporal dependence is present in the data, then a block bootstrap approach is required, i.e., blocksize > 1.

This is done using the function marggpd function which takes as inputs an S4 object of class margtransf.class, the size of blocks of the bootstrap procedure and the corresponding number of samples, and the significance level $\alpha$ for the tolerance intervals. It then returns an S4 object of class marggpd.class with an extra attribute marggpd containing a list with:

# nboot and alpha are set to the default values
# blocksize is set to 10 to account for temporal dependence
uncgpd <- marggpd(margdata = expdata, blocksize = 10, nboot = 250, alpha = 0.05)

# attributes of the S4 object
str(uncgpd)

# head of the list elements of slot marggpd for variable X
head(uncgpd@marggpd$model[[1]])
head(uncgpd@marggpd$empirical[[1]])
head(uncgpd@marggpd$lower[[1]])
head(uncgpd@marggpd$upper[[1]])

It is possible to plot an S4 object of marggpd.class with plot, where the QQ plots with the model and empirical quantiles for each variable are shown. The points should lie close to the line $y=x;$ for a good fit and agreement between these quantiles, the line $y=x$ should mainly lie within the $(1-\alpha)\%$ tolerance intervals.

plot(uncgpd)

Estimation of the angular dependence function

In bivariate extremes, interest may lie in studying regions where both variables are extreme or where only one is extreme. For this, methods that aim at characterising the joint tail behaviour in both scenarios, such as the one introduced by @WadsworthTawn2013, are required. Given standard exponentially distributed variables $X_E$ and $Y_E$ and a slowly varying function $\mathcal{L}(\cdot; \omega)$ at infinity, the joint tail behaviour of $(X_E,Y_E)$ is captured through $\lambda(\omega)$ via the assumption \begin{equation} \text{Pr}(X_E > \omega u,\, Y_E > (1-\omega) u) = \mathcal{L}(e^u; \omega)e^{-\lambda(\omega)u} \quad \text{as } u \to \infty, \end{equation} which can be rewritten as \begin{equation}\label{eq:wt} \text{Pr}\left(\min\left{\frac{X_E}{\omega}, \, \frac{Y_E}{1-\omega}\right}\right) = \mathcal{L}(e^u; \omega)e^{-\lambda(\omega)u} \quad \text{as } u \to \infty, \end{equation} where $\omega\in[0,1]$ and $\lambda(\omega)\geq \max{\omega, 1-\omega}$ is called the angular dependence function (ADF). In the case of asymptotic dependence (see for instance, @Colesetal1999), $\lambda(\omega)=\max{\omega, 1-\omega},$ for all $\omega\in[0,1].$

Lastly, defining a min-projection variable at $\omega,$ $T_\omega = \min\left{\frac{X_E}{\omega}, \, \frac{Y_E}{1-\omega}\right},$ equation \eqref{eq:wt} implies that \begin{equation}\label{eq:minproj} \text{Pr}(T_\omega>u+t\mid T_\omega>u) = \frac{\mathcal{L}(e^{u+t}; \omega)}{\mathcal{L}(e^u; \omega)}e^{-\lambda(\omega)t} \to e^{-\lambda(\omega)t} \quad \text{as } u \to \infty, \end{equation} for any $\omega\in[0,1]$ and $t>0.$ In other words, for all $\omega\in[0,1]$ and, as $u_\omega\to \infty,$ $T_\omega^1 := (T_\omega-u_\omega\mid T_\omega>u_\omega)\sim \text{Exp}(\lambda(\omega)).$ Estimation of the ADF can be done in different ways; @MurphyBarltropetal2024 present a few.

For the ReturnCurves package, two approaches are implemented: a pointwise estimator using the Hill estimator [@Hill1975], $\hat{\lambda}H,$ and a smoother estimator based on Bernstein-Bézier polynomials estimated via composite likelihood methods, $\hat{\lambda}{CL}.$ For the latter, @MurphyBarltropetal2024 propose using a family of Bernstein-Bézier polynomials to improve the estimation of the ADF. Given $k\in \mathbb{N},$ it is assumed that $\lambda(\omega)=\lambda(\omega;\boldsymbol{\beta})$ can be represented by the following family of functions: \begin{align}\label{eq:bbp} \mathcal{B}k^*=\left{(1-\omega)^k+\sum{i = 1}^{k-1}\beta_i {k \choose i} \omega^i(1-\omega)^{k-i}+\omega^k=:f(\omega)\mid \omega\in[0,1],\right. \nonumber \ \left.\phantom{\sum_{i = 1}^{k-1}{k \choose i}}\boldsymbol{\beta}\in[0, \infty)^{k-1} \text{ such that } f(\omega)\geq\max{\omega, 1-\omega}\right}. \end{align}

As $T_\omega^1$ is exponentially distributed when $u_\omega\to\infty,$ the parameter vector $\boldsymbol{\beta}$ can be estimated using a composite likelihood function defined as \begin{equation} \label{eq:clf} \mathcal{L}C(\boldsymbol{\beta}) = \left[\prod{\omega \in \Omega}\lambda(\omega;\boldsymbol{\beta})^{\mid \boldsymbol{t}\omega^1\mid}\right]\exp\left{-\sum{\omega \in \Omega}\sum_{t_\omega^1\in \boldsymbol{t}\omega^1}\lambda(\omega;\boldsymbol{\beta})t\omega\right}, \end{equation} where $\mid \boldsymbol{t}\omega^1\mid$ represents the cardinality of set $\boldsymbol{t}\omega^1:={t_\omega-u_\omega\mid t_\omega\in \boldsymbol{t}\omega, \, t\omega>u_\omega}$ for some large values $u_\omega,$ and $\Omega$ is a finite subset spanning the interval $[0,1].$ The estimator of the ADF through composite likelihood methods is given by $\lambda(\cdot;\boldsymbol{\hat\beta}{CL})$ where $\boldsymbol{\hat\beta}{CL}$ maximises equation \eqref{eq:clf}.

Finally, @MurphyBarltropetal2024 showed that incorporating knowledge of the conditional extremes [@HeffernanTawn2004] parameters $\alpha_{y\mid x}$ and $\alpha_{x\mid y}$ improves the estimation of the ADF. In particular, the authors show that, in order to satisfy theoretical properties of $\lambda(\omega),$ $\lambda(\omega)=\max{\omega, 1-\omega}$ for all $\omega\in[0, \alpha_{x\mid y}^1]\cup[\alpha_{y\mid x}^1, 1]$ with $\alpha_{x\mid y}^1=\alpha_{x\mid y}/(1 + \alpha_{x\mid y})$ and $\alpha_{y\mid x}^1=1/(1 + \alpha_{y\mid x}).$ Thus, after estimating the conditional extremes parameters $\alpha_{y\mid x}$ and $\alpha_{x\mid y}$ through maximum likelihood estimation, we can set $\lambda(\omega)=\max{\omega, 1-\omega}$ for $\omega \in [0, \hat\alpha_{x\mid y}^1)\cup(\hat\alpha_{y\mid x}^1, 1].$ Then, for the Hill estimator, $\lambda(\omega)=\hat\lambda_H$ for $\omega \in \left[\hat\alpha_{x\mid y}^1, \hat\alpha_{y\mid x}^1\right].$ For the composite likelihood estimator, a rescaling of equation \eqref{eq:bbp} is needed to ensure continuity at $\hat\alpha_{x\mid y}^1$ and $\hat\alpha_{y\mid x}^1,$ as defined below: \begin{align} \mathcal{B}k^1=\left{(1-\hat\alpha{x\mid y}^1)\left(1-\frac{v-\hat\alpha_{x\mid y}^1}{\hat\alpha_{y\mid x}^1-\hat\alpha_{x\mid y}^1}\right)^k+\sum_{i = 1}^{k-1}\beta_i {k \choose i} \left(\frac{v-\hat\alpha_{x\mid y}^1}{\hat\alpha_{y\mid x}^1-\hat\alpha_{x\mid y}^1}\right)^i\left(1-\frac{v-\hat\alpha_{x\mid y}^1}{\hat\alpha_{y\mid x}^1-\hat\alpha_{x\mid y}^1}\right)^{k-i}+\right. \ \left.\hat\alpha_{y\mid x}^1\left(\frac{v-\hat\alpha_{x\mid y}^1}{\hat\alpha_{y\mid x}^1-\hat\alpha_{x\mid y}^1}\right)^k=:f(v)\mid v\in\left[\hat\alpha_{x\mid y}^1, \hat\alpha_{y\mid x}^1\right],\boldsymbol{\beta}\in[0, \infty)^{k-1} \text{ such that } f(v)\geq\max{v, 1-v}\right}. \end{align} $\lambda(\omega)=\lambda(\omega;\boldsymbol{\beta})$ is assumed to be represented by an element of $\mathcal{B}k^1$ on $\left[\hat\alpha{x\mid y}^1, \hat\alpha_{y\mid x}^1\right].$ Finally, the estimators used for estimation are processed in order to satisfy theoretical conditions on $\lambda$ as identified in @MurphyBarltropetal2024.

Estimation of the ADF can be done using the function adf_est which takes as inputs:

Additional arguments can be defined outside of the default values; these include marginal quantiles for the min-projection variable $T^1$ at ray $\omega,$ marginal quantiles to fit the conditional extremes method if constrained=TRUE, and, if method= "cl", the polynomial degree $k,$ the initial values for $\boldsymbol{\beta}$ for the composite maximum likelihood procedure, and the convergence tolerance. Convergence is declared when the difference of log-likelihood values between iterations does not exceed the value of tol. This repeated optimisation helps to avoid convergence to local maxima, although does not guarantee finding the global maximum.

Function adf_est returns an object of S4 class adf_est.class with 11 attributes, where the first 9 are the inputs of the function and the last 2 are vectors:

# Estimation using Hill estimator without conditional extremes parameters
whill <- seq(0, 1, by = 0.001)
## q and constrained are set to the default values here
lambdah <- adf_est(margdata = expdata, w = whill, method = "hill", 
                   q = 0.95, constrained = F)

# Estimation using Hill estimator with conditional extremes parameters
## q and qalphas are set to the default values
lambdah2 <- adf_est(margdata = expdata, w = whill, method = "hill", q = 0.95,
                    qalphas = rep(0.95, 2), constrained = T)

# Estimation using CL method without conditional extremes parameters
## w, q and constrained are set to the default values here
lambdacl <- adf_est(margdata = expdata, w = seq(0, 1, by = 0.01), method = "cl",
                    q = 0.95, constrained = F)

# Estimation using CL method with conditional extremes parameters
## w, q and qalphas are set to the default values
lambdacl2 <- adf_est(margdata = expdata, w = seq(0, 1, by = 0.01), method = "cl",
                     q = 0.95, qalphas = rep(0.95, 2), constrained = T)

# attributes of the S4 object
str(lambdah)

# head of the vector with adf estimates for the first estimator
head(lambdah@adf)

It is possible to plot an S4 object of adf_est.class with plot, where a comparison of the estimated ADF and its lower bound, $\max{\omega, 1-\omega},$ is shown.

# plot of the ADF estimation based on the unconstrained Hill estimator
plot(lambdah)

Goodness-of-fit of the angular dependence function

After estimation of the ADF, it is important to assess its goodness-of-fit. Noting that $T_\omega^1=(T_\omega-u_\omega\mid T_\omega>u_\omega)\sim \text{Exp}(\lambda(\omega)) \Leftrightarrow \lambda(\omega)T_\omega^1\sim \text{Exp}(1),$ we can investigate whether there is agreement between model and empirical exponential quantiles, or not. This is done in the ReturnCurves package through QQ plots by plotting points $\left(F_E^{-1}(i/(n+1)),\, T^1_{(i)}\right)$, where $F_E^{-1}$ denotes the inverse of the cumulative distribution function of a standard exponential distribution and $T_{(i)}^{-1}$ is the $i$-th ordered increasing statistic, $i=1, \ldots, n$. The uncertainty of the empirical quantiles is quantified using a bootstrap approach. If temporal dependence is present in the data, a block bootstrap approach should be used, i.e. blocksize $>1.$

The assessment of the goodness-of-fit of $\lambda(\omega)$ can be done using the function adf_gof which takes an S4 object of class adf_est.class, a ray $\omega$ to be considered, the size of the blocks for the bootstrap procedure and the corresponding number of samples, and the significance level $\alpha$ for the tolerance intervals as inputs. In turn, it returns an S4 object of class adf_gof.class with an extra attribute gof containing a list with the model and empirical quantiles, and the lower and upper bounds of the tolerance interval.

We note that this function is implemented to evaluate the fit at a single ray $\omega;$ therefore, we recommend repeating the procedure for a few rays to have a better representation. In addition, if the ray provided by the user was not used for the estimation of the ADF, then the closest $\omega$ in the grid is used instead.

# Goodness of fit of the adf for twp rays w
rays <- c(0.25, 0.75)
## nboot and alpha are set to the default values
## blocksize is set to 10 to account for temporal dependence
gofh <- sapply(rays, adf_gof, adf = lambdah, blocksize = 10, nboot = 250, alpha = 0.05)

# attributes of the S4 object
str(gofh[[1]])

# head of the list elements of slot gof
head(gofh[[1]]@gof$model)
head(gofh[[1]]@gof$empirical)
head(gofh[[1]]@gof$lower)
head(gofh[[1]]@gof$upper)

As before, it is possible to plot an S4 object of adf_gof.class with plot, where the QQ plot with the model and empirical quantiles is shown. The points should lie close to the line $y=x;$ for a good fit and agreement between these quantiles, the line $y=x$ should mainly lie within the $(1-\alpha)\%$ tolerance intervals.

library(gridExtra)
grid.arrange(plot(gofh[[1]]), plot(gofh[[2]]), ncol = 2)

Estimation of the return curve

Given a probability $p$ and the joint survivor function $\text{Pr}(X>x, Y>y)$ of the bivariate vector $(X,Y),$ the $p$-probability return curve is defined as \begin{equation}\label{eq:rc} \text{RC}(p):={(x, y) \in \mathbb{R}^2 : \text{Pr}(X>x, Y>y) = p}. \end{equation} The interest lies in values of $p$ close to $0$ as these are the ones characterising rare joint exceedances events. Given any point $(x,y) \in \text{RC}(p),$ the event ${X >x, Y>y}$ is expected to happen once each return period $1/p,$ on average. This is equivalent to having an expected value of $np$ points in the region $(x, \infty)\times (y, \infty)$ in a sample size of $n$ from $(X,Y).$

Since the probability $p$ is close to $0,$ methods that can accurately capture the behaviour of the joint tail are necessary in order to realistically extrapolate and estimate $\text{RC}(p)$ for values of $p$ outside of the observation period. @MurphyBarltropetal2023 consider a couple of methods to achieve this, one of which uses the ADF $\lambda(\omega)$ given in equation \eqref{eq:wt} to characterise the joint tail behaviour.

Estimation of $\text{RC}(p)$ is done with standard exponentially distributed variables; therefore, the first step is to transform the original data onto standard exponential margins using equation \eqref{eq:pit}, and then, after estimation of $\text{RC}(p),$ back transform them onto the original margins. Estimates of $\text{RC}(p)$ are obtained through estimates of $t$ and $u$ from equation \eqref{eq:minproj}, and rays $\omega.$ In particular, the value of $t>0$ can be obtained by first estimating $u$ as the $(1-p^)$-th quantile of $T_\omega,$ where $p^>p,$ is a small probability, and then ensuring that $\text{Pr}(T_\omega > t + u)=p.$ Since $u$ is estimated as the $(1-p^)$-th quantile of $T_\omega,$ we have that $\text{Pr}(T_\omega > u) = p^;$ thus, \begin{equation} p = \text{Pr}(T_\omega > t + u) = \text{Pr}(T_\omega > u) \text{Pr}(T_\omega > t + u \mid T_\omega > u) = p^e^{-\hat{\lambda}(\omega)t}, \end{equation} which leads to $t=-\log(p/p^)/\hat{\lambda}(\omega).$ Finally, the estimates of the return curve $\hat{\text{RC}}(p)$ can be obtained by setting $(x, y):=\left(\omega(t+u), (1-\omega)(t+u)\right).$

In the ReturnCurves package, estimation of the return curve is done through function rc_est which shares the same inputs as function adf_est with an additional argument p representing the curve survival probability. This probability value should be smaller than $1-q,$ where $q$ is the quantile for the min-projection variables $T^1_\omega,$ and, when applicable, smaller than $1-q_\alpha,$ where $q_\alpha$ are the quantiles used in the conditional extremes method.

Function rc_est returns an S4 object of class rc_est.class with 14 attributes, with a list and a matrix in the last 2 slots:

n <- dim(airdata)[1] 
prob <- 10/n
# Estimation using Hill estimator without conditional extremes parameters
whill <- seq(0, 1, by = 0.001)
## q and constrained are set to the default values here
rch <- rc_est(margdata = expdata, w = whill, p = prob, method = "hill", 
              q = 0.95, constrained = F)

# Estimation using Hill estimator with conditional extremes parameters
## q and qalphas are set to the default values
rch2 <- rc_est(margdata = expdata, w = whill, p = prob, method = "hill", q = 0.95,
               qalphas = rep(0.95, 2), constrained = T)

# Estimation using CL method without conditional extremes parameters
## w, q and constrained are set to the default values here
rccl <- rc_est(margdata = expdata, w = seq(0, 1, by = 0.01), p = prob, method = "cl", 
               q = 0.95, constrained = F)

# Estimation using CL method with conditional extremes parameters
## w, q and qalphas are set to the default values
rccl2 <- rc_est(margdata = expdata, w = seq(0, 1, by = 0.01), p = prob, method = "cl", 
                q = 0.95, qalphas = rep(0.95, 2), constrained = T)

# attributes of the S4 object
str(rch)

# head of the vector with adf estimates for the first estimator
head(rch@rc)

It is possible to plot an S4 object of rc_est.class with plot, where the original data is plotted with the estimated line for the return curve $\hat{\text{RC}}(p).$

# plot of the ADF estimation based on the unconstrained Hill estimator
plot(rch)

Uncertainty of the return curve estimates

@MurphyBarltropetal2023 propose a procedure to assess the uncertainty of the return curve estimates. For large positive $m\in\mathbb{N},$ let \begin{equation}\label{eq:angles} \boldsymbol{\Theta}:=\left{\frac{\pi(m+1-j)}{2(m+1)} \mid 1 \leq j\leq m\right}, \end{equation} define a set of angles. For each $\theta \in \boldsymbol{\Theta},$ the line $L_\theta :={(x,y) \in \mathbb{R}^2_+ \mid \tan(\theta)>0}$ intersects the estimated $\hat{\text{RC}}(p)$ exactly once, i.e. ${(\hat{x}\theta, \hat{y}\theta)}:= \hat{\text{RC}}(p) \cap L_\theta$ where $(\hat{x}\theta, \hat{y}\theta) \in \hat{\text{RC}}(p).$ Moreover, let $\hat{d}\theta := \left(\hat{x}\theta^2 + \hat{y}\theta^2\right)^{1/2}$ denote the $L_2$-norm of the point estimate. Uncertainty in the return curve estimates is quantified using the distribution of $\hat{d}\theta$ at each angle $\theta \in \boldsymbol{\Theta}$ as follows: for $k = 1, \ldots,$ nboot:

```{=tex} \begin{enumerate} \item Bootstrap the original data set; when temporal dependence is present, a block bootstrap should be used. \item For each $\theta \in \boldsymbol{\Theta},$ obtain $\hat{d}_{\theta,k}$ for the corresponding return curve estimate. \end{enumerate}

Finally, given $\theta \in \boldsymbol{\Theta},$ empirical estimates of
the mean, median and $(1-\alpha)\%$ confidence intervals for
$\hat{d}_\theta$ can be obtained using the sample of
$\hat{d}_{\theta,k}.$ These are available through function `rc_unc`,
which takes as inputs:

-   `retcurve`: an S4 object of class `rc_est.class` containing the
    return curve estimates,
-   `blocksize`: size of blocks for the block bootstrap procedure; if no
    temporal dependence is present, then set `blocksize = 1` (default),
-   `nboot`: number of bootstrap samples to be taken,
-   `nangles`: number of angles $m$,
-   `alpha`: significance level to compute the $(1-\alpha)\%$ confidence
    intervals.

Function `rc_unc` returns an S4 object of class `rc_unc.class` with 6
attributes, where the last slot `unc` contains a list with:

-   `median`: a vector containing the empirical estimates of the median
    return curve
-   `mean`: a vector containing the empirical estimates of the mean
    return curve
-   `lower`: a vector containing the lower bound of the confidence
    interval
-   `upper`: a vector containing the upper bound of the confidence
    interval

For simplicity, just the uncertainty of the return curve obtained using
the unconstrained Hill estimator is computed here.

```r
# nangles and alpha set to default
# nboot set to 50 for simplicity
# blocksize is set to 10 to account for temporal dependence
rch_unc <- rc_unc(rch, blocksize = 10, nboot = 50, nangles = 150, alpha = 0.05)

# attributes of the S4 object
str(rch_unc)

# head of the list elements of slot unc
head(rch_unc@unc$median)
head(rch_unc@unc$mean)
head(rch_unc@unc$lower)
head(rch_unc@unc$upper)

It is possible to plot an instance of the S4 class rc_unc.class with function plot; this takes the S4 object and an extra argument which as inputs. If which = "rc" (default), then the estimated return curve is plotted, setting which = "median" shows the empirical median estimates of the return curve, while setting which = "mean" shows the empirical mean estimates of the return curve. All plots show the uncertainty associated with the estimated return curve in dashed lines. Finally, by setting which = "all", plots the estimated return curve, the empirical median and mean estimates and the associated uncertainty.

library(gridExtra)
grid.arrange(plot(rch_unc, which = "rc"), plot(rch_unc, which = "median"), 
             plot(rch_unc, which = "mean"), plot(rch_unc, which = "all"), nrow = 2)

Goodness-of-fit of the return curve estimates

It is important to assess the goodness-of-fit of the return curve estimates, given that the true return curve is unknown in reality. This is implemented in the ReturnCurves package based on the approach proposed by @MurphyBarltropetal2023.

Given the return curve $\text{RC}(p),$ the probability of lying in a survival region $(x, \infty)\times(y,\infty)$ is $p.$ Given the same set of angles $\boldsymbol{\Theta}$ as in equation \eqref{eq:angles}, for each $\theta_j\in\boldsymbol{\Theta},$ the empirical probability $\hat p_j$ of lying in $(\hat{x}{\theta_j}, \infty)\times (\hat{y}{\theta_j}, \infty),$ where $(\hat{x}{\theta_j}, \hat{y}{\theta_j})$ is the corresponding point in $\hat{\text{RC}}(p),$ is given by the proportion of points in that region. The goodness-of-fit of the estimated return curve is then assessed via a bootstrap procedure; for each angle $\theta_j\in\boldsymbol{\Theta},$ the original data set is bootstrapped and empirical probability estimates $\hat p_j$ are obtained. When temporal dependence is present in the data, a block bootstrap approach should be taken and the size of the blocks must be defined. We note that for each $j,$ nboot empirical probabilities are estimated and, so the median and the $(1-\alpha)\%$ pointwise confidence intervals for the probabilities can be obtained by taking the $50\%,$ $(\alpha/2)\%$ and $(1-\alpha/2)\%$ quantiles of the set of empirical probabilities for each $j,$ respectively.

The goodness-of-fit for an estimated return curve is implemented through function rc_gof. This shares the same input arguments as the rc_unc function and returns an S4 object with 5 attributes with the last slot gof containing a list with:

For simplicity, just the goodness-of-fit of the return curve obtained using the unconstrained Hill estimator is computed here.

# nboot, nangles and alpha set to default
# blocksize is set to 10 to account for temporal dependence
rch_gof <- rc_gof(rch, blocksize = 10, nboot = 250, nangles = 150, alpha = 0.05)

# attributes of the S4 object
str(rch_gof)

# head of the list elements of slot gof
head(rch_gof@gof$median)
head(rch_gof@gof$lower)
head(rch_gof@gof$upper)

It is possible to plot an instance of the S4 class rc_gof.class with function plot, where a comparison between the true probability $p$ (in red) and the empirical median estimates (in black) is shown. Ideally, $p$ should be contained in the confidence region, shaded in grey. Finally, in practice, the value of $p$ should be within the range of the data and not too extreme, given the nature of empirical probabilities.

plot(rch_gof)

References



Try the ReturnCurves package in your browser

Any scripts or data that you put into this service are public.

ReturnCurves documentation built on April 4, 2025, 5:36 a.m.