GPD_Threshold_Solari: Solari et al (2017) automatic GPD threshold selection
In rjaneUCF/MultiHazard: Tools for modeling compound events

GPD_Threshold_Solari

R Documentation

Solari et al (2017) automatic GPD threshold selection

Description

Automatic threshold selection method in Solari et al. (2017) is implemented to find the threshold above which excesses are follow a GPD. The code is based on the ANALISIS_POT_LNORM function provided by Sebastian Solari.

Usage

GPD_Threshold_Solari(
  Event,
  Data,
  RPs = c(10, 50, 100, 500, 1000),
  RPs_PLOT = c(2, 3, 4),
  Min_Quantile = 0.95,
  Alpha = 0.1,
  mu = 365.25,
  N_Sim = 10
)

Arguments

`Event`	Numeric vector containing the declustered events.
`Data`	Original time series. Dataframe containing two columns. In column: `1` A `"Date"` object of equally spaced discrete time steps. `2` Numeric vector containing corresponding time series values.
`RPs`	Numeric vector specifying the return levels calculated from the GPD fits over the thresholds. Default is `c(50,100,500,100)` plus the return period associated with the minimum candidate threshold.
`RPs_PLOT`	Numeric vector of length three specifying which elements of `RPs` are plotted in the middle row of the graphical output. Default is `c(1,2,3)`.
`Min_Quantile`	Numeric vector of length one specifying the minimum threshold, expressed as a quantile of the original time series (2nd column of `Data`) to be tested. Default `0.95`.
`Alpha`	Numeric vector of length one specifying the level of confidence associated with the confidence interval i.e., the probability that the interval contains the true value of the parameter is `1-\frac{Alpha}{2}`. The interval is referred to as the `100(1-\frac{Alpha}{2})\%` confidence interval. Default is `0.1`.
`mu`	(average) occurrence frequency of events in the original time series `Data`. Numeric vector of length one. Default is `365.25`, daily data.
`N_Sim`	Numeric vector of length one specifying the number of bootstrap samples. Default is `10`.

Value

List comprising

Thres_Candidate Thresholds tested which are the cluster maxima in Events exceeding the Min_Quantile quantile of the original time series (given in column 2 of Data).
GPD_MLE GPD parameter estimates, Mean Residual Life Plot (MRLP) values and return level estimates associated with each Thres_Candidate.
CI_Upper Upper limits of the confidence interval for the point estimates of the corresponding element of GPD_MLE.
CI_Lower Lower limits of the confidence interval for the point estimates of the corresponding element of GPD_MLE.
AR2 Value of the right-tail weighted Anderson Darling statistic A_R^2, the test statistic used in the Solari et al. (2017) method for each Thres_Candidate.
AR2_pValue p-value asssociated with A_R^2.

To interpret the graphical output. Top row: The GPD exhibits certain threshold stability properties. The guiding principle for threshold choice is to find the lowest value of the threshold such that the parameter estimates stabilize to a constant value which is sustained at all higher thresholds, once the sample uncertainty has been accounted for (typically assessed by pointwise uncertainty intervals). Mean residual life plot (left). If the GPD is a valid model for excesses above a threshold then the mean of these excesses will be a linear function of the threshold. We therefore select the lowest threshold where there is a linear trend in the mean residual life plot. Parameter stability plots for the shape (center) and scale (right) parameters. If the GPD is a suitable model for a threshold then for all higher thresholds it will also be suitable, with the shape and scale parameters being constant. The lowest threshold - to reduce the associated uncertainty - at which the parameter estimates are stable for all higher thresholds should be selected. Middle row: Return levels estimated from the GPD fitted at various thresholds. Lower row: Right-tail weighted Anderson Darling statistic A_R^2 associated with the GPD fitted using various thresholds. Lower A_R^2 statistic values signify less (quadratic) distance between the empirical distribution and the GPD i.e., GPD is a better fit for these thresholds (left). 1-p_{value} associated with the A_R^2 for each threshold. The A_R^2 goodness of fit tests, tests the null hypothesis that the observations are from a GPD. At smaller 1-p_{value} figure there is less chance of rejecting the null hypothesis i.e., the GPD is more suitable at these thresholds (center). Events per year at each threshold (right).

Details

EDF-statistics are goodness-of-fit statistics based on a comparison of the Empirical Distribution Function (EDF) F_n and a candidate parametric probability distribution F Stephens et al. (1974). Quadratic EDF test measure the distance between F and F_n by:

n\int^{\infty}_{-\infty}=(F(x)-F_n(x))^2w(x) dx

where n is the number of elements in the original sample and w(x) is a weighting function. In the Cramer Von Misses statistic w(x)=1, whereas the Anderson-Darling statistic A^2, assigns more weight to the tails of the data by setting w(x)=\frac{1}{F(x)(1-F(x))}. Under the null hypothesis that the sample x_1,\dots,x_n is from a GPD, the transformation z=F_1(x) a sample z uniformly distribution between 0 and 1.

A^2=-\frac{1}{n}\sum_{i=1}^{n} \{(2i-1)[log(z_i)+log(1-z_{n+z-i})]\}-n

Sinclair et al. (1990) proposed the right-tail weighted Anderson Darling statistic A_R^2 which allocates more weight to the upper tail and less to the lower tail of the distribution than A^2 and is given by:

A_R^2=\frac{n}{2}\sum_{i=1}^{n} \left [2-\frac{(2i-1)}{n}log(1-z_i)+2z_{i} \right]

Solari et al. (2017) formalized EDF statistic - GOF test threshold selection procedures used to test the null hypothesis that a sample is from a GPD distribution. creating an automated approach adopting the A_R^2 as the EDF statistic. The authors also proposed combining the approach with a bootstrapping technique to assess the influence of threshold on the uncertainly of higher return period quantiles. The approach in Solari et al. (2017) comprises the following steps:

Decluster the time series to produce a series of n_p independent cluster maxima \{x_i:i=1,\dots,n_p\} and sort such that \{x_1\leq\dots\leq x_p\}.
The sorted series defines a series of n_u thresholds after excluding repeated values i.e., n_u \leq n_p. For each threshold \{u_j, j=1,\dots,n_u\} fit the GPD via L-Moments using only the excesses satisfying x>u_j. Then, calculate the R-AD statistic and its associated p-value for each threshold.
Select the threshold that minimizes one minus the p-value i.e.,

u_0 = argmin_{u_j} (1-p(u_j)).

Examples

#Declustering the rainfall at site S22 using a 7-day window.
Rainfall_Declust_SW<-Decluster_SW(Data=S22.Detrend.df[,c(1:2)],Window_Width=7)
#Finding an appropriate threshold for the declustered series
GPD_Threshold_Solari(Event=Rainfall_Declust_SW$Declustered,
                     Data=22.Detrend.df[,2])

rjaneUCF/MultiHazard documentation built on July 4, 2025, 9:18 p.m.