In martindurocher/floodStat: FloodNet Regional Frequency Analysis

Introduction

This document will present how floodnetRfa can be used to perform at-site flood frequency analysis using the threshold modeling. This example is illustrated using the daily discharge data of the Saint-John River at Fort Kent (NB). First a verification should be done to make sure that only complete year of data are included. In this case, one can see that after 1927 the years are complete.

library(floodnetRfa)
data(flowStJohn)

## Extract the year. 
y <- as.numeric(format(flowStJohn$date,'%Y'))

## Verify which year are complete. Print the five years with the less observations.
sort(sapply(split(y,y), length))[1:5]

## Keep complet year
xd <- cbind(flowStJohn[y>=1927,], year = y[y>=1927])

Estimating model parameters

By default the Function FitPot estimates the parameter of a Generalized Pareto distribution (GPA) assuming that the data are independent exceedances with lower bound $u = 0$. $$ F(x)= \exp\left{- \left( 1-\kappa \frac{x}{\alpha} \right)^{1/\kappa} \right} $$ The recommended method of estimation is the maximum likelihood, but the L-moment estimate can be obtained using the argument method = lmom. In each case, the covariance matrix of the estimate is evaluated. For the L-moments, the covariance matrix is estimated by bootstraps, but it can be turned off using the argument varcov = FALSE to speed up computation.
Confidence intervals of the model parameters can be derived by the standard error using normal approximation. Symmetrical confidence intervals can be unrealistic in some occasions in particular for the scale parameter $\alpha$. In these cases, the profile likelihood method can be used and is provided by the function coef when using the argument ci = TRUE.

## Fitting POT using MLE
fit <- FitPot(rgpa(2000, 1, 0))
print(fit)

## Show confidence interval for estimated parameters
coef(fit, ci = TRUE)

## Fitting POT using L-moments
coef(FitPot(rgpa(2000, 1, 0), method = 'lmom', varcov = FALSE))

Declustering

In practice, flow data evolve smoothly and declustering techniques are necessary to extract important flood peaks that are independents and above a desired threshold $u$. Without declustering, exceedances will form groups of adjacent events (clusters) that represent a single flood events. Peaks can be extracted using the function which.cluster or which.floodPeaks. The first method is a running declustering method that extracts the maximum of each cluster separated by at least r observations below the threshold. Such method may be more natural for studying rainfall events that does present storage effect that can create various form of long term dependence. The second method follows the recommendation of the Water Resource Council of the United States as described in Lang et al. (1999). In brief, two adjacent peaks must respect the following two conditions: i) they must be separated by $4+log(A)$ days, where $A$ is the drainage area in square kilometers; and ii) the minimal intermediate flow must be less than 75% of the lowest of the two peaks. The example below illustrate the extraction of the peaks by the two methods and shows that the choice of a declustering method has a significant impact.

## Define a threshold and the minimum separating time
thresh <- 500
r0 <- round(4 + log(14700)) # 14 days

## Extract peaks based on run declustering
peaks1 <- which.clusters(flow~date, xd, u = thresh, r = r0)

## Extract peaks based on WRC recommendations
peaks2 <- which.floodPeaks(flow~date, xd, u = thresh, r = r0, rlow = 0.75)

## Plot the peaks extracted for the year 1927
plot(flow~date, xd[xd$year == 1927,],type = 'l', col = 'grey')
abline(h = thresh, col = 2 ,lwd = 2)
points(flow~date, xd[peaks1,], pch = 16, col = 'blue', cex=2)
points(flow~date, xd[peaks2,], pch = 17, col = 'red')
legend('topleft', col = c('blue','red'), pch = 16:17, 
       legend = c('Cluster','WRC'))

The declustering techniques are integrated to the function FitPot and are performed when the argument declust is passed. The example below performs the same declustering that resulted in peaks2 above before fitting the exceedances by a GPA distribution.

## Fit a POT model after declustering
fit <- FitPot(flow~date, xd, u = 1000, declust = 'wrc', r = r0)
print(fit)

Predicting flood quantiles

In the summary of the model output, the shape parameter is positive, which indicates a light tail where the distribution is bounded by $u + \alpha/\kappa$. The flood quantiles $x_T$ of a $T$ year return period event is defined as the quantile of probability $[1-1/(\lambda T)]$ of the distribution of the exceedances, where $\lambda$ is the average number of exceedances per year. A natural estimator of $\lambda$ is the fraction of flood events on the number of years of observations. Hence, for a GPA distribution the flood quantiles are $$ x_T = u + \frac{\alpha}{\kappa}\left[ 1 - \left(\lambda T\right)^{-\kappa}\right]. $$ In R, the flood quantiles are evaluated using the function predict and the standard deviation as well as the confidence interval can be included in the output. In the example below, the profile likelihood method is used to evaluate the confidence interval. Other available methods for evaluating the confidence interval are the delta method (delta) and parametric bootstraps (boot).

## Predict flood quantile of return period 10 and 100 years
predict(fit, q = c(.9, .99), se = TRUE, ci = 'profile')

Selecting the threshold

An import aspect of using POT is the selection of a proper threshold. This decision is generally based on the notion of stability. This principle entails that if $u$ is a well-chosen threshold then the exceedances follow a GPA distribution and the exceedances of higher thresholds $u^\ast>u$ also follow a GPA with the same shape parameter. The objective is therefore to find the lowest threshold that ensures such stability. This way, it prevents the outcome to be uniquely determined by the selected threshold. One visual assessment of threshold stability is provided by the mean residual life plot (MRL). In this graph, the mean of the exceedances is plotted for various candidate thresholds. When stability is reached, the MRL plot should exhibit a linear trend. In the figure below, the MRL plot indicates that a threshold $u = 1000$ used in the previous examples is a reasonable choice.

## List of candidate thresholds
ulst <- seq(500,2000, 10)

## Mean residual life plot
PlotMrl(flow~date, xd, u = ulst, declust = 'wrc', r = r0)

However, due among other to sampling error, MRL plots are not always as clear and it requires that a person manually assess the graph. Another validation approach for selecting a threshold is to use a goodness of fit test (GOF). The example below uses the Anderson-Darling test and extracts the p-value of every candidate threshold. The figure shows that before a threshold of approximately 800, the p-values of the GOF test are low and does not support the hypothesis of a GPA. After this changing point, the p-values increased until reaching 0.5 for a threshold of about 1000. Here, the value 0.5 is the upper bound imposed by the table of Choulaki and Stephen (2001) that is used for evaluating rapidely the p-values. Using the common significance level 0.05, it was found that this procedure selects in several occasions thresholds that are too low (Solari et al., 2017). Durocher et al. (2018) suggested instead of using a critical values of 0.25 to avoid random acceptation of GPA distribution by the GOF test. Accordingly, the lowest stable threshold among the candidates is 910.

## Function that extract the p-value of a GOF for a specific threshold.
Fun <- function(u){
  f <- FitPot(flow~date, xd, u = u, declust = 'wrc', r = r0)
  c(u,GofTest(f)$pvalue)
}

## Compute p-values for all candidate thresholds.
candidates <- t(sapply(ulst, Fun))

## Plot the p-values with respect to the candidate thresholds.
plot(candidates[,c(1,2)], type = 'l',
     xlab = 'Threshold', ylab = 'AD test (p-value)')
abline(h = c(.05,.25), lty = 3)

## Print best candidates
u0 <-which(candidates[,2]>.25)[1]
candidates[u0,1]

Conclusion

In this document, it was shown how a GPA model can be fitted on exceedances to evaluate flood quantiles of gauged stations. Instructions for estimating the model (FitPot) and evaluate uncertainty for the parameters (coef) and the flood quantile were provided (predict). Finally, basic validation procedures based on the mean residual life plot (PlotMrl) and goodness of fit tests (GofTest) were shown to assess the selection of the threshold.
For a general introduction to the technical aspect of threshold modeling, the reader is referred to Coles (2001) or Davison and Smith (1990).

References

Choulakian, V., Stephens, M.A. (2001). Goodness-of-Fit Tests for the Generalized Pareto Distribution. Technometrics 43, 478–484. https://doi.org/10.2307/1270819

Coles, S., 2001. An introduction to statistical modeling of extreme values. Springer Verlag.

Davison, A.C., Smith, R.L. (1990). Models for Exceedances over High Thresholds. Journal of the Royal Statistical Society. Series B (Methodological) 52, 393–442.

Durocher, M., Zadeh, S.M., Burn, D.H., Ashkar, F. (2018). Comparison of automatic procedures for selecting flood peaks over threshold based on goodness-of-fit tests. Hydrological Processes 0. https://doi.org/10.1002/hyp.13223

Lang, M., Ouarda, T.B.M.J., Bobée, B. (1999). Towards operational guidelines for over-threshold modeling. Journal of Hydrology 225, 103–117. https://doi.org/10.1016/S0022-1694(99)00167-5

Solari, S., Egüen, M., Polo, M.J., Losada, M.A. (2017). Peaks Over Threshold (POT): A methodology for automatic threshold estimation using goodness of fit p-value. Water Resour. Res. 53, 2833–2849. https://doi.org/10.1002/2016WR019426

martindurocher/floodStat documentation built on May 31, 2019, 12:42 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com