return_curve_diag: Evaluates the goodness of fit of the return curve estimates
In rjaneUCF/MultiHazard: Tools for modeling compound events

return_curve_diag

R Documentation

Description

The procedure calculates the empirical probability of observing data within the survival regions defined by a subset of points on the return curve. If the curve is a good fit, these empirical probabilities should closely match the probability associated with the return curve. The procedure, introduced in Murphy-Barltrop et al. (2023), uses bootstrap resampling of the original data set to obtain confidence intervals for the empirical estimates.
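
The idea can be illustrated with a minimal base-R sketch. It is not the package's implementation: the simulated data, the curve point `pt` and the helper `surv_prob()` are assumptions made for illustration, a simple bootstrap with replacement stands in for the bootstrap options described under Arguments, and the target probability is assumed to be `1 / (rp * mu)`.

```r
set.seed(1)

# Hypothetical bivariate sample standing in for the two variables in `data`
n <- 5000
x <- rnorm(n)
y <- 0.6 * x + sqrt(1 - 0.6^2) * rnorm(n)

# Empirical probability of the survival region {X > pt[1], Y > pt[2]}
surv_prob <- function(x, y, pt) mean(x > pt[1] & y > pt[2], na.rm = TRUE)

# Hypothetical point assumed to lie on an estimated return curve
pt <- c(1.5, 1.5)
p_hat <- surv_prob(x, y, pt)

# Simple bootstrap (rows resampled with replacement) for a 90% interval
n_boot <- 200
alpha <- 0.1
p_boot <- replicate(n_boot, {
  idx <- sample.int(n, n, replace = TRUE)
  surv_prob(x[idx], y[idx], pt)
})
ci <- quantile(p_boot, c(alpha / 2, 0.5, 1 - alpha / 2))

# Assumed probability targeted by a curve with rp = 1 and mu = 365.25
p_target <- 1 / (1 * 365.25)
```

A curve would be judged a reasonable fit at this point if `p_target` fell inside the interval held in `ci`; the diagnostic repeats this comparison at several points along the curve.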

Usage

return_curve_diag(
  data,
  q,
  rp,
  mu,
  n_sim,
  n_grad,
  n_boot,
  boot_method,
  boot_replace,
  block_length,
  boot_prop,
  decl_method_x,
  decl_method_y,
  window_length_x,
  window_length_y,
  u_x = NA,
  u_y = NA,
  sep_crit_x = NA,
  sep_crit_y = NA,
  boot_method_all = "block",
  boot_replace_all = NA,
  block_length_all = 14,
  boot_prop_all = 0.8,
  alpha = 0.1,
  x_lab = NA,
  y_lab = NA,
  x_lim_min = min(data_df[, 2], na.rm = T),
  x_lim_max = max(data_df[, 2], na.rm = T) + 0.3 * diff(range(data[, 2], na.rm = T)),
  y_lim_min = min(data[, 3], na.rm = T),
  y_lim_max = max(data[, 3], na.rm = T) + 0.3 * diff(range(data[, 2], na.rm = T))
)

Arguments

`data`	Data frame of raw data detrended if necessary. First column should be of class `Date`.
`q`	Numeric vector of length one specifying quantile level for fitting GPDs and the HT04 and WT13 models.
`rp`	Numeric vector of length one specifying return period of interest.
`mu`	Numeric vector of length one specifying the (average) occurrence frequency of events in `data`. Default is `365.25` (daily data).
`n_sim`	Numeric vector of length one specifying the number of simulations for the HT04 model. Default is `50`.
`n_grad`	Numeric vector of length one specifying the number of rays along which to compute points on the curve. Default is `50`.
`n_boot`	Numeric vector of length one specifying number of bootstrap samples. Default is `100`.
`boot_method`	Character vector of length one specifying the bootstrap method. Options are `"basic"` (default), `"block"` or `"monthly"`.
`boot_replace`	Character vector of length one specifying whether simple bootstrapping is carried out with `"T"` or without `"F"` replacement. Only required if `boot_method = "basic"`. Default is `NA`.
`block_length`	Numeric vector of length one specifying block length. Only required if `boot_method = "block"`. Default is `NA`.
`boot_prop`	Numeric vector of length one specifying the minimum proportion of non-missing values of at least one of the variables for a month to be included in the bootstrap. Only required if `boot_method = "monthly"`. Default is `0.8`.
`decl_method_x`	Character vector of length one specifying the declustering method to apply to the first variable. Options are the storm window approach `"window"` (default) and the runs method `"runs"`.
`decl_method_y`	Character vector of length one specifying the declustering method to apply to the second variable. Options are the storm window approach `"window"` (default) and the runs method `"runs"`.
`window_length_x`	Numeric vector of length one specifying the storm window length to apply during the declustering of the first variable if `decl_method_x = "window"`.
`window_length_y`	Numeric vector of length one specifying the storm window length to apply during the declustering of the second variable if `decl_method_y = "window"`.
`u_x`	Numeric vector of length one specifying the threshold to adopt in the declustering of the first variable if `decl_method_x = "runs"`. Default is `NA`.
`u_y`	Numeric vector of length one specifying the threshold to adopt in the declustering of the second variable if `decl_method_y = "runs"`. Default is `NA`.
`sep_crit_x`	Numeric vector of length one specifying the separation criterion to apply during the declustering of the first variable if `decl_method_x = "runs"`. Default is `NA`.
`sep_crit_y`	Numeric vector of length one specifying the separation criterion to apply during the declustering of the second variable if `decl_method_y = "runs"`. Default is `NA`.
`boot_method_all`	Character vector of length one specifying the bootstrapping procedure to use when estimating the distribution of empirical (survival) probabilities from the original dataset (without any declustering). Options are `"basic"` and `"block"` (default).
`boot_replace_all`	Character vector of length one specifying whether bootstrapping of original dataset (without any declustering) when estimating the distribution of empirical (survival) probabilities is carried out with `"T"` or without `"F"` replacement. Only required if `boot_method_all = "basic"`. Default is `NA`.
`block_length_all`	Numeric vector of length one specifying block length. Only required if `boot_method_all = "block"`. Default is `14`.
`alpha`	Numeric vector of length one specifying the `100(1-alpha)%` confidence interval. Default is `0.1`.
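
To make the `"block"` option and the role of `block_length` concrete, the following is a generic moving-block bootstrap in base R. How the package constructs its blocks is not stated here, so the helper `block_boot_index()` and the series length are assumptions for illustration only.

```r
set.seed(42)

# Hypothetical series length (number of rows) and block length
n <- 100
block_length <- 14

# Moving-block bootstrap: draw block start positions with replacement,
# then stitch consecutive runs of rows together
block_boot_index <- function(n, block_length) {
  n_blocks <- ceiling(n / block_length)
  starts <- sample.int(n - block_length + 1, n_blocks, replace = TRUE)
  idx <- unlist(lapply(starts, function(s) s:(s + block_length - 1)))
  idx[seq_len(n)]  # trim to the original series length
}

idx <- block_boot_index(n, block_length)
```

Resampling whole blocks rather than single rows keeps short-range temporal dependence within each block, which is the usual motivation for choosing a block bootstrap over a simple one for daily series.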

Value

List comprising the angles "ang_ind" associated with the points on the curve for which the empirical probability estimates were calculated. For the HT04 model: Median "med_x_ht04", lower "lb_x_ht04" and upper "ub_x_ht04" bounds associated with the probabilities calculated using the sample conditioned on the first variable. Median "med_y_ht04", lower "lb_y_ht04" and upper "ub_y_ht04" bounds associated with the probabilities calculated using the sample conditioned on the second variable. Median "med_ht04", lower "lb_ht04" and upper "ub_ht04" bounds associated with the original dataset (without any declustering).

For the WT13 model: Median "med_x_wt13", lower "lb_x_wt13" and upper "ub_x_wt13" bounds associated with the probabilities calculated using the sample conditioned on the first variable. Median "med_y_wt13", lower "lb_y_wt13" and upper "ub_y_wt13" bounds associated with the probabilities calculated using the sample conditioned on the second variable. Median "med_wt13", lower "lb_wt13" and upper "ub_wt13" bounds associated with the original dataset (without any declustering).
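
One way to read the returned list is to check, angle by angle, whether the target probability falls inside the bootstrap interval. The sketch below uses a mock list with the documented element names; the values are made up, each element is assumed to be a numeric vector with one entry per angle, and the target probability is assumed to be `1 / (rp * mu)`.

```r
# Mock object using the documented element names; all values are made up
diag_out <- list(
  ang_ind    = c(10, 20, 30, 40),
  med_x_ht04 = c(0.0026, 0.0029, 0.0031, 0.0040),
  lb_x_ht04  = c(0.0019, 0.0022, 0.0024, 0.0033),
  ub_x_ht04  = c(0.0034, 0.0037, 0.0039, 0.0048)
)

# Assumed target probability for rp = 1 and mu = 365.25
p_target <- 1 / (1 * 365.25)

# TRUE where the interval [lb, ub] contains the target probability
covered <- diag_out$lb_x_ht04 <= p_target & p_target <= diag_out$ub_x_ht04
coverage <- data.frame(ang_ind = diag_out$ang_ind, covered = covered)
```

Angles where `covered` is `FALSE` mark parts of the curve where the empirical probabilities disagree with the nominal one.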

Details

The HT04 model is fit to two conditional samples. One sample comprises the declustered time series of the first variable paired with concurrent values of the other variable. The second sample is obtained in the same way but with the variables reversed. The empirical probabilities are calculated using these two conditional samples and the original dataset (without any declustering). The return period should be chosen so that there are enough data to estimate the empirical probabilities while the curve remains sufficiently 'extreme'. For example, assess the fit using the 1-year return period curve rather than the 100-year return period curve.
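
The trade-off in choosing the return period can be seen with a quick calculation of how many observations are expected to fall in a survival region. This sketch assumes a hypothetical record of 71 years of daily data and a survival probability of `1 / (rp * mu)`; `expected_points()` is an illustrative helper, not a package function.

```r
# Hypothetical record: 71 years of daily observations
n_years <- 71
mu <- 365.25
n_obs <- round(n_years * mu)

# Expected number of observations in a survival region whose
# probability is assumed to be 1 / (rp * mu)
expected_points <- function(rp) n_obs / (rp * mu)

one_year <- expected_points(1)
hundred_year <- expected_points(100)
```

Under these assumptions roughly 71 observations are expected in a survival region of the 1-year curve, but fewer than one for the 100-year curve, which leaves too little data for a meaningful empirical estimate.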

Examples

#Data starts on first day of 1948
head(S22.Detrend.df)

#Dataframe ends on 2019-02-03
tail(S22.Detrend.df)

#Adding dates to complete final month of combined records
final.month = data.frame(seq(as.Date("2019-02-04"),as.Date("2019-02-28"),by="day"),NA,NA,NA)
colnames(final.month) = c("Date","Rainfall","OsWL","Groundwater")
S22.Detrend.df.extended = rbind(S22.Detrend.df,final.month)
#Diagnostic plots for the return curves
return_curve_diag(data=S22.Detrend.df.extended[,1:3],
                  q=0.985,rp=1,mu=365.25,n_sim=100,
                  n_grad=50,n_boot=100,boot_method="monthly",
                  boot_replace=NA, block_length=NA, boot_prop=0.8,
                  decl_method_x="runs", decl_method_y="runs",
                  window_length_x=NA,window_length_y=NA,
                  u_x=0.95, u_y=0.95,
                  sep_crit_x=36, sep_crit_y=36,
                  alpha=0.1,
                  boot_method_all="block", boot_replace_all=NA,
                  block_length_all=14)

rjaneUCF/MultiHazard documentation built on July 4, 2025, 9:18 p.m.