predict.ithresh: Predictive inference for the largest value observed in N...
In threshr: Threshold Selection and Uncertainty for Extreme Value Analysis

predict.ithresh

R Documentation

Predictive inference for the largest value observed in N years.

Description

predict method for class "ithresh". Predictive inferences can either be based on a single training threshold or using a weighted average of inferences over multiple training thresholds. A single threshold may chosen to be the best performing threshold, as judged by the measure of predictive performance calculated by ithresh or chosen by the user. The weights used in the latter case are based on the measures of predictive performance and prior probabilities assigned to the training thresholds. By default all thresholds are given the same prior probability but the user can specify their own prior.

Usage

## S3 method for class 'ithresh'
predict(
  object,
  npy = NULL,
  n_years = 100,
  which_u = c("best", "all"),
  which_v = 1L,
  u_prior = rep(1, length(object$u_vec)),
  type = c("p", "d", "q", "i", "r"),
  hpd = FALSE,
  x = NULL,
  ...
)

Arguments

`object`	An object of class `"ithresh"`, a result of a call to `ithresh`.
`npy`	A numeric scalar. The mean number of observations per year of data, after excluding any missing values, i.e. the number of non-missing observations divided by total number of years of non-missing data.
`n_years`	A numeric vector. Value(s) of N. If `which_u = "all"` then `n_years` must have length one.
`which_u`	Either a character scalar or a numeric scalar. If `which_u` is a character scalar it must be either "best" or "all". If `which_u = "best"` then the single threshold achieving the largest measure of predictive performance in `object$pred_perf`, based on the validation threshold selected using `which_v`, is used to perform prediction. See `summary.ithresh` to print the best thresholds for each validation threshold. If `which_u = "all"` then all the thresholds are used to perform prediction. The inferences from each threshold are weighted according to the posterior threshold weights given in equation (15) of Northrop et al. (2017) based on the prior probabilities of thresholds in `u_prior` and column `which_v` of the measures of predictive performance in `object$pred_perf`. Otherwise, `which_u` is a numeric scalar that indicates which element of `object$u_vec` the user wishes to select as a single threshold on which to base prediction, that is, `which_u` must be an integer in `1, ..., length(object$u_vec)`.
`which_v`	A numeric scalar. Indicates which element of `object$v_vec` is used in selecting a single threshold (if `which_u = "best"`) or weighting the inferences from all thresholds (if `which_u = "all"`). Note: the default, `which_v = 1` gives the lowest of the validation thresholds in `object$v_vec`.
`u_prior`	A numeric vector. Prior probabilities for the training thresholds in `object$u_vec`. Only used if `which_u = "all"`. Only the first `length(object$u_vec) - length(object$v_vec) + which_v` elements of `u_prior` are used. This is because only training thresholds up to and including `object$v_vec[which_v]` are relevant. `u_prior` must have length `length(object$u_vec)` or `length(object$u_vec) - length(object$v_vec) + which_v`. If `u_prior` is not supplied then all (relevant) training thresholds are given equal prior probability. `u_prior` is normalized to have sum equal to 1 inside `predict.ithresh`.
`type`	A character vector. Passed to `predict.evpost`. Indicates which type of inference is required: "p" for the predictive distribution function, "d" for the predictive density function, "q" for the predictive quantile function, "i" for predictive intervals (see `...` to set `level`), "r" for random generation from the predictive distribution. If `which_u = "all"` then only `type = "p"` or `type = "d"` are allowed.
`hpd`	A logical scalar. The argument `hpd` of `predict.evpost`. Only relevant if `type = "i"`.
`x`	A numeric vector. The argument `x` of `predict.evpost`. In the current context this must be a vector (not a matrix, as suggested by the documentation of `predict.evpost`). If `x` is not supplied then a default value is set within `predict.evpost`.
`...`	Additional arguments to be passed to `predict.evpost`. In particular: `level`, which can be used to set (multiple) levels for predictive intervals when `type = "i"`; `lower_tail` (relevant when `type = "p"` or `"q"`) and `log` (relevant when `type = "d"`).

Details

The function predict.evpost is used to perform predictive based on the binomial-GP posterior sample generated using a given training threshold. For mathematical details of the single threshold and multiple threshold cases see Sections 2.3 and 3 of Northrop et al. (2017) respectively.

Value

An list object of class "ithreshpred" with a similar structure to an object of class "evpred" returned from predict.evpost is returned invisibly. In addition, the object contains u_vec = object$u_vec and v_vec = object$v_vec, which_v and the index best_u in u_vec = object$u_vec of the best training threshold based on the value of which_v. It also contains the value of the Box-Cox transformation parameter lambda. This will always be equal to 1 if object was returned from ithresh.

If which_u = "all" then

the list also contains the posterior threshold weights in component post_thresh_wts
the component y is a matrix with length{x} rows and 1 + length(object$u_vec) - length(object$v_vec) + which_v columns. Column 1 contains the estimated predictive distribution function (type = "p") or density function (type = "d") obtained using a weighted average of the inferences over different training thresholds. The other columns contain the estimated functions for each of the training thresholds in u_vec.

References

Northrop, P. J., Attalides, N. and Jonathan, P. (2017) Cross-validatory extreme value threshold selection and uncertainty with application to ocean storm severity. Journal of the Royal Statistical Society Series C: Applied Statistics, 66(1), 93-120. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.1111/rssc.12159")}

Examples

# Note:
#'  In the examples below validation thresholds rather higher than is
#   advisable have been used, with far fewer excesses than the minimum of
#   50 suggested by Jonathan and Ewans (2013).

# Gulf of Mexico significant wave heights, default priors.
u_vec_gom <- quantile(gom, probs = seq(0, 0.9, by = 0.05))
gom_cv <- ithresh(data = gom, u_vec = u_vec_gom, n_v = 3)

# Note: gom_cv$npy contains the correct value of npy (it was set in the
#       call to ithresh, via attr(gom, "npy").
#       If object$npy doesn't exist then the argument npy must be supplied
#       in the call to predict().

### Best training threshold based on the lowest validation threshold

# Predictive distribution function
best_p <- predict(gom_cv, n_years = c(100, 1000))
plot(best_p)

# Predictive density function
best_d <- predict(gom_cv, type = "d", n_years = c(100, 1000))
plot(best_d)

# Predictive intervals
best_i <- predict(gom_cv, n_years = c(100, 1000), type = "i", hpd = TRUE,
                  level = c(95, 99))
plot(best_i, which_int = "both")

# See which threshold was used
summary(gom_cv)

### All thresholds plus weighted average of inferences over all thresholds

# Predictive distribution function
all_p <- predict(gom_cv, which_u = "all")
plot(all_p)

# Predictive density function
all_d <- predict(gom_cv, which_u = "all", type = "d")
plot(all_d)

threshr documentation built on Sept. 12, 2024, 6:50 a.m.