evm: Extreme value modelling
In texmex: Statistical Modelling of Extreme Values

View source: R/evm.R

evm	R Documentation

Extreme value modelling

Description

Likelihood based modelling and inference for extreme value models, possibly with explanatory variables.

Usage

evm(y, data = NULL, ...)

evmReal(y, data)

evm.default(
  y,
  data = NULL,
  family = gpd,
  th = -Inf,
  qu,
  ...,
  penalty = NULL,
  prior = "gaussian",
  method = "optimize",
  cov = "observed",
  start = NULL,
  priorParameters = NULL,
  maxit = 10000,
  trace = NULL,
  iter = 40500,
  burn = 500,
  thin = 4,
  chains = 1,
  proposal.dist = c("gaussian", "cauchy"),
  jump.cov,
  jump.const = NULL,
  R = 1000,
  cores = NULL,
  export = NULL,
  verbose = TRUE,
  call = NULL
)

Arguments

`y`	Either a numeric vector, the name of a variable in `data` or a string representing the name of a variable in `data`. NOTE THAT the use of non-standard evaluation is likely to be removed from future versions of texmex.
`data`	A data frame containing `y` and any covariates.
`...`	In `evm`, formulae for the parameters in the family, e.g. `phi = ~ x`. If none are specified, they all default to `~1`.
`family`	An object of class 'texmexFamily'. Defaults to `family=gpd` and a generalized Pareto distribution (GPD) is fit to the data. Alternatively the family could be `gev`, `weibull` or `gumbel`, resulting in a generalized extreme value distribution, Weibull or Gumbell distribution being fit. Family `cgpd` fits the generalized Pareto distribution but with the shape parameter constrained to be > 0.5 by using the link function suggested by Yee and Stephenson (2007), `\eta` = log(`\xi` + 0.5). Family `egp3` fits the extended GP family 3 of Papastathopoulos and Tawn (2013). No other families are currently available in texmex, but users may write their own.
`th`	For threshold excess models (such as when `family=gpd`), the threshold for `y`, exceedances above which will be used to fit the upper tail model. Note that if you have already thresholded your data and want to model all of `y`, you still need to specify `th`.
`qu`	An alternative to `th`, a probability defined such that `quantile(y,qu)` equals `th`.
`penalty`	How to penalize the likelhood. Currently, either "none"", "gaussian"" or "lasso"" are the only allowed values. If `penalty` is "gaussian" or "lasso" then the parameters for the penalization are specified through the `priorParameters` argument. See below. Defaults to `penalty=NULL` and applies maximum likelihood estimation.
`prior`	If `method = "optimize"`, just an alternative way of specifying the penalty, and only one or neither of `penalty` and `prior` should be given. If `method = "simulate"`, prior must be "gaussian" because no other prior distributions have been implemented.
`method`	Should be either "optimize" (the default), "simulate" or "bootstrap". The first letter or various abbreviations will do. If 'optimize' is used, the (penalized) likelihood is directly optimized using `optim` and point estimates (either ML or MAP estimates) are returned with other information. If "simulate", a Metropolis algorithm is used to simulate from the joint posterior distribution of the parameters. If "bootstrap", a parametric boostrap is performed.
`cov`	How to compute the covariance matrix of the parameters. Defaults to `cov = "observed"` in which case the observed information matrix is used, if the `info` element of the `texmexFamily` object is present. Note that currently, this is not implemented for `gev`. Alternatives are `cov = "numeric"` in which case a numerical approximation of the Hessian is used (see the help for `optim`), or `cov = "sandwich"` if the `sandwich` element of the `texmexFamily` object is implemented. The `cov = "sandwich"` method implements the Huber sandwich correction to the covariance matrix for data which are not independent and in which case the likelihood function no longer has the interpretation of a joint likelihood, but instead should be interpreted as a pseudo-likelihod. In some cases, particularly with small samples, the numerical approximation can be quite different from the closed form (`cov="observed"`) result, and the value derived from the observed information should be preferred. However, in either case, since the underlying log-likelihood may be far from quadratic for small samples, the resulting estimates of standard errors are liable to approximate poorly the true standard errors. Also see the comments in the Details section, below.
`start`	Starting values for the parameters, to be passed to `optim`. If not provided, the function will use the `start` element of the `texmexFamily` object if it exists.
`priorParameters`	A list with two components. The first should be a vector of means, the second should be a covariance matrix if the penalty/prior is "gaussian" or "quadratic" and a diagonal precision matrix if the penalty/prior is "lasso", "L1" or "Laplace". If `method = "simulate"` then these represent the parameters in the Gaussian prior distribution. If `method = 'optimize'` then these represent the parameters in the penalty function. If not supplied: all default prior means are zero; all default prior variances are `10^4`; all covariances are zero.
`maxit`	The number of iterations allowed in `optim`.
`trace`	Whether or not to print progress to screen. If `method = "optimize"`, the argument is passed into `optim` – see the help for that function. If `method = "simulate"`, the argument determines at how many steps of the Markov chain the function should tell the user, and in this case it defaults to `trace = 10000`.
`iter`	Number of simulations to generate under `method = "simulate"`. Defaults to 40500.
`burn`	The number of initial steps to be discarded. Defaults to 500.
`thin`	The degree of thinning of the resulting Markov chains. Defaults to 4 (one in every 4 steps is retained).
`chains`	The number of Markov chains to run. Defaults to 1. If you run more than 1, the function tries to figure out how to do it in parallel using as many cores as there are chains.
`proposal.dist`	The proposal distribution to use, either multivariate gaussian or a multivariate Cauchy.
`jump.cov`	Covariance matrix for proposal distribution of Metropolis algorithm. This is scaled by `jump.const`.
`jump.const`	Control parameter for the Metropolis algorithm.
`R`	The number of parametric bootstrap samples to run when `method = "bootstrap"` is requested. Defaults to 1000.
`cores`	The number of cores to use when bootstrapping. Defaults to `cores=NULL` and the function guesses how many cores are available and uses them all.
`export`	Character vector of names of objects to export if parallel processing is being used and you are using objects from outside of texmex. It it passed to `parallel::clusterExport` and used by `texmex::evmBoot`.
`verbose`	Whether or not to print progress to screen. Defaults to `verbose=TRUE`.
`call`	Used internally, defaults to `call = NULL`.

Details

The main modelling function is evm (extreme value model) and the distribution to be used is specified by passing an object of class texmexFamily to the family argument.

The default texmexFamily object used by evm is gpd. Currently, the other texmexFamily objects available are gev which results in fitting a generalized extreme value (GEV) distribution to the data, gpdIntCensored which can be used to fit the GPD to data which has been rounded to a given number of decimal places by recognisiing the data as interval censored, and egp3 which fits the extended generalized Pareto distribution version 3 of Papastathopoulos and Tawn (2013).

See Coles (2001) for an introduction to extreme value modelling and the GPD and GEV models.

For the GPD model, we use the following parameterisation of evm:

P(Y \le y) = 1 - (1 + \xi y / \sigma)^{-1/\xi}

for y \ge 0 and 1 + \xi y / \sigma \ge 0.

For the GEV model, we use:

P(Y \le y) = exp (-(1 + \xi (y - \mu) / \sigma) ^{-1/\xi})

In each case, the scale parameter is sigma (\sigma) and the shape parameter is xi (\xi). The GEV distribution also has location parameter mu (\mu). See Papastathopoulos and Tawn (2013) for specification of the EGP3 model.

Working with the log of the scale parameter improves the stability of computations, makes a quadratic penalty more appropriate and enables the inclusion of covariates in the model for the scale parameter, which must remain positive. We therefore work with \phi=log(\sigma). All specification of priors or penalty functions refer to \phi rather than \sigma. A quadratic penalty can be thought of as a Gaussian prior distribution, whence the terminology of the function.

Parameters of the evm are estimated by using maximum (penalized) likelihood (method = "optimize"), or by simulating from the posterior distribution of the model parameters using a Metropolis algorithm (method = "simulate"). In the latter case, start is used as a starting value for the Metropolis algorithm; in its absence, the maximum penalized likelhood point estimates are computed and used.

A boostrap approach is also available (method = "bootstrap"). This runs a parametric bootstrap, simulating from the model fit by optimization.

When method = "simulate" the print and summary functions give posterior means and standard deviations. Posterior means are also returned by the coef method. Depending on what you want to do and what the posterior distributions look like (use plot method) you might want to work with quantiles of the posterior distributions instead of relying on standard errors.

When method = "bootstrap", summaries of the bootstrap distribution and the bootstrap estimate of bias are displayed.

Value

If method = "optimize", an object of class evmOpt:

`call`	The call to `evmSim` that produced the object.
`data`	The original data (above and below the threshold for fitting if a distribution for threshold excesses has been used). In detail, `data` is a list with elements `y` and `D`. `y` is the response variable and `D` is a list containing the design matrices implied by any formlae used in the call to `evm`.
`convergence`	Output from `optim` relating to whether or not the optimizer converged.
`message`	A message telling the user whether or not convergence was achieved.
`threshold`	The threshold of the data above which the evmSim model was fit.
`penalty`	The type of penalty function used, if any.
`coefficients`	The parameter estimates as computed under maximum likelihood or maximum penalized likelihood.
`rate`	The proportion of observations above the threshold. If the model is not a threshold exceedance model (e.g. the GEV model), the rate will be 1.
`priorParameters`	See above.
`residuals`	Residuals computed using the residual function in the `texmexFamily` object, if any. These are used primarly for producing QQ and PP plots via `plot.evmOpt` or `ggplot.evmOpt`. The residuals are transformed values of the raw data, accounting for the parameter estimates: see the `residuals` component of the `texmexFamily` object for the calculations. For the generalized Pareto family, they are (if the model fits well) standard exponential variates; for the GEV family, standard Gumbel variates.
`ploglik`	The value of the optimized penalized log-likelihood.
`loglik`	The value of the optimized (unpenalized) log-likelihood. If `penalty='none'` is used, this will be identical to `ploglik`, above.
`cov`	The estimated covariance of the parameters in the model.
`se`	The estimated standard errors of the parameters in the model.
`xlevels`	A named list containing a named list for each design matrix (main parameter) in the model. Each list contians an element named after each factor in the linear predictor for the respective design matrix. These are used by the `predict` method to ensure all factor levels are known, even if they don't appear in `newdata`.

If method = "simulate", an object of class evmSim:

`call`	The call to `evmSim` that produced the object.
`threshold`	The threshold above which the model was fit.
`map`	The point estimates found by maximum penalized likelihood and which were used as the starting point for the Markov chain. This is of class `evmOpt` and methods for this class (such as resid and plot) may be useful.
`burn`	The number of steps of the Markov chain that are to be treated as the burn-in and not used in inferences.
`thin`	The degree of thinning used.
`chains`	The entire Markov chain generated by the Metropolis algorithm.
`y`	The response data above the threshold for fitting.
`seed`	The seed used by the random number generator.
`param`	The remainder of the chain after deleting the burn-in and applying any thinning.

If method = "bootstrap", an object of class evmBoot:

`call`	The call to `evmBoot` that produced the object.
`replicates`	The parameter estimates from the bootstrap fits.
`map`	The fit by by maximum penalized likelihood to the orginal data. This is of class `evmOpt` and methods for this class (such as resid and plot) may be useful.

There are summary, plot, print, residuals and coefficients methods available for these classes.

Note

For both GPD and GEV models, when there are estimated values of \xi \le -0.5, the regularity conditions of the likelihood break down and inference based on approximate standard errors cannot be performed. In this case, the most fruitful approach to inference appears to be by the bootstrap. It might be possible to simulate from the posterior, but finding a good proposal distribution might be difficult and you should take care to get an acceptance rate that is reasonably high (around 40% when there are no covariates, lower otherwise). To constrain the parameter space of the GP shape parameter, use family = cgpd in the call to evm and the transformation \eta = log(\xi + 0.5) is used, as suggested by Yee and Stephenson (2007).

Author(s)

Janet E. Heffernan, Harry Southworth. Some of the internal code is based on the gpd.fit function in the ismev package and is due to Stuart Coles.

References

S. Coles. An Introduction to Statistical Modelling of Extreme Values. Springer, 2001.

I. Papastathopoulos and J. A. Tawn, Extended generalised Pareto models for tail estimation, Journal of Statistical Planning and Inference, 143, 131 - 143, 2013.

T. W. Yee and A. G. Stephenson, Vector generalized linear and additive extreme value models, Extremes, 10, 1 – 19, 2007.

Examples


  
  #mod <- evm(rain, th=30)
  #mod
  #par(mfrow=c(2, 2))
  #plot(mod)
  

  
  mod <- evm(rain, th=30, method="sim")
  par(mfrow=c(3, 2))
  plot(mod)
  

  
  mod <- evm(SeaLevel, data=portpirie, family=gev)
  mod
  plot(mod)
  

  
  mod <- evm(SeaLevel, data=portpirie, family=gev, method="sim")
  par(mfrow=c(3, 3))
  plot(mod)

texmex documentation built on June 22, 2024, 12:26 p.m.