evgam: Fitting generalised additive extreme-value family models

evgam

R Documentation

Fitting generalised additive extreme-value family models

Description

Function evgam fits generalised additive extreme-value models. It supports various extreme-value models, including the generalised extreme value and generalised Pareto distributions. It can also perform quantile regression via the asymmetric Laplace distribution.

Usage

evgam(
  formula,
  data,
  family = "gev",
  correctV = TRUE,
  rho0 = 0,
  inits = NULL,
  outer = "bfgs",
  control = NULL,
  removeData = FALSE,
  trace = 0,
  knots = NULL,
  maxdata = 1e+20,
  maxspline = 1e+20,
  compact = FALSE,
  ald.args = list(),
  exi.args = list(),
  pp.args = list(),
  sandwich.args = list()
)

Arguments

`formula`	a list of formulae for location, scale and shape parameters, as in gam
`data`	a data frame
`family`	a character string giving the type of family to be fitted; defaults to `"gev"`
`correctV`	logical: should the variance-covariance matrix include smoothing parameter uncertainty? Defaults to `TRUE`
`rho0`	a scalar or vector of initial log smoothing parameter values; a scalar will be repeated if there are multiple smoothing terms
`inits`	a vector or list giving initial values for constant basis coefficients; if a list, a grid is formed using expand.grid, and the ‘best’ used; defaults to `NULL`, so initial values are automatically found
`outer`	a character string specifying whether the outer optimiser is full `"Newton"`, `"BFGS"` or uses finite differences, `"FD"`; defaults to `"BFGS"`
`control`	a list of lists of control parameters to pass to inner and outer optimisers; defaults to `evgam.control()`
`removeData`	logical: should `data` be removed from `evgam` object? Defaults to `FALSE`
`trace`	an integer specifying the amount of information supplied about fitting, with `-1` suppressing all output; defaults to `0`
`knots`	passed to s; defaults to `NULL`
`maxdata`	an integer specifying the maximum number of `data` rows. `data` is sampled if its number of rows exceeds `maxdata`; defaults to `1e20`
`maxspline`	an integer specifying the maximum number of `data` rows used for spline construction; defaults to `1e20`
`compact`	logical: should duplicated `data` rows be compacted? Defaults to `FALSE`
`ald.args`	a list of arguments for `family="ald"`; see Details
`exi.args`	a list of arguments for `family="exi"`; see Details
`pp.args`	a list of arguments for `family="pp"`; see Details
`sandwich.args`	a list of arguments for sandwich adjustment; see Details
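
To make the grid behaviour of `inits` concrete, the sketch below shows what expand.grid produces from a list of candidate starting values. The candidate values are hypothetical and chosen only for illustration; how evgam then scores the rows and picks the 'best' one is not shown here.

```r
# Hypothetical candidate starting values, one vector per constant coefficient
# (for a GEV fit these might correspond to location, log scale and shape)
inits <- list(c(1.4, 1.6), c(-2, -1), c(-0.2, 0.1))

# expand.grid forms every combination: 2 x 2 x 2 = 8 candidate rows
grid <- expand.grid(inits)
nrow(grid)
```

A list in this form could then be passed as `inits = inits`; larger grids cost more, since every row is a candidate starting point.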

Details

The following families are currently available: "ald", the asymmetric Laplace distribution, primarily intended for quantile regression, as in Yu & Moyeed (2001); "gev" (default), the generalised extreme value distribution; "exp", the exponential distribution; "gpd", the generalised Pareto distribution; "gauss", the Gaussian distribution; "pp", the point process model for extremes, implemented through r-largest order statistics; "weibull", the Weibull distribution; "exi", estimation of the extremal index, as in Schlather & Tawn (2003).

Arguments for the asymmetric Laplace distribution are given by ald.args. A scalar tau defines the quantile sought, which has no default. The scalar C specifies the curvature parameter of Oh et al. (2011).
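
As a rough illustration of why the asymmetric Laplace distribution yields quantile estimates, the base-R sketch below minimises the check (pinball) loss for a constant location and compares the result with the empirical quantile. It uses the plain, unsmoothed check loss on simulated data and does not reproduce evgam's internals; `check_loss`, `obj` and the simulated `y` are names introduced here for illustration only.

```r
# Check (pinball) loss for residual u at quantile level tau
check_loss <- function(u, tau) u * (tau - (u < 0))

set.seed(1)
y <- rexp(2000)   # simulated data, for illustration only
tau <- 0.9

# Summed check loss as a function of a constant location q
obj <- function(q) sum(check_loss(y - q, tau))
q_hat <- optimize(obj, interval = range(y))$minimum

# The minimiser should sit close to the empirical tau-quantile
c(check_loss = q_hat, empirical = unname(quantile(y, tau)))

# Matching argument list for evgam; tau has no default, so it must be supplied
ald_args <- list(tau = tau)
```

The list `ald_args` would be passed as `ald.args = ald_args` together with `family = "ald"`.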

Arguments for extremal index estimation are given by exi.args. A character string id specifies the variable in data over which an nexi (default 2) running maximum has been taken. The link is specified as a character string, which is one of "logistic", "probit", "cloglog"; defaults to "logistic".

Arguments for the point process model are given by pp.args. An integer r specifies the number of order statistics from which the model will be estimated. If r = -1, all data will be used. The character string id specifies the variable in data over which the point process isn't integrated; e.g. if a map of parameter estimates related to extremes over time is sought, integration isn't over locations. The scalar nper specifies the number of data per period of interest; the scalar or integer vector ny specifies the number of periods; if length(ny) > 1 then names(ny) must be supplied and must match every unique id. The logical correctny specifies whether ny is corrected to adjust proportionally for data missingness.
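
When record lengths differ between locations, ny needs to be a named vector. The sketch below, which assumes a hypothetical data frame `dat` with a `site` identifier and a `year` column, builds such a vector and checks that its names cover every unique id, as required above. The value `r = 10` is arbitrary.

```r
# Hypothetical data: three sites with different numbers of observed years
dat <- data.frame(
  site = rep(c("a", "b", "c"), times = c(40, 25, 30)),
  year = c(1:40, 1:25, 1:30)
)

# Number of periods (here, years) per site, as a named integer vector
ny <- tapply(dat$year, dat$site, function(x) length(unique(x)))
ny <- setNames(as.integer(ny), names(ny))

# names(ny) must match every unique value of the id variable
stopifnot(setequal(names(ny), unique(dat$site)))

# Argument list for family = "pp"; r = 10 is an arbitrary illustrative choice
pp_args <- list(id = "site", ny = ny, r = 10)
ny
```

With a single common record length, a scalar such as `ny = 30` is sufficient and no names are needed.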

Arguments for the sandwich adjustment are given by sandwich.args. A character string id can be supplied to the list, which identifies the name of the variable in data such that independence will be assumed between its values. The method for the adjustment is supplied as "magnitude" (default) or "curvature"; see Chandler & Bate (2007) for their definitions.

Value

An object of class evgam

References

Chandler, R. E., & Bate, S. (2007). Inference for clustered data using the independence loglikelihood. Biometrika, 94(1), 167-183.

Oh, H. S., Lee, T. C., & Nychka, D. W. (2011). Fast nonparametric quantile regression with arbitrary smoothing methods. Journal of Computational and Graphical Statistics, 20(2), 510-526.

Schlather, M., & Tawn, J. A. (2003). A dependence measure for multivariate and spatial extreme values: Properties and inference. Biometrika, 90(1), 139-156.

Wood, S. N., Pya, N., & Safken, B. (2016). Smoothing parameter and model selection for general smooth models. Journal of the American Statistical Association, 111(516), 1548-1563.

Youngman, B. D. (2022). evgam: An R Package for Generalized Additive Extreme Value Models. Journal of Statistical Software. To appear. doi: 10.18637/jss.v103.i03

Yu, K., & Moyeed, R. A. (2001). Bayesian quantile regression. Statistics & Probability Letters, 54(4), 437-447.

Examples


data(fremantle)
fmla_gev <- list(SeaLevel ~ s(Year, k=5, bs="cr"), ~ 1, ~ 1)
m_gev <- evgam(fmla_gev, fremantle, family = "gev")



data(COprcp)

## fit generalised Pareto distribution to excesses over a 20 mm threshold

COprcp <- cbind(COprcp, COprcp_meta[COprcp$meta_row,])
threshold <- 20
COprcp$excess <- COprcp$prcp - threshold
COprcp_gpd <- subset(COprcp, excess > 0)
fmla_gpd <- list(excess ~ s(lon, lat, k=12) + s(elev, k=5, bs="cr"), ~ 1)
m_gpd <- evgam(fmla_gpd, data=COprcp_gpd, family="gpd")

## fit generalised extreme value distribution to annual maxima

COprcp$year <- format(COprcp$date, "%Y")
COprcp_gev <- aggregate(prcp ~ year + meta_row, COprcp, max)
COprcp_gev <- cbind(COprcp_gev, COprcp_meta[COprcp_gev$meta_row,])
fmla_gev2 <- list(prcp ~ s(lon, lat, k=30) + s(elev, bs="cr"), ~ s(lon, lat, k=20), ~ 1)
m_gev2 <- evgam(fmla_gev2, data=COprcp_gev, family="gev")
summary(m_gev2)
plot(m_gev2)
predict(m_gev2, newdata=COprcp_meta, type="response")

## fit point process model using r-largest order statistics

# we have `ny=30' years' data and use top 45 order statistics
pp_args <- list(id="id", ny=30, r=45)
m_pp <- evgam(fmla_gev2, COprcp, family="pp", pp.args=pp_args)

## estimate 0.98 quantile using asymmetric Laplace distribution

fmla_ald <- prcp ~ s(lon, lat, k=15) + s(elev, bs="cr")
m_ald <- evgam(fmla_ald, COprcp, family="ald", ald.args=list(tau=.98))

evgam documentation built on June 28, 2022, 5:07 p.m.