R/fgpd.r
In evmix: Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation

Documented in fgpd lgpd nlgpd

#' @export
#' 
#' @title MLE Fitting of Generalised Pareto Distribution (GPD)
#'   
#' @description Maximum likelihood estimation for fitting the GPD with
#'   parameters scale \code{sigmau} and shape \code{xi} to the threshold
#'   exceedances, conditional on being above a threshold \code{u}. Unconditional
#'   likelihood fitting also provided when the probability \code{phiu} of being
#'   above the threshold \code{u} is given.
#'   
#' @param x          vector of sample data
#' @param u          scalar threshold
#' @param sigmau     scalar scale parameter (positive)
#' @param xi         scalar shape parameter
#' @param pvector    vector of initial values of GPD parameters (\code{sigmau}, \code{xi}) or \code{NULL}
#' @param phiu       probability of being above threshold \eqn{[0, 1]} or \code{NULL}, see Details
#' @param std.err    logical, should standard errors be calculated
#' @param method     optimisation method (see \code{\link[stats:optim]{optim}})
#' @param control    optimisation control list (see \code{\link[stats:optim]{optim}})
#' @param finitelik  logical, should log-likelihood return finite value for invalid parameters
#' @param log        logical, if \code{TRUE} then log-likelihood rather than likelihood is output
#' @param ...        optional inputs passed to \code{\link[stats:optim]{optim}}
#'   
#' @details The GPD is fitted to the exceedances of the threshold \code{u} using
#'   maximum likelihood estimation. The estimated parameters, 
#'   variance-covariance matrix and their standard errors are automatically 
#'   output.
#'   
#'   The log-likelihood and negative log-likelihood are also provided for wider 
#'   usage, e.g. constructing your own extreme value mixture model or profile
#'   likelihood functions. The 
#'   parameter vector \code{pvector} must be specified in the negative 
#'   log-likelihood \code{\link[evmix:fgpd]{nlgpd}}.
#'   
#'   Log-likelihood calculations are carried out in 
#'   \code{\link[evmix:fgpd]{lgpd}}, which takes parameters as inputs in the 
#'   same form as distribution functions. The negative log-likelihood is a 
#'   wrapper for \code{\link[evmix:fgpd]{lgpd}}, designed towards making it 
#'   useable for optimisation (e.g. parameters are given a vector as first 
#'   input).
#'   
#'   The default value for the tail fraction \code{phiu} in the fitting function
#'   \code{\link[evmix:fgpd]{fgpd}} is \code{NULL}, in which case the MLE is calculated 
#'   using the sample proportion of exceedances. In this case the standard error for \code{phiu} is 
#'   estimated and output as \code{se.phiu}, otherwise it is set to \code{NA}. Consistent with the 
#'   \code{\link[evd:fpot]{evd}} library the missing values (\code{NA} and 
#'   \code{NaN}) are assumed to be below the threshold in calculating the tail fraction.
#'   
#'   Otherwise, in the fitting function \code{\link[evmix:fgpd]{fgpd}} the tail 
#'   fraction \code{phiu} can be specified as any value over \eqn{(0, 1]}, i.e.
#'   excludes \eqn{\phi_u=0}, leading to the unconditional log-likelihood being
#'   used for estimation. In this case the standard error will be output as \code{NA}.
#'   
#'   In the log-likelihood functions \code{\link[evmix:fgpd]{lgpd}} and 
#'   \code{\link[evmix:fgpd]{nlgpd}} the tail fraction \code{phiu} cannot be
#'   \code{NULL} but can be over the range \eqn{[0, 1]}, i.e. which includes
#'   \eqn{\phi_u=0}.
#'   
#'   The value of \code{phiu} does not effect the GPD parameter estimates, only
#'   the value of the likelihood, as:
#'   
#'   \deqn{L(\sigma_u, \xi; u, \phi_u) = (\phi_u ^ {n_u}) L(\sigma_u, \xi; u,
#'   \phi_u=1)}
#'   
#'   where the GPD has scale \eqn{\sigma_u} and shape \eqn{\xi}, the threshold
#'   is \eqn{u} and \eqn{nu} is the number of exceedances. A non-unit value for
#'   \code{phiu} simply scales the likelihood and shifts the log-likelihood,
#'   thus the GPD parameter estimates are invariant to \code{phiu}.
#'   
#'   The default optimisation algorithm is "BFGS", which requires a finite
#'   negative log-likelihood function evaluation \code{finitelik=TRUE}. For
#'   invalid parameters, a zero likelihood is replaced with \code{exp(-1e6)}.
#'   The "BFGS" optimisation algorithms require finite values for likelihood, so
#'   any user input for \code{finitelik} will be overridden and set to
#'   \code{finitelik=TRUE} if either of these optimisation methods is chosen.
#'   
#'   It will display a warning for non-zero convergence result comes from 
#'   \code{\link[stats:optim]{optim}} function call.
#'   
#'   If the hessian is of reduced rank then the variance covariance (from
#'   inverse hessian) and standard error of parameters cannot be calculated,
#'   then by default \code{std.err=TRUE} and the function will stop. If you want
#'   the parameter estimates even if the hessian is of reduced rank (e.g. in a
#'   simulation study) then set \code{std.err=FALSE}.
#'   
#' @return \code{\link[evmix:fgpd]{lgpd}} gives (log-)likelihood and 
#'   \code{\link[evmix:fgpd]{nlgpd}} gives the negative log-likelihood. 
#'   \code{\link[evmix:fgpd]{fgpd}} returns a simple list with the following
#'   elements
#'   
#' \tabular{ll}{
#' \code{call}:     \tab \code{optim} call\cr
#' \code{x}:        \tab data vector \code{x}\cr
#' \code{init}:     \tab \code{pvector}\cr
#' \code{optim}:    \tab complete \code{optim} output\cr
#' \code{mle}:      \tab vector of MLE of parameters\cr
#' \code{cov}:      \tab variance-covariance matrix of MLE of parameters\cr
#' \code{se}:       \tab vector of standard errors of MLE of parameters\cr
#' \code{rate}:     \tab \code{phiu} to be consistent with \code{\link[evd:fpot]{evd}}\cr
#' \code{nllh}:     \tab minimum negative log-likelihood\cr
#' \code{n}:        \tab total sample size\cr
#' \code{u}:        \tab threshold\cr
#' \code{sigmau}:   \tab MLE of GPD scale\cr
#' \code{xi}:       \tab MLE of GPD shape\cr
#' \code{phiu}:     \tab MLE of tail fraction\cr
#' \code{se.phiu}:  \tab standard error of MLE of tail fraction (parameterised approach using sample proportion)\cr
#' }
#'   
#' The output list has some duplicate entries and repeats some of the inputs to both 
#' provide similar items to those from \code{\link[evd:fpot]{fpot}} and increase usability.
#'   
#' @note Unlike all the distribution functions for the GPD, the MLE fitting only
#'   permits single scalar values for each parameter, \code{phiu} and threshold
#'   \code{u}.
#'   
#'   When \code{pvector=NULL} then the initial values are calculated, type
#'   \code{fgpd} to see the default formulae used. The GPD fitting is not very
#'   sensitive to the initial values, so you will rarely have to  give
#'   alternatives. Avoid setting the starting value for the shape parameter to
#'   \code{xi=0} as depending on the optimisation method it may be get stuck.
#'   
#'   Default values for the threshold \code{u=0} and tail fraction
#'   \code{phiu=NULL} are given in the fitting \code{\link[evmix:fgpd]{fpgd}},
#'   in which case the MLE assumes that excesses over the threshold are given,
#'   rather than exceedances.
#'   
#'   The usual default of \code{phiu=1} is given in the likelihood functions
#'   \code{\link[evmix:fgpd]{lpgd}} and \code{\link[evmix:fgpd]{nlpgd}}.
#'   
#'   The \code{\link[evmix:fgpd]{lgpd}} also has the usual defaults for the
#'   other parameters, but \code{\link[evmix:fgpd]{nlgpd}} has no defaults.
#'   
#'   Infinite sample values are dropped in fitting function
#'   \code{\link[evmix:fgpd]{fpgd}}, but missing values are used to estimate
#'   \code{phiu} as described above. But in likelihood functions
#'   \code{\link[evmix:fgpd]{lpgd}} and \code{\link[evmix:fgpd]{nlpgd}} both
#'   infinite and missing values are ignored.
#'   
#'   Error checking of the inputs is carried out and will either stop or give
#'   warning message as appropriate.
#'   
#' @references
#' 
#' \url{http://en.wikipedia.org/wiki/Generalized_Pareto_distribution}
#' 
#' Hu Y. and Scarrott, C.J. (2018). evmix: An R Package for Extreme Value Mixture Modeling, 
#' Threshold Estimation and Boundary Corrected Kernel Density Estimation. Journal of
#' Statistical Software 84(5), 1-27. doi: 10.18637/jss.v084.i05.
#' 
#' @author Yang Hu and Carl Scarrott \email{carl.scarrott@@canterbury.ac.nz}
#'   
#' @section Acknowledgments: Based on the \code{\link[ismev:gpd.fit]{gpd.fit}} and
#' \code{\link[evd:fpot]{fpot}} functions in the 
#' \code{\link[ismev:gpd.fit]{ismev}} and
#' \code{\link[evd:fpot]{evd}} packages for which their author's contributions are gratefully acknowledged.
#' They are designed to have similar syntax and functionality to simplify the transition for users of these packages.
#'   
#' @seealso \code{\link[evd:gpd]{dgpd}}, \code{\link[evd:fpot]{fpot}} and
#'   \code{\link[MASS:fitdistr]{fitdistr}}
#'   
#' @aliases fgpd lgpd nlgpd
#' @family gpd
#' @family fgpd
#'   
#' @examples
#' set.seed(1)
#' par(mfrow = c(2, 1))
#' 
#' # GPD is conditional model for threshold exceedances
#' # so tail fraction phiu not relevant when only have exceedances
#' x = rgpd(1000, u = 10, sigmau = 5, xi = 0.2)
#' xx = seq(0, 100, 0.1)
#' hist(x, breaks = 100, freq = FALSE, xlim = c(0, 100))
#' lines(xx, dgpd(xx, u = 10, sigmau = 5, xi = 0.2))
#' fit = fgpd(x, u = 10)
#' lines(xx, dgpd(xx, u = fit$u, sigmau = fit$sigmau, xi = fit$xi), col="red")
#' 
#' # but tail fraction phiu is needed for conditional modelling of population tail
#' x = rnorm(10000)
#' xx = seq(-4, 4, 0.01)
#' hist(x, breaks = 200, freq = FALSE, xlim = c(0, 4))
#' lines(xx, dnorm(xx), lwd = 2)
#' fit = fgpd(x, u = 1)
#' lines(xx, dgpd(xx, u = fit$u, sigmau = fit$sigmau, xi = fit$xi, phiu = fit$phiu),
#'   col = "red", lwd = 2)
#' legend("topright", c("True Density","Fitted Density"), col=c("black", "red"), lty = 1)
#' 

# maximum likelihood fitting for GPD
fgpd <- function(x, u = 0, phiu = NULL, pvector = NULL, std.err = TRUE,
  method = "BFGS", control = list(maxit = 10000), finitelik = TRUE, ...) {

  call <- match.call()
    
  np = 2 # maximum number of parameters

  # Check properties of inputs
  check.quant(x, allowna = TRUE, allowinf = TRUE)
  check.param(u)
  check.prob(phiu, allownull = TRUE) # don't use check.phiu as TRUE only valid for mixture models
  check.nparam(pvector, nparam = np, allownull = TRUE)
  check.logic(std.err)
  check.optim(method)
  check.control(control)
  check.logic(finitelik)

  if (any(is.infinite(x))) warning("infinite cases have been removed")
  
  x = x[!is.infinite(x)] # ignore infinite cases only (all mixture models also ignore missing)
  
  if (any(is.na(x)))
    warning("missing values are treated as below threshold when estimating tail fraction")

  check.quant(x, allowna = TRUE)
  n = length(x)

  if ((method == "L-BFGS-B") | (method == "BFGS")) finitelik = TRUE

  # assume NA or NaN are below threshold consistent with evd library
  # hence use which() to ignore these

  xu = x[which(x > u)]
  nu = length(xu)
  
  if (nu < 1)
    stop("no elements of x are above threshold")
    
  # set default values if pvector is NULL
  if (is.null(pvector)) {
    yu = xu - u
    pvector[1] = sqrt(6 * var(yu)) / pi
    pvector[2] = 0.1
  }
  
  if (is.null(phiu)) {
    phiu = nu / n
    se.phiu = sqrt(phiu * (1 - phiu) / n)
  } else {
    if (phiu == 0) stop("tail probability must be in (0, 1]")
    se.phiu = NA
  }
  
  # check initial parameter vector and try different shape
  nllh = nlgpd(pvector, xu, u, phiu)
  if (is.infinite(nllh)) {
    pvector[2] = 0.1
    nllh = nlgpd(pvector, xu, u, phiu)
  }

  if (is.infinite(nllh)) stop("initial parameter values are invalid")

  fit = optim(par = as.vector(pvector), fn = nlgpd, x = xu, u = u, phiu = phiu,
    finitelik = finitelik, control = control, method = method, hessian = TRUE, ...)
  
  conv = TRUE
  if ((fit$convergence != 0) | any(fit$par == pvector) | (abs(fit$value) >= 1e6)) {
    conv = FALSE
    warning("check convergence")
  }
  
  if (conv & std.err) {
    qrhess = qr(fit$hessian)
    if (qrhess$rank != ncol(qrhess$qr)) {
      warning("observed information matrix is singular")
      se = NULL
      invhess = NULL
    } else {
      invhess = solve(qrhess)
      vars = diag(invhess)
      if (any(vars <= 0)) {
        warning("observed information matrix is singular")
        invhess = NULL
        se = NULL
      } else {
        se = sqrt(vars)
      }  
    }
  } else {
    invhess = NULL
    se = NULL
  }

  list(call = call, x = as.vector(x), init = as.vector(pvector), optim = fit,
    conv = conv, cov = invhess, mle = fit$par, se = se, rate = phiu, nllh = fit$value,
    n = n, u = u, sigmau = fit$par[1], xi = fit$par[2], phiu = phiu, se.phiu = se.phiu)
}

#' @export
#' @aliases fgpd lgpd nlgpd
#' @rdname  fgpd

# log-likelihood function for GPD
lgpd <- function(x, u = 0, sigmau = 1, xi = 0, phiu = 1, log = TRUE) {

  # Check properties of inputs
  check.quant(x, allowna = TRUE, allowinf = TRUE)
  check.param(u)
  check.param(sigmau)
  check.param(xi)
  check.prob(phiu) # don't use check.phiu as TRUE only valid for mixture models
  check.logic(log)

  check.inputn(c(length(u), length(sigmau), length(xi), length(phiu)), allowscalar = TRUE)

  if (any(!is.finite(x))) {
    warning("non-finite cases have been removed")
    x[!is.finite(x)] = NA # ignore missing and infinite cases
  }
  
  # assume NA or NaN are below threshold consistent with evd library
  # hence use which() to ignore these
  
  xu = x[which(x > u)]
  nu = length(xu)
  
  yu = (xu - u) / sigmau # used when shape is zero
  syu = 1 + xi * yu      # used when shape non-zero  
  
  if ((min(syu) <= 0) | (sigmau <= 0) | (phiu <= 0)  | (phiu > 1)) { # phiu = 1 is conditional likelihood
    l = -Inf
  } else {
    if (abs(xi) < 1e-6) {
      l = - nu * log(sigmau) - sum(yu) + nu * log(phiu)
    } else {
      l = - nu * log(sigmau) - (1/xi + 1) * sum(log(syu)) + nu * log(phiu)
    }
  }
  
  if (!log) l = exp(l)
  
  l
}

#' @export
#' @aliases fgpd lgpd nlgpd
#' @rdname  fgpd

# negative log-likelihood function for GPD
# (wrapper for likelihood, inputs and checks designed for optimisation)
nlgpd <- function(pvector, x, u = 0, phiu = 1, finitelik = FALSE) {

  np = 2 # maximum number of parameters

  # Check properties of inputs
  check.nparam(pvector, nparam = np) # must be provided
  check.param(u)
  check.quant(x, allowna = TRUE, allowinf = TRUE)
  check.prob(phiu) # don't use check.phiu as TRUE only valid for mixture models
  check.logic(finitelik)
  
  sigmau = pvector[1]
  xi = pvector[2]

  nllh = -lgpd(x, u, sigmau, xi, phiu) 
  
  if (finitelik & is.infinite(nllh)) {
    nllh = sign(nllh) * 1e6
  }
  
  nllh
}
Any scripts or data that you put into this service are public.
evmix documentation built on Sept. 3, 2019, 5:07 p.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
evmix
Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation

R/fgpd.r
In evmix: Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation

Defines functions fgpd lgpd nlgpd

Documented in fgpd lgpd nlgpd

Try the evmix package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

evmix Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation

R/fgpd.r In evmix: Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation

Defines functions fgpd lgpd nlgpd

Documented in fgpd lgpd nlgpd

Try the evmix package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

evmix
Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation

R/fgpd.r
In evmix: Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation