R/fhpdcon.r
In evmix: Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation

Documented in fhpdcon lhpdcon nlhpdcon nluhpdcon profluhpdcon

#' @export
#' 
#' @title MLE Fitting of Hybrid Pareto Extreme Value Mixture Model with Single Continuity Constraint
#'
#' @description Maximum likelihood estimation for fitting the Hybrid Pareto extreme
#' value mixture model, with only continuity at threshold and not necessarily
#' continuous in first derivative. With options for profile likelihood estimation for
#' threshold and fixed threshold approach.
#'
#' @inheritParams fnormgpd
#' @inheritParams dnormgpd
#' @inheritParams fgpd
#' 
#' @details The hybrid Pareto model is fitted to the entire dataset using maximum
#' likelihood estimation, with only continuity at threshold and not necessarily
#' continuous in first derivative. The estimated parameters, variance-covariance matrix
#' and their standard errors are automatically output.
#' 
#' Note that the key difference between this model (\code{hpdcon}) and the 
#' normal with GPD tail and continuity at threshold (\code{normgpdcon}) is that the
#' latter includes the rescaling of the conditional GPD component
#' by the tail fraction to make it an unconditional tail model. However, for the hybrid
#' Pareto with single continuity constraint use the GPD in it's conditional form with no
#' differential scaling compared to the bulk model.
#' 
#' See help for \code{\link[evmix:fnormgpd]{fnormgpd}} for details, type \code{help fnormgpd}. Only
#' the different features are outlined below for brevity.
#' 
#' The profile likelihood and fixed threshold approach functionality are implemented for this
#' version of the hybrid Pareto as it includes the threshold as a parameter. Whereas the usual
#' hybrid Pareto does not naturally have a threshold parameter.
#' 
#' The GPD \code{sigmau} parameter is now specified as function of other parameters, see 
#' help for \code{\link[evmix:hpdcon]{dhpdcon}} for details, type \code{help hpdcon}.
#' Therefore, \code{sigmau} should not be included in the parameter vector if initial values
#' are provided, making the full parameter vector 
#' (\code{nmean}, \code{nsd}, \code{u}, \code{xi}) if threshold is also estimated and
#' (\code{nmean}, \code{nsd}, \code{xi}) for profile likelihood or fixed threshold approach.
#' 
#' @return \code{\link[evmix:fhpdcon]{lhpdcon}}, \code{\link[evmix:fhpdcon]{nlhpdcon}},
#' and \code{\link[evmix:fhpdcon]{nluhpdcon}} give the log-likelihood,
#' negative log-likelihood and profile likelihood for threshold. Profile likelihood
#' for single threshold is given by \code{\link[evmix:fhpdcon]{profluhpdcon}}.
#' \code{\link[evmix:fhpdcon]{fhpdcon}} returns a simple list with the following elements
#'
#' \tabular{ll}{
#'  \code{call}:    \tab \code{optim} call\cr
#'  \code{x}:       \tab data vector \code{x}\cr
#'  \code{init}:    \tab \code{pvector}\cr
#'  \code{fixedu}:  \tab fixed threshold, logical\cr
#'  \code{useq}:    \tab threshold vector for profile likelihood or scalar for fixed threshold\cr
#'  \code{nllhuseq}:  \tab profile negative log-likelihood at each threshold in useq\cr
#'  \code{optim}:   \tab complete \code{optim} output\cr
#'  \code{mle}:     \tab vector of MLE of parameters\cr
#'  \code{cov}:     \tab variance-covariance matrix of MLE of parameters\cr
#'  \code{se}:      \tab vector of standard errors of MLE of parameters\cr
#'  \code{rate}:    \tab \code{phiu} to be consistent with \code{\link[evd:fpot]{evd}}\cr
#'  \code{nllh}:    \tab minimum negative log-likelihood\cr
#'  \code{n}:       \tab total sample size\cr
#'  \code{nmean}:   \tab MLE of normal mean\cr
#'  \code{nsd}:     \tab MLE of normal standard deviation\cr
#'  \code{u}:       \tab threshold (fixed or MLE)\cr
#'  \code{sigmau}:  \tab MLE of GPD scale (estimated from other parameters)\cr
#'  \code{xi}:      \tab MLE of GPD shape\cr
#'  \code{phiu}:    \tab MLE of tail fraction (implied by \code{1/(1+pnorm(u,nmean,nsd))})\cr
#' }
#' 
#' @note When \code{pvector=NULL} then the initial values are:
#' \itemize{
#'  \item threshold 90\% quantile (not relevant for profile likelihood for threshold or fixed threshold approaches);
#'  \item MLE of normal parameters assuming entire population is normal; and
#'  \item MLE of GPD parameters above threshold. 
#' }
#' Avoid setting the starting value for the shape parameter to
#' \code{xi=0} as depending on the optimisation method it may be get stuck.
#' 
#' @references
#' \url{http://www.math.canterbury.ac.nz/~c.scarrott/evmix}
#' 
#' \url{http://en.wikipedia.org/wiki/Normal_distribution}
#' 
#' \url{http://en.wikipedia.org/wiki/Generalized_Pareto_distribution}
#' 
#' Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value
#' threshold estimation and uncertainty quantification. REVSTAT - Statistical
#' Journal 10(1), 33-59. Available from \url{http://www.ine.pt/revstat/pdf/rs120102.pdf}
#' 
#' Hu, Y. (2013). Extreme value mixture modelling: An R package and simulation study.
#' MSc (Hons) thesis, University of Canterbury, New Zealand.
#' \url{http://ir.canterbury.ac.nz/simple-search?query=extreme&submit=Go}
#' 
#' Carreau, J. and Y. Bengio (2008). A hybrid Pareto model for asymmetric fat-tailed data:
#' the univariate case. Extremes 12 (1), 53-76.
#' 
#' @author Yang Hu and Carl Scarrott \email{carl.scarrott@@canterbury.ac.nz}
#'
#' @section Acknowledgments: See Acknowledgments in
#'   \code{\link[evmix:fnormgpd]{fnormgpd}}, type \code{help fnormgpd}.
#' 
#' @seealso \code{\link[stats:Normal]{dnorm}},
#'  \code{\link[evmix:fgpd]{fgpd}} and \code{\link[evmix:gpd]{gpd}}
#'  
#' The condmixt package written by one of the
#' original authors of the hybrid Pareto model (Carreau and Bengio, 2008) also has 
#' similar functions for the likelihood of the hybrid Pareto 
#' (hpareto.negloglike) and fitting (hpareto.fit).
#' 
#' @aliases fhpdcon lhpdcon nlhpdcon profluhpdcon nluhpdcon
#' @family  hpd
#' @family  hpdcon
#' @family  normgpdcon
#' @family  fhpdcon
#' 
#' @examples
#' \dontrun{
#' set.seed(1)
#' par(mfrow = c(2, 1))
#' 
#' x = rnorm(1000)
#' xx = seq(-4, 4, 0.01)
#' y = dnorm(xx)
#' 
#' # Hybrid Pareto provides reasonable fit for some asymmetric heavy upper tailed distributions
#' # but not for cases such as the normal distribution
#' 
#' # Continuity constraint
#' fit = fhpdcon(x)
#' hist(x, breaks = 100, freq = FALSE, xlim = c(-4, 4))
#' lines(xx, y)
#' with(fit, lines(xx, dhpdcon(xx, nmean, nsd, u, xi), col="red"))
#' abline(v = fit$u, col = "red")
#'   
#' # No continuity constraint
#' fit2 = fhpd(x)
#' with(fit2, lines(xx, dhpd(xx, nmean, nsd, xi), col="blue"))
#' abline(v = fit2$u, col = "blue")
#' legend("topleft", c("True Density","No continuity constraint","With continuty constraint"),
#'   col=c("black", "blue", "red"), lty = 1)
#'   
#' # Profile likelihood for initial value of threshold and fixed threshold approach
#' fitu = fhpdcon(x, useq = seq(-2, 2, length = 20))
#' fitfix = fhpdcon(x, useq = seq(-2, 2, length = 20), fixedu = TRUE)
#' 
#' hist(x, breaks = 100, freq = FALSE, xlim = c(-4, 4))
#' lines(xx, y)
#' with(fit, lines(xx, dhpdcon(xx, nmean, nsd, u, xi), col="red"))
#' abline(v = fit$u, col = "red")
#' with(fitu, lines(xx, dhpdcon(xx, nmean, nsd, u, xi), col="purple"))
#' abline(v = fitu$u, col = "purple")
#' with(fitfix, lines(xx, dhpdcon(xx, nmean, nsd, u, xi), col="darkgreen"))
#' abline(v = fitfix$u, col = "darkgreen")
#' legend("topleft", c("True Density","Default initial value (90% quantile)",
#'  "Prof. lik. for initial value", "Prof. lik. for fixed threshold"),
#'  col=c("black", "red", "purple", "darkgreen"), lty = 1)
#'   
#' # Notice that if tail fraction is included a better fit is obtained
#' fittailfrac = fnormgpdcon(x)
#' 
#' par(mfrow = c(1, 1))
#' hist(x, breaks = 100, freq = FALSE, xlim = c(-4, 4))
#' lines(xx, y)
#' with(fit, lines(xx, dhpdcon(xx, nmean, nsd, u, xi), col="red"))
#' abline(v = fit$u, col = "red")
#' with(fittailfrac, lines(xx, dnormgpdcon(xx, nmean, nsd, u, xi), col="blue"))
#' abline(v = fittailfrac$u)
#' legend("topright", c("Standard Normal", "Hybrid Pareto Continuous", "Normal+GPD Continuous"),
#'   col=c("black", "red", "blue"), lty = 1)
#' }
#' 

# maximum likelihood fitting for hybrid Pareto with single continuity constraint
fhpdcon <- function(x, useq = NULL, fixedu = FALSE, pvector = NULL,
  std.err = TRUE, method = "BFGS", control = list(maxit = 10000), finitelik = TRUE, ...) {

  call <- match.call()
    
  np = 4 # maximum number of parameters

  # Check properties of inputs
  check.quant(x, allowna = TRUE, allowinf = TRUE)
  check.param(useq, allowvec = TRUE, allownull = TRUE)
  check.logic(fixedu)
  check.logic(std.err)
  check.optim(method)
  check.control(control)
  check.logic(finitelik)

  if (any(!is.finite(x))) {
    warning("non-finite cases have been removed")
    x = x[is.finite(x)] # ignore missing and infinite cases
  }

  check.quant(x)
  n = length(x)

  if ((method == "L-BFGS-B") | (method == "BFGS")) finitelik = TRUE
  
  # useq must be specified if threshold is fixed
  if (fixedu & is.null(useq))
    stop("for fixed threshold approach, useq must be specified (as scalar or vector)")
  
  # Check if profile likelihood or fixed threshold is being used
  # and determine initial values for parameters in each case
  if (is.null(useq)) { # not profile or fixed
    check.nparam(pvector, nparam = np, allownull = TRUE)
    
    if (is.null(pvector)) {
      pvector[1] = mean(x, trim = 0.2)
      pvector[2] = sd(x)
      pvector[3] = as.vector(quantile(x, 0.9))
      initfgpd = fgpd(x, pvector[3], std.err = FALSE)
      pvector[4] = initfgpd$xi
    }
    
  } else { # profile or fixed
    
    check.nparam(pvector, nparam = np - 1, allownull = TRUE)

    # profile likelihood for threshold or scalar given
    if (length(useq) != 1) {
      
      # remove thresholds with less than 5 excesses
      useq = useq[sapply(useq, FUN = function(u, x) sum(x > u) > 5, x = x)]
      check.param(useq, allowvec = TRUE)
      
      nllhu = sapply(useq, profluhpdcon, pvector = pvector, x = x,
        method = method, control = control, finitelik = finitelik, ...)
      
      if (all(!is.finite(nllhu))) stop("thresholds are all invalid")
      u = useq[which.min(nllhu)]

    } else {
      u = useq
    }

    if (fixedu) { # threshold fixed
      if (is.null(pvector)) {
        pvector[1] = mean(x, trim = 0.2)
        pvector[2] = sd(x)
        initfgpd = fgpd(x, u, std.err = FALSE)
        pvector[3] = initfgpd$xi
      }
    } else { # threshold as initial value in usual MLE
      if (is.null(pvector)) {
        pvector[1] = mean(x, trim = 0.2)
        pvector[2] = sd(x)
        pvector[3] = u
        initfgpd = fgpd(x, pvector[3], std.err = FALSE)
        pvector[4] = initfgpd$xi
      } else {
        pvector[4] = pvector[3] # shift GPD shape to add in u
        pvector[3] = u
      }
    }
  }

  if (fixedu) { # fixed threshold (separable) likelihood
    nllh = nluhpdcon(pvector, u, x)
    if (is.infinite(nllh)) {
      pvector[3] = 0.1
      nllh = nluhpdcon(pvector, u, x)    
    }
    if (is.infinite(nllh)) stop("initial parameter values are invalid")
  
    fit = optim(par = as.vector(pvector), fn = nluhpdcon, u = u, x = x,
      finitelik = finitelik, method = method, control = control, hessian = TRUE, ...)    
    
    nmean = fit$par[1]
    nsd = fit$par[2]
    xi = fit$par[3]
    
  } else { # complete (non-separable) likelihood

    nllh = nlhpdcon(pvector, x)
    if (is.infinite(nllh)) {
      pvector[4] = 0.1
      nllh = nlhpdcon(pvector, x)    
    }
    if (is.infinite(nllh)) stop("initial parameter values are invalid")

    fit = optim(par = as.vector(pvector), fn = nlhpdcon, x = x,
      finitelik = finitelik, method = method, control = control, hessian = TRUE, ...)    
    
    nmean = fit$par[1]
    nsd = fit$par[2]
    u = fit$par[3]
    xi = fit$par[4]
  }
  
  du = dnorm(u, nmean, nsd)
  sigmau = 1/du

  r = 1 + pnorm(u, nmean, nsd)
  phiu = 1/r # same normalisation for GPD and normal
  
  conv = TRUE
  if ((fit$convergence != 0) | any(fit$par == pvector) | (abs(fit$value) >= 1e6)) {
    conv = FALSE
    warning("check convergence")
  }

  if (std.err) {
    qrhess = qr(fit$hessian)
    if (qrhess$rank != ncol(qrhess$qr)) {
      warning("observed information matrix is singular")
      se = NULL
      invhess = NULL
    } else {
      invhess = solve(qrhess)
      vars = diag(invhess)
      if (any(vars <= 0)) {
        warning("observed information matrix is singular")
        invhess = NULL
        se = NULL
      } else {
        se = sqrt(vars)
      }  
    }
  } else {
    invhess = NULL
    se = NULL
  }
  
  if (!exists("nllhu")) nllhu = NULL

  list(call = call, x = as.vector(x), 
    init = as.vector(pvector), fixedu = fixedu, useq = useq, nllhuseq = nllhu,
    optim = fit, conv = conv, cov = invhess, mle = fit$par, se = se, rate = phiu,
    nllh = fit$value, n = n,
    nmean = nmean, nsd = nsd, u = u, sigmau = sigmau, xi = xi, phiu = phiu)
}

#' @export
#' @aliases fhpdcon lhpdcon nlhpdcon profluhpdcon nluhpdcon
#' @rdname  fhpdcon

# log-likelihood function for hybrid Pareto with single continuity constraint
lhpdcon <- function(x, nmean = 0, nsd = 1, u = qnorm(0.9, nmean, nsd), 
  xi = 0, log = TRUE) {

  # Check properties of inputs
  check.quant(x, allowna = TRUE, allowinf = TRUE)
  check.param(nmean)
  check.param(nsd)
  check.param(u)
  check.param(xi)
  check.logic(log)

  if (any(!is.finite(x))) {
    warning("non-finite cases have been removed")
    x = x[is.finite(x)] # ignore missing and infinite cases
  }

  check.quant(x)
  n = length(x)

  check.inputn(c(length(nmean), length(nsd), length(u), length(xi)), allowscalar = TRUE)

  # assume NA or NaN are irrelevant as entire lower tail is now modelled
  # inconsistent with evd library definition
  # hence use which() to ignore these

  xu = x[which(x > u)]
  nu = length(xu)
  xb = x[which(x <= u)]
  nb = length(xb)

  if (n != nb + nu) {
    stop("total non-finite sample size is not equal to those above threshold and those below or equal to it")    
  }

  if ((nsd <= 0) | (u <= min(x)) | (u >= max(x))) {
    l = -Inf
  } else {
    du = dnorm(u, nmean, nsd)
    sigmau = 1/du
  
    r = 1 + pnorm(u, nmean, nsd)

    syu = 1 + xi * (xu - u) / sigmau  
    yb = (xb - nmean) / nsd # used for normal
  
    if ((min(syu) <= 0) | (sigmau <= 0) | (du < .Machine$double.eps)) {
      l = -Inf
    } else { 
      l = lgpd(xu, u, sigmau, xi) # phiu disappears
      l = l - nb * log(2 * pi * nsd ^ 2) / 2 - sum(yb ^ 2) / 2 # phib disappears
      l = l - n * log(r) # divide by normalisation constant
    }
  }
  
  if (!log) l = exp(l)
  
  l
}

#' @export
#' @aliases fhpdcon lhpdcon nlhpdcon profluhpdcon nluhpdcon
#' @rdname  fhpdcon

# negative log-likelihood function for hybrid Pareto with single continuity constraint
# (wrapper for likelihood, inputs and checks designed for optimisation)
nlhpdcon <- function(pvector, x, finitelik = FALSE) {

  np = 4 # maximum number of parameters

  # Check properties of inputs
  check.nparam(pvector, nparam = np)
  check.quant(x, allowna = TRUE, allowinf = TRUE)
  check.logic(finitelik)

  nmean = pvector[1]
  nsd = pvector[2]
  u = pvector[3]
  xi = pvector[4]

  nllh = -lhpdcon(x, nmean, nsd, u, xi) 
  
  if (finitelik & is.infinite(nllh)) {
    nllh = sign(nllh) * 1e6
  }

  nllh
}

#' @export
#' @aliases fhpdcon lhpdcon nlhpdcon profluhpdcon nluhpdcon
#' @rdname  fhpdcon

# profile negative log-likelihood function for given threshold for
# hybrid Pareto with single continuity constraint
# designed for sapply to loop over vector of thresholds (hence u is first input)
profluhpdcon <- function(u, pvector, x,
  method = "BFGS", control = list(maxit = 10000), finitelik = TRUE, ...) {

  np = 4 # maximum number of parameters

  # Check properties of inputs
  check.nparam(pvector, nparam = np - 1, allownull = TRUE)
  check.param(u)
  check.quant(x, allowna = TRUE, allowinf = TRUE)
  check.optim(method)
  check.control(control)
  check.optim(method)
  check.control(control)
  check.logic(finitelik)

  if (any(!is.finite(x))) {
    warning("non-finite cases have been removed")
    x = x[is.finite(x)] # ignore missing and infinite cases
  }

  check.quant(x)  
  
  # check initial values for other parameters, try usual alternative
  if (!is.null(pvector)) {
    nllh = nluhpdcon(pvector, u, x)
    
    if (is.infinite(nllh)) pvector = NULL
  }

  if (is.null(pvector)) {
    pvector[1] = mean(x, trim = 0.2)
    pvector[2] = sd(x)
    initfgpd = fgpd(x, u, std.err = FALSE)
    pvector[3] = initfgpd$xi
    nllh = nluhpdcon(pvector, u, x)
  }  

  if (is.infinite(nllh)) {
    pvector[3] = 0.1
    nllh = nluhpdcon(pvector, u, x)    
  }

  # if still invalid then output cleanly
  if (is.infinite(nllh)) {
    warning(paste("initial parameter values for threshold u =", u, "are invalid"))
    fit = list(par = rep(NA, np), value = Inf, counts = 0, convergence = NA, 
      message = "initial values invalid", hessian = rep(NA, np))
  } else {

    fit = optim(par = as.vector(pvector), fn = nluhpdcon, u = u, x = x,
    finitelik = finitelik, method = method, control = control, hessian = TRUE, ...)
  }
    
  if (finitelik & is.infinite(fit$value)) {
    fit$value = sign(fit$value) * 1e6
  }

  fit$value
}

#' @export
#' @aliases fhpdcon lhpdcon nlhpdcon profluhpdcon nluhpdcon
#' @rdname  fhpdcon

# negative log-likelihood function for hybrid Pareto with single continuity constraint
# (wrapper for likelihood, designed for threshold to be fixed and other parameters optimised)
nluhpdcon <- function(pvector, u, x, finitelik = FALSE) {

  np = 4 # maximum number of parameters

  # Check properties of inputs
  check.nparam(pvector, nparam = np - 1)
  check.param(u)
  check.quant(x, allowna = TRUE, allowinf = TRUE)
  check.logic(finitelik)
    
  nmean = pvector[1]
  nsd = pvector[2]
  xi = pvector[3]

  nllh = -lhpdcon(x, nmean, nsd, u, xi) 
  
  if (finitelik & is.infinite(nllh)) {
    nllh = sign(nllh) * 1e6
  }

  nllh
}
Any scripts or data that you put into this service are public.
evmix documentation built on Sept. 3, 2019, 5:07 p.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
evmix
Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation

R/fhpdcon.r
In evmix: Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation

Defines functions fhpdcon lhpdcon nlhpdcon profluhpdcon nluhpdcon

Documented in fhpdcon lhpdcon nlhpdcon nluhpdcon profluhpdcon

Try the evmix package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

evmix Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation

R/fhpdcon.r In evmix: Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation

Defines functions fhpdcon lhpdcon nlhpdcon profluhpdcon nluhpdcon

Documented in fhpdcon lhpdcon nlhpdcon nluhpdcon profluhpdcon

Try the evmix package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

evmix
Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation

R/fhpdcon.r
In evmix: Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation