R/mrlplot.r
In evmix: Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation

Documented in mrlplot

#' @export
#' 
#' @title Mean Residual Life Plot
#'
#' @description Plots the sample mean residual life (MRL) plot.
#'
#' @param data       vector of sample data
#' @param tlim       vector of (lower, upper) limits of range of threshold
#'                   to plot MRL, or \code{NULL} to use default values
#' @param nt         number of thresholds for which to evaluate MRL
#' @param alpha      significance level over range (0, 1), or \code{NULL} for no CI
#' @param p.or.n     logical, should tail fraction (\code{FALSE}) or number of
#'                   exceedances (\code{TRUE}) be given on upper x-axis
#' @param ylim       y-axis limits or \code{NULL}
#' @param legend.loc location of legend (see \code{\link[graphics:legend]{legend}}) or \code{NULL} for no legend
#' @param try.thresh vector of thresholds to consider
#' @param main       title of plot
#' @param xlab       x-axis label
#' @param ylab       y-axis label
#' @param ...        further arguments to be passed to the plotting functions
#' 
#' @details Plots the sample mean residual life plot, which is also known as the mean
#' excess plot. 
#' 
#' If the generalised Pareto distribution (GPD) is an appropriate model for the excesses \eqn{X-u}
#' above \eqn{u} then their expected value is:
#' \deqn{E(X - u | X > u) = \sigma_u / (1 - \xi).}
#' For any higher threshold \eqn{v > u} the expected value is 
#' \deqn{E(X - v | X > v) = [\sigma_u + \xi * (v - u)] / (1 - \xi)}
#' which is linear in higher thresholds \eqn{v} with intercept given by \eqn{[\sigma_u - \xi *u]/(1 - \xi)}
#' and gradient \eqn{\xi/(1 - \xi)}. The estimated mean residual life above a threshold
#' \eqn{v} is given by the sample mean excess \code{mean(x[x > v]) - v}. 
#' 
#' Symmetric CLT based confidence intervals are provided, provided there are at least 5 exceedances.
#' The sampling density for the MRL is shown by a greyscale image, where lighter greys indicate low density.
#' 
#' A pre-chosen threshold (or more than one) can be given in \code{try.thresh}. The GPD is
#' fitted to the excesses using maximum likelihood estimation. The estimated parameters are
#' used to plot the linear function for all higher thresholds using a solid line. The threshold
#' should set as low as possible, so a dashed line is shown below the pre-chosen threshold.
#' If the MRL is similar to the dashed line then a lower threshold may be chosen.
#' 
#' If no threshold limits are provided \code{tlim = NULL} then the lowest threshold is set
#' to be just below the median data point and the maximum threshold is set to the 6th
#' largest datapoint.
#' 
#' The range of permitted thresholds is just below the minimum datapoint and the
#' second largest value. If there are less unique values of data within the threshold
#' range than the number of threshold evalations requested, then instead of a sequence
#' of thresholds the MRL will be evaluated at each unique datapoint.
#' 
#' The missing (\code{NA} and \code{NaN}) and non-finite values are ignored.
#' 
#' The lower x-axis is the threshold and an upper axis either gives the number of 
#' exceedances (\code{p.or.n = FALSE}) or proportion of excess (\code{p.or.n = TRUE}).
#' Note that unlike the \code{gpd} related functions the missing values are ignored, so
#' do not add to the lower tail fraction. But ignoring the missing values is consistent
#' with all the other mixture model functions.
#' 
#' @return \code{\link[evmix:mrlplot]{mrlplot}} gives the mean residual life plot. It also
#' returns a matrix containing columns of the threshold, number of exceedances, mean excess,
#' standard devation of excesses and \eqn{100(1 - \alpha)\%} confidence interval if requested. The standard
#' deviation and confidence interval are \code{NA} for less than 5 exceedances.
#' 
#' @note If the user specifies the threshold range, the thresholds above the second
#' largest are dropped. A warning message is given if any thresholds have at most 5 
#' exceedances, in which case the confidence interval is not calculated as it is
#' unreliable due to small sample. If there are less than 10 exceedances of the minimum
#' threshold then the function will stop.
#' 
#' Error checking of the inputs (e.g. invalid probabilities) is carried out and
#' will either stop or give warning message as appropriate.
#' 
#' @references
#' 
#' Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value
#' threshold estimation and uncertainty quantification. REVSTAT - Statistical
#' Journal 10(1), 33-59. Available from \url{http://www.ine.pt/revstat/pdf/rs120102.pdf}
#' 
#' Coles S.G. (2004). An Introduction to the Statistical Modelling of Extreme Values.
#' Springer-Verlag: London.
#' 
#' @author Yang Hu and Carl Scarrott \email{carl.scarrott@@canterbury.ac.nz}
#'
#' @section Acknowledgments: Based on the 
#' \code{\link[evd:mrlplot]{mrlplot}} function in the 
#' \code{\link[evd:mrlplot]{evd}} package for which Stuart Coles' and Alec Stephenson's contributions are gratefully acknowledged.
#' They are designed to have similar syntax and functionality to simplify the transition for users of these packages.
#' 
#' @seealso \code{\link[evmix:gpd]{gpd}} and \code{\link[evd:mrlplot]{mrlplot}} from 
#' \code{\link[evd:mrlplot]{evd}} library
#' 
#' @examples
#' x = rnorm(1000)
#' mrlplot(x)
#' mrlplot(x, tlim = c(0, 2.2))
#' mrlplot(x, tlim = c(0, 2), try.thresh = c(0.5, 1, 1.5))
#' mrlplot(x, tlim = c(0, 3), try.thresh = c(0.5, 1, 1.5))
#' 

mrlplot <- function(data, tlim = NULL, nt = min(100, length(data)), p.or.n = FALSE,
  alpha = 0.05, ylim = NULL, legend.loc = "bottomleft",
  try.thresh = quantile(data, 0.9, na.rm = TRUE), 
  main = "Mean Residual Life Plot", xlab = "Threshold u", ylab = "Mean Excess", ...) {

  # make sure defaults which result from function evaluations are obtained
  invisible(nt)
  invisible(try.thresh)
  check.quant(data, allowna = TRUE, allowinf = TRUE)
  
  check.param(tlim, allowvec = TRUE, allownull = TRUE)
  if (!is.null(tlim)) {
    if (length(tlim) != 2) 
      stop("threshold range tlim must be a numeric vector of length 2")

    if (tlim[2] <= tlim[1])
      stop("a range of thresholds must be specified by tlim")
  }
  
  check.logic(p.or.n)
  check.n(nt)
  if (nt == 1)
    stop("number of thresholds must be a non-negative integer >= 2")

  check.prob(alpha, allownull = TRUE)
  if (!is.null(alpha)){
    if (alpha <= 0 | alpha >= 1)
      stop("significance level alpha must be between (0, 1)")
  }
  
  check.param(ylim, allowvec = TRUE, allownull = TRUE)
  if (!is.null(ylim)) {
    if (length(ylim) != 2) 
      stop("ylim must be a numeric vector of length 2")

    if (ylim[2] <= ylim[1])
      stop("a range of y axis limits must be specified by ylim")
  }
  
  check.text(legend.loc, allownull = TRUE)
  if (!is.null(legend.loc)) {
    if (!(legend.loc %in% c("bottomright", "bottom", "bottomleft", "left",
      "topleft", "top", "topright", "right", "center")))
      stop("legend location not correct, see help(legend)")
  }
  
  if (any(!is.finite(data))) warning("non-finite data values have been removed")

  data = data[which(is.finite(data))]
  if (is.unsorted(data)) {
    data = sort(data)
  } else {
    if (data[1] > data[length(data)])
      data = rev(data)
  }
  check.quant(data)

  if (is.null(tlim)) {
    thresholds = seq(median(data) - 2*.Machine$double.eps, data[length(data) - 6], length.out = nt)
  } else {
    thresholds = seq(tlim[1], tlim[2], length.out = nt)
  }

  n = length(data)
  data = data[data > min(thresholds)]

  # Trick to evaluate MRL at all datapoints if there are not too many
  udata = unique(data)
  if (length(udata) <= nt) {
    warning("less data than number of thresholds requested, so will use unique data as thresholds")
    thresholds = udata[-length(udata)]
  }

  # Check given thresholds
  nminu = sum(data > min(thresholds))
  if (nminu <= 10)
    stop("data must have more than 10 exceedances of lowest threshold")
  
  nmaxu = sum(data > max(thresholds))
  if (nmaxu == 0) {
    warning("thresholds above max of input data are dropped")
    thresholds = thresholds[thresholds <= max(data)]
    nmaxu = sum(data > max(thresholds))
  }
  if (nmaxu <= 5)
    warning("confidence intervals are not shown where there are less than 5 exceedances")

  nt = length(thresholds)
  if (nt < 2)
    stop("must be more than 1 threshold")

  if (!is.null(try.thresh)) {
    if (length(try.thresh) == 0 | mode(try.thresh) != "numeric") 
      stop("threshold to fit GPD to must be numeric scalar or vector")

    if (any((try.thresh < min(thresholds)) | (try.thresh >= max(thresholds))))
      stop("potential thresholds must be within range specifed by tlim")
  }
    
  me.calc <- function(u, x, alpha) {
    excesses = x[x > u] - u
    nxs = length(excesses)
    meanxs = mean(excesses)
    sdxs = ifelse(nxs <= 5, NA, sd(excesses))
    
    results = c(u, nxs, meanxs, sdxs)
    if (!is.null(alpha)) {
      results = c(results, meanxs + qnorm(c(alpha/2, 1 - alpha/2)) * sdxs/sqrt(nxs))
    }
  
    return(results)
  }
  me = t(sapply(thresholds, FUN = me.calc, x = data, alpha = alpha))
  me = as.data.frame(me)
  if (!is.null(alpha)) {
    names(me) = c("u", "nu", "mean.excess", "sd.excess", "cil.excess", "ciu.excess")
  } else {
    names(me) = c("u", "nu", "mean.excess", "sd.excess")    
  }
  
  # if CI requested then fancy plot, otherwise give usual MRL
  par(mar = c(5, 4, 7, 2) + 0.1)
  if (!is.null(alpha)) {
    # Assume CLT holds for mean excess, calculate density over all thresholds
    mes = range(me[, 5:6], na.rm = TRUE)
    merange = seq(mes[1] - (mes[2] - mes[1])/10, mes[2] + (mes[2] - mes[1])/10, length.out = 200)
    allmat = matrix(merange, nrow = nt, ncol = 200, byrow = TRUE)
    memat = matrix(me[, 3], nrow = nt, ncol = 200, byrow = FALSE)
    sdmat = matrix(me[, 4]/sqrt(me[, 2]), nrow = nt, ncol = 200, byrow = FALSE)
    z = (allmat - memat)/sdmat
    z[abs(z) > 3] = NA

    if (is.null(ylim)) {
      ylim = range(merange, na.rm = TRUE)
      ylim = ylim + c(-1, 1) * diff(ylim)/10
    }
    image(thresholds, merange, dnorm(z), col = gray(seq(1, 0.3, -0.01)),
          main = main, xlab = xlab, ylab = ylab, ylim = ylim, ...)
    matplot(matrix(thresholds, nrow = nt, ncol = 3, byrow = FALSE), me[, c(3, 5, 6)], add = TRUE,
            type = "l", lty = c(1, 2, 2), col = "black", lwd = c(2, 1, 1), ...)    
  } else {
    if (is.null(ylim)) {
      ylim = range(me[, 3], na.rm = TRUE)
      ylim = ylim + c(-1, 1) * diff(ylim)/10
    }
    plot(thresholds, me[, 3], main = main, xlab = xlab, ylab = ylab, ylim = ylim, add = TRUE,
      type = "l", lty = 1, col = "black", lwd = 2, ...)    
  }
  box()

  naxis = rev(ceiling(2^pretty(log2(c(nmaxu, nminu)), 10)))
  naxis = naxis[(naxis > nmaxu) & (naxis < nminu)]
  nxaxis = c(min(thresholds), rev(data)[naxis+1], max(thresholds))
  naxis = c(nminu, naxis, nmaxu) 

  if ((nxaxis[length(nxaxis)] - nxaxis[length(nxaxis) - 1]) < diff(range(thresholds))/10) {
    nxaxis = nxaxis[-(length(nxaxis) - 1)]
    naxis = naxis[-(length(naxis) - 1)]
  }
  if ((nxaxis[2] - nxaxis[1]) < diff(range(thresholds))/20) {
    nxaxis = nxaxis[-2]
    naxis = naxis[-2]
  }

  if (p.or.n) {
    axis(side = 3, at = nxaxis, line = 0, labels = formatC(naxis/n, digits = 2, format = "g"))
    mtext("Tail Fraction phiu", side = 3, line = 2)
  } else {
    axis(side = 3, at = nxaxis, line = 0, labels = naxis)    
    mtext("Number of Excesses", side = 3, line = 2)
  }

  if (!is.null(try.thresh)) {
    ntry = length(try.thresh)
    mleparams = matrix(NA, nrow = 2, ncol = ntry)
    linecols = rep(c("blue", "green", "red"), length.out = ntry)
    for (i in 1:ntry) {
      # fit GPD above threshold using MLE
      fitresults = fgpd(data, try.thresh[i], std.err = FALSE)
      mleparams[, i] = fitresults$mle
      
      # use MLE to estimate intercept and gradient of MRL 
      mrlint = (fitresults$mle[1] - fitresults$mle[2] * try.thresh[i])/(1 - fitresults$mle[2])
      mrlgrad = fitresults$mle[2]/(1 - fitresults$mle[2])
    
      # Suppose to be linear above suitable threshold, different line type before and after
      lines(c(try.thresh[i], max(thresholds)), 
        mrlint + mrlgrad * c(try.thresh[i], max(thresholds)), lwd = 2, lty = 1, col = linecols[i])
      lines(c(min(thresholds), try.thresh[i]), 
        mrlint + mrlgrad * c(min(thresholds), try.thresh[i]), lwd = 2, lty = 2, col = linecols[i])
      abline(v = try.thresh[i], lty = 3, col = linecols[i])
    }
    if (!is.null(legend.loc)) {
      if (!is.null(alpha)) {
        legend(legend.loc, c("Sample Mean Excess", paste(100*(1 - alpha), "% CI"),
          paste("u =", formatC(try.thresh[1:min(c(3, ntry))], digits = 2, format = "g"),
            "sigmau =", formatC(mleparams[1, 1:min(c(3, ntry))], digits = 2, format = "g"),
            "xi =", formatC(mleparams[2, 1:min(c(3, ntry))], digits = 2, format = "g"))),
          lty = c(1, 2, rep(1, min(c(3, ntry)))),
          lwd = c(2, 1, rep(1, min(c(3, ntry)))),
          col = c("black", "black", linecols), bg = "white")
      } else {
        legend(legend.loc, c("Sample Mean Excess",
          paste("u =", formatC(try.thresh[1:min(c(3, ntry))], digits = 2, format = "g"),
            "sigmau =", formatC(mleparams[1, 1:min(c(3, ntry))], digits = 2, format = "g"),
            "xi =", formatC(mleparams[2, 1:min(c(3, ntry))], digits = 2, format = "g"))),
          lty = c(1, rep(1, min(c(3, ntry)))), lwd = c(2, rep(1, min(c(3, ntry)))),
          col = c("black", linecols), bg = "white")
      }
    }
  } else {
    if (!is.null(legend.loc)) {
      if (!is.null(alpha)) {
        legend(legend.loc, c("Sample Mean Excess", paste(100*(1 - alpha), "% CI")),
          lty = c(1, 2), lwd = c(2, 1), bg = "white")
      } else {
        legend(legend.loc, "Sample Mean Excess", lty = 1, lwd = 2, bg = "white")        
      }
    }
  }
  
  invisible(me)
}

Any scripts or data that you put into this service are public.

evmix documentation built on Sept. 3, 2019, 5:07 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

evmix
Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation

R/mrlplot.r
In evmix: Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation

Defines functions mrlplot

Documented in mrlplot

Try the evmix package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

evmix Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation

R/mrlplot.r In evmix: Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation

Defines functions mrlplot

Documented in mrlplot

Try the evmix package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

evmix
Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation

R/mrlplot.r
In evmix: Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation