evmix.diag: Diagnostic Plots for Extreme Value Mixture Models
In evmix: Extreme Value Mixture Modelling, Threshold Estimation and Boundary Corrected Kernel Density Estimation

The classic four diagnostic plots for evaluating extreme value mixture models: 1) return level plot, 2) Q-Q plot, 3) P-P plot and 4) density plot. Each plot is available individually or as the usual 2x2 collection.

evmix.diag(modelfit, upperfocus = TRUE, alpha = 0.05, N = 1000,
  legend = FALSE, ...)

rlplot(modelfit, upperfocus = TRUE, alpha = 0.05, N = 1000,
  legend = TRUE, rplim = NULL, rllim = NULL, ...)

qplot(modelfit, upperfocus = TRUE, alpha = 0.05, N = 1000,
  legend = TRUE, ...)

pplot(modelfit, upperfocus = TRUE, alpha = 0.05, N = 1000,
  legend = TRUE, ...)

densplot(modelfit, upperfocus = TRUE, legend = TRUE, ...)

`modelfit`	fitted extreme value mixture model object
`upperfocus`	logical, should plot focus on upper tail?
`alpha`	significance level over range (0, 1), or `NULL` for no CI
`N`	number of Monte Carlo simulations for CI (N>=10)
`legend`	logical, should legend be included?
`...`	further arguments to be passed to the plotting functions
`rplim`	return period range
`rllim`	return level range

Model diagnostics are available for all the fitted extreme value mixture models in the evmix package. The modelfit object is output by all the fitting functions, e.g. fgpd and fnormgpd.

Consistent with the plot function in the evd library, the ppoints function is used to estimate the empirical cumulative probabilities. The default behaviour of this function is to use

(i-0.5)/n

as the estimate for the ith order statistic of the given sample of size n.
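The plotting positions can be checked directly in base R. This is a minimal sketch; the variable names are illustrative only. Note that base R's ppoints uses (i-0.5)/n only for n > 10, and switches to (i-3/8)/(n+1/4) for smaller samples.

```r
# Compare ppoints() with the (i - 0.5)/n formula for a sample of size n > 10
n <- 20
i <- seq_len(n)
p_default <- ppoints(n)
p_manual <- (i - 0.5) / n
all.equal(p_default, p_manual)

# For n <= 10, base R uses a = 3/8 in (i - a)/(n + 1 - 2a) instead
p_small <- ppoints(5)
p_small_manual <- (seq_len(5) - 3/8) / (5 + 1 - 2 * 3/8)
all.equal(p_small, p_small_manual)
```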

The return level plot has the quantile q, where P(X ≥ q) = p, on the y-axis for a particular survival probability p. The return period t = 1/p is shown on the x-axis. The return level is given by:

q = u + σ_u [(φ_u t)^ξ - 1]/ξ

for ξ ≠ 0. In the case of ξ = 0 this simplifies to

q = u + σ_u log(φ_u t)

which is linear when plotted against the return period on a logarithmic scale, so exponential/Type I (ξ = 0) upper tail behaviour appears as a straight line on this scale. This is the same transformation as in the GPD/POT diagnostic plot function plot.uvevd in the evd package, from which these functions were derived.
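The return level formula can be sketched as a small base R function. The name return_level and the parameter values below are hypothetical and chosen purely for illustration; this is not the code used inside evmix. The formula applies in the upper tail, i.e. for return periods with φ_u t ≥ 1.

```r
# Return level q for return period t, given threshold u, GPD scale sigmau,
# GPD shape xi and tail fraction phiu (hypothetical helper, not part of evmix)
return_level <- function(t, u, sigmau, xi, phiu) {
  if (abs(xi) < 1e-8) {
    # xi = 0 limit: linear in log(t)
    u + sigmau * log(phiu * t)
  } else {
    u + sigmau * ((phiu * t)^xi - 1) / xi
  }
}

t <- c(10, 100, 1000)
q_heavy <- return_level(t, u = 1.5, sigmau = 0.5, xi = 0.1, phiu = 0.1)
q_exp <- return_level(t, u = 1.5, sigmau = 0.5, xi = 0, phiu = 0.1)

# With xi = 0 each tenfold increase in t adds the same amount to q
diff(q_exp)
```

With a positive shape parameter the increments in diff(q_heavy) grow with t, which shows up as upward curvature on the log-scaled return level plot.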

The crosses are the empirical quantiles/return levels (i.e. the ordered sample data) against their corresponding transformed empirical return period (from ppoints). The solid line is the theoretical return level (quantile) function using the estimated parameters. The estimated threshold u and tail fraction phiu are shown. For the two-tailed models both thresholds ul and ur and the corresponding tail fractions phiul and phiur are shown. The approximate pointwise confidence intervals for the quantiles are obtained by Monte Carlo simulation using the estimated parameters. Note that these intervals ignore the parameter estimation uncertainty.
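The idea behind the simulation-based pointwise intervals can be sketched in base R. This is an illustration under stated assumptions, not the evmix implementation: a standard normal distribution stands in for the fitted model, and the internal details of the package may differ.

```r
set.seed(1)
n <- 100       # sample size
N <- 1000      # number of Monte Carlo replicates
alpha <- 0.05  # significance level

# Empirical return periods from the plotting positions
p <- ppoints(n)
emp_rp <- 1 / (1 - p)

# Simulate N samples from the stand-in model and sort each one, so that
# row i of sims holds N draws of the ith order statistic
sims <- replicate(N, sort(rnorm(n)))

# Pointwise interval for each order statistic: row 1 lower, row 2 upper
ci <- apply(sims, 1, quantile, probs = c(alpha / 2, 1 - alpha / 2))
```

Because the replicates are drawn with the parameters held fixed, the resulting band reflects sampling variability only.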

The Q-Q and P-P plots have the empirical values on the y-axis and theoretical values from the fitted model on the x-axis.
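The axis convention can be illustrated in base R. This is a minimal sketch in which a normal distribution fitted by sample mean and standard deviation stands in for the fitted mixture model; the variable names are illustrative only.

```r
set.seed(1)
x <- sort(rnorm(200))
n <- length(x)
p_emp <- ppoints(n)  # empirical cumulative probabilities

# Theoretical values from the stand-in fitted model
q_theo <- qnorm(p_emp, mean = mean(x), sd = sd(x))  # for the Q-Q plot
p_theo <- pnorm(x, mean = mean(x), sd = sd(x))      # for the P-P plot

par(mfrow = c(1, 2))
# Q-Q plot: theoretical quantiles on x-axis, ordered sample on y-axis
plot(q_theo, x, pch = 4, xlab = "Theoretical quantiles",
     ylab = "Sample quantiles")
abline(0, 1)
# P-P plot: theoretical probabilities on x-axis, empirical on y-axis
plot(p_theo, p_emp, pch = 4, xlab = "Theoretical probability",
     ylab = "Empirical probability")
abline(0, 1)
```

Points close to the line of equality in both panels indicate a good fit.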

The density plot provides a histogram of the sample data overlaid with the fitted density and a standard kernel density estimate using the density function. The default settings for the density function are used. Note that for distributions with bounded support (e.g. GPD) and high density near the boundary, standard kernel density estimators exhibit a negative bias due to leakage past the boundary, so in this case they should not be taken too seriously.
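The boundary bias is easy to reproduce in base R. The sketch below uses unit exponential data (the ξ = 0 case of the GPD with threshold 0 and unit scale) as an illustrative assumption; the true density at the boundary is 1.

```r
set.seed(1)
x <- rexp(1000)

# Standard kernel density estimate with default settings
kde <- density(x)

# Interpolate the estimate at the lower boundary of the support
est_at_0 <- approx(kde$x, kde$y, xout = 0)$y
true_at_0 <- dexp(0)

# Proportion of kernel mass that has leaked below zero (approximate)
step <- diff(kde$x[1:2])
leaked <- sum(kde$y[kde$x < 0]) * step

c(estimate = est_at_0, truth = true_at_0, leaked = leaked)
```

The estimate at the boundary falls well short of the true value because part of each kernel near zero places mass on the negative half line.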

For the kernel density estimates (i.e. kden and bckden) there is no threshold, so no upper tail focus is carried out.

See plot.uvevd for more detailed explanations of these types of plots.

rlplot gives the return level plot, qplot gives the Q-Q plot, pplot gives the P-P plot, densplot gives the density plot and evmix.diag gives the collection of all 4.

Based on the GPD/POT diagnostic function plot.uvevd in the evd package, for which Stuart Coles' and Alec Stephenson's contributions are gratefully acknowledged. These functions are designed to have similar syntax and functionality to simplify the transition for users of these packages.

For all mixture models the missing values are removed by the fitting functions (e.g. fnormgpd and fgng). However, these are retained in the GPD fitting function fgpd, as they are interpreted as values below the threshold.

By default all the plots focus in on the upper tail, but they can be used to display the fit over the entire range of support.

You cannot pass xlim or ylim to the plotting functions via ...

Error checking of the inputs (e.g. invalid probabilities) is carried out and will either stop or give a warning message as appropriate.

Yang Hu and Carl Scarrott carl.scarrott@canterbury.ac.nz

http://en.wikipedia.org/wiki/Q-Q_plot

http://en.wikipedia.org/wiki/P-P_plot

Scarrott, C.J. and MacDonald, A. (2012). A review of extreme value threshold estimation and uncertainty quantification. REVSTAT - Statistical Journal 10(1), 33-59. Available from http://www.ine.pt/revstat/pdf/rs120102.pdf

Coles S.G. (2004). An Introduction to the Statistical Modelling of Extreme Values. Springer-Verlag: London.

ppoints, plot.uvevd and gpd.diag.

## Not run: 
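library(evmix)  # provides fnormgpd and the diagnostic plot functions
set.seed(1)

# simulate standard normal data and fit the normal bulk with GPD tail model
x <- sort(rnorm(1000))
fit <- fnormgpd(x)
evmix.diag(fit)  # all four diagnostic plots, focussed on the upper tail

# repeat without focussing on upper tail
par(mfrow = c(2, 2))
rlplot(fit, upperfocus = FALSE)
qplot(fit, upperfocus = FALSE)
pplot(fit, upperfocus = FALSE)
densplot(fit, upperfocus = FALSE)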

## End(Not run)