Explorative Data Analysis
Description
A collection and description of functions for
explorative data analysis. The tools include
plot functions for empirical distributions, quantile
plots, graphs exploring the properties of exceedances
over a threshold, plots for the maximum/sum ratio and for
the development of records.
The functions are:
emdPlot  Plot of empirical distribution function, 
qqparetoPlot  Exponential/Pareto quantile plot, 
mePlot  Plot of mean excesses over a threshold, 
mrlPlot  another variant, mean residual life plot, 
mxfPlot  another variant, with confidence intervals, 
msratioPlot  Plot of the ratio of maximum and sum, 
recordsPlot  Record development compared with iid data, 
ssrecordsPlot  another variant, investigates subsamples, 
sllnPlot  verifies Kolmogorov's strong law of large numbers, 
lilPlot  verifies Hartman-Wintner's law of the iterated logarithm, 
xacfPlot  ACF of exceedances over a threshold, 
normMeanExcessFit  fits mean excesses with a normal density, 
ghMeanExcessFit  fits mean excesses with a GH density, 
hypMeanExcessFit  fits mean excesses with a HYP density, 
nigMeanExcessFit  fits mean excesses with a NIG density, 
ghtMeanExcessFit  fits mean excesses with a GHT density. 
Usage
emdPlot(x, doplot = TRUE, plottype = c("xy", "x", "y", " "),
labels = TRUE, ...)
qqparetoPlot(x, xi = 0, trim = NULL, threshold = NULL, doplot = TRUE,
labels = TRUE, ...)
mePlot(x, doplot = TRUE, labels = TRUE, ...)
mrlPlot(x, ci = 0.95, umin = mean(x), umax = max(x), nint = 100, doplot = TRUE,
plottype = c("autoscale", ""), labels = TRUE, ...)
mxfPlot(x, u = quantile(x, 0.05), doplot = TRUE, labels = TRUE, ...)
msratioPlot(x, p = 1:4, doplot = TRUE, labels = TRUE, ...)
recordsPlot(x, ci = 0.95, doplot = TRUE, labels = TRUE, ...)
ssrecordsPlot(x, subsamples = 10, doplot = TRUE, plottype = c("lin", "log"),
labels = TRUE, ...)
sllnPlot(x, doplot = TRUE, labels = TRUE, ...)
lilPlot(x, doplot = TRUE, labels = TRUE, ...)
xacfPlot(x, u = quantile(x, 0.95), lag.max = 15, doplot = TRUE,
which = c("all", 1, 2, 3, 4), labels = TRUE, ...)
normMeanExcessFit(x, doplot = TRUE, trace = TRUE, ...)
ghMeanExcessFit(x, doplot = TRUE, trace = TRUE, ...)
hypMeanExcessFit(x, doplot = TRUE, trace = TRUE, ...)
nigMeanExcessFit(x, doplot = TRUE, trace = TRUE, ...)
ghtMeanExcessFit(x, doplot = TRUE, trace = TRUE, ...)

Arguments
ci 
[recordsPlot]  
doplot 
a logical value. Should the results be plotted? By
default TRUE. 
labels 
a logical value. Whether or not x- and y-axes should be automatically
labelled and a default main title should be added to the plot.
By default TRUE. 
lag.max 
[xacfPlot]  
nint 
[mrlPlot]  
p 
[msratioPlot]  
plottype 
[emdPlot]  
subsamples 
[ssrecordsPlot]  
threshold, trim 
[qPlot][xacfPlot]  
trace 
a logical flag, by default TRUE. 
u 
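a numeric value at which level the data are to be truncated. By
default the threshold value which belongs to the 95% quantile,
u = quantile(x, 0.95), as shown under Usage for xacfPlot. Note that
Usage lists u = quantile(x, 0.05) as the default for mxfPlot. 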
umin, umax 
[mrlPlot]  
which 
[xacfPlot]  
x, y 
numeric data vectors or in the case of x an object to be plotted. 
xi 
the shape parameter of the generalized Pareto distribution. 
... 
additional arguments passed to the FUN or plot function. 
Details
Empirical Distribution Function:
The function emdPlot is a simple exploratory function. A
straight line on the double log scale indicates Pareto tail behaviour.
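The straight-line criterion can be illustrated with a minimal base R sketch. It is not the emdPlot implementation: the simulated Pareto sample, the tail index alpha and the plotting positions are assumptions made for illustration only.

```r
# Sketch only: base R on simulated data, not the emdPlot implementation.
set.seed(1)
alpha <- 2                          # assumed Pareto tail index
x <- runif(5000)^(-1 / alpha)       # Pareto(alpha) sample with minimum 1, by inversion

xs <- sort(x)
n <- length(xs)
surv <- 1 - (seq_len(n) - 0.5) / n  # empirical survival probabilities, strictly positive

# For a Pareto tail, log(surv) is linear in log(xs) with slope -alpha.
fit <- lm(log(surv) ~ log(xs))
slope <- unname(coef(fit)[2])
print(slope)
```

Plotting log(xs) against log(surv) shows the line itself; the fitted slope gives a rough idea of the tail index.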
Quantile-Quantile Pareto Plot:
qqparetoPlot creates a quantile-quantile plot for threshold
data. If xi is zero the reference distribution is the
exponential; if xi is nonzero the reference distribution
is the generalized Pareto with the shape parameter value given
by xi. In the case of the exponential, the plot is
interpreted as follows: Concave departures from a straight line are a
sign of heavy-tailed behaviour, convex departures show thin-tailed
behaviour.
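As a rough illustration of the two reference distributions, the following base R sketch pairs ordered data with exponential quantiles (the xi = 0 case) and defines the generalized Pareto quantile function for nonzero xi. The helper name gpd_quantile, the simulated data and the choice of ppoints() as plotting positions are assumptions, not taken from qqparetoPlot.

```r
# Sketch only: quantile pairs for an exponential QQ plot in base R.
set.seed(2)
x <- rexp(2000)                       # simulated thin-tailed (exponential) data
xs <- sort(x)
p <- ppoints(length(xs))              # plotting positions in (0, 1)
q_exp <- qexp(p)                      # reference quantiles for xi = 0

# Hypothetical helper: standard GPD quantile function for xi != 0.
gpd_quantile <- function(p, xi) ((1 - p)^(-xi) - 1) / xi

# Data that match the reference distribution give a nearly straight line,
# so the correlation between the two sets of quantiles is close to 1.
r <- cor(xs, q_exp)
print(r)
```

Replacing q_exp by gpd_quantile(p, xi) gives the reference quantiles for a nonzero shape parameter.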
Mean Excess Function Plot:
Three variants to plot the mean excess function are available:
A sample mean excess plot over increasing thresholds, and two mean
excess function plots with confidence intervals for discrimination
in the tails of a distribution.
In general, an upward trend in a mean excess function plot shows
heavy-tailed behaviour. In particular, a straight line with positive
gradient above some threshold is a sign of Pareto behaviour in the tail.
A downward trend shows thin-tailed behaviour whereas a line with
zero gradient shows an exponential tail. Here are some hints:
Because upper plotting points are the average of a handful of extreme
excesses, these may be omitted for a prettier plot.
For mrlPlot and mxfPlot the upper tail is investigated;
for the lower tail reverse the sign of the data vector.
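The trend rules above can be checked numerically with a small base R sketch of the sample mean excess function. The helper mean_excess and the simulated samples are assumptions for illustration and do not reproduce mePlot, mrlPlot or mxfPlot.

```r
# Sketch only: sample mean excess function in base R.
mean_excess <- function(x, u) {
  # For each threshold, average the amounts by which observations exceed it.
  vapply(u, function(t) mean(x[x > t] - t), numeric(1))
}

set.seed(3)
x_exp <- rexp(20000, rate = 1)        # exponential tail: mean excess is flat at 1
x_par <- runif(20000)^(-1 / 3)        # Pareto, tail index 3: mean excess u / 2

u_exp <- quantile(x_exp, c(0.5, 0.9))
u_par <- quantile(x_par, c(0.5, 0.9))
me_exp <- mean_excess(x_exp, u_exp)   # roughly constant in u
me_par <- mean_excess(x_par, u_par)   # increasing in u
print(rbind(me_exp, me_par))
```

For the lower tail, the same helper can be applied to -x, in line with the hint on reversing the sign of the data vector.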
Plot of the Maximum/Sum Ratio:
The ratio of maximum and sum is a simple tool for detecting heavy
tails of a distribution and for giving a rough estimate of
the order of its finite moments. Sharp increases in the curves
of an msratioPlot are a sign of heavy-tailed behaviour.
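A minimal base R sketch of the ratio, assuming the usual definition max(|X_1|^p, ..., |X_n|^p) / (|X_1|^p + ... + |X_n|^p) evaluated for growing n. The helper ms_ratio and the simulated samples are illustrative assumptions, not the msratioPlot code.

```r
# Sketch only: running maximum/sum ratio for a given power p.
ms_ratio <- function(x, p) {
  y <- abs(x)^p
  cummax(y) / cumsum(y)               # ratio after 1, 2, ..., n observations
}

set.seed(4)
x_thin  <- rnorm(10000)               # all moments finite
x_heavy <- runif(10000)^(-1 / 1.5)    # Pareto, tail index 1.5: fourth moment infinite

r_thin  <- ms_ratio(x_thin,  p = 2)   # decays towards zero
r_heavy <- ms_ratio(x_heavy, p = 4)   # keeps jumping and stays away from zero
print(c(thin = tail(r_thin, 1), heavy = tail(r_heavy, 1)))
```

A ratio that settles near zero is consistent with a finite moment of order p, while a ratio that keeps jumping suggests that the moment of that order does not exist.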
Plot of the Development of Records:
These are functions that investigate the development of records in
a dataset and calculate the expected behaviour for iid data.
recordsPlot counts records and reports the observations
at which they occur. In addition subsamples can be investigated
with the help of the function ssrecordsPlot.
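The comparison with iid data rests on a standard fact: for iid observations from a continuous distribution, the expected number of records among the first n values is 1 + 1/2 + ... + 1/n. The base R sketch below uses the hypothetical helpers count_records and expected_records; it is not the recordsPlot implementation.

```r
# Sketch only: running record count versus its iid expectation.
count_records <- function(x) {
  n <- length(x)
  # An observation is a record if it strictly exceeds all earlier ones;
  # the first observation is always a record.
  is_record <- c(TRUE, x[-1] > cummax(x)[-n])
  cumsum(is_record)
}

expected_records <- function(n) cumsum(1 / seq_len(n))

set.seed(5)
x <- rnorm(1000)
observed <- count_records(x)
expected <- expected_records(length(x))
print(c(observed = tail(observed, 1), expected = tail(expected, 1)))
```

Markedly more records than expected can point to a trend or other departure from the iid assumption.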
Plot of Kolmogorov's and Hartman-Wintner's Laws:
The function sllnPlot verifies Kolmogorov's strong law of
large numbers, and the function lilPlot verifies
Hartman-Wintner's law of the iterated logarithm.
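Both laws can be visualised from partial sums. The base R sketch below assumes simulated normal data with known mean and standard deviation; whether sllnPlot and lilPlot use known or estimated moments is not stated here, so this is an illustration rather than their implementation.

```r
# Sketch only: running mean (strong law) and normalised partial sums (LIL).
set.seed(6)
n <- 20000
mu <- 1                               # assumed true mean of the simulated data
sigma <- 2                            # assumed true standard deviation
x <- rnorm(n, mean = mu, sd = sigma)

s <- cumsum(x)
idx <- seq_len(n)
running_mean <- s / idx               # converges to mu by the strong law

# log(log(k)) is positive only for k >= 3, so start the LIL statistic there.
k <- idx[idx >= 3]
lil_stat <- (s[k] - k * mu) / sqrt(2 * k * sigma^2 * log(log(k)))

print(c(mean = tail(running_mean, 1), lil = tail(lil_stat, 1)))
```

Under the law of the iterated logarithm the limit superior of lil_stat is 1 and the limit inferior is -1 almost surely, so the path should eventually stay inside a band only slightly wider than [-1, 1].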
ACF Plot of Exceedances over a Threshold:
The function xacfPlot plots the autocorrelation functions of heights and
distances of exceedances over a threshold.
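The two series can be extracted in a few lines of base R. In this sketch "heights" are taken to be the amounts above the threshold and "distances" the gaps, in observations, between successive exceedances; both readings and the simulated data are assumptions, not the xacfPlot code.

```r
# Sketch only: heights and distances of exceedances, and their ACFs.
set.seed(7)
x <- rt(3000, df = 4)                 # simulated heavy-tailed iid data
u <- quantile(x, 0.95)                # threshold at the 95% quantile

idx <- which(x > u)                   # positions of the exceedances
heights <- x[idx] - u                 # amounts above the threshold
distances <- diff(idx)                # gaps between successive exceedances

acf_heights <- acf(heights, lag.max = 15, plot = FALSE)
acf_distances <- acf(distances, lag.max = 15, plot = FALSE)
print(round(acf_heights$acf[2:4], 3))
```

For iid data both autocorrelation functions should stay close to zero at all positive lags; clear structure suggests clustering of extremes.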
Value
The functions return a plot.
Note
The plots are labelled by default with an x-label, a y-label and
a main title. If the argument labels is set to FALSE,
neither an x-label, a y-label nor a main title will be added to the
graph. To add user-defined label strings just use the
function title(xlab = "...", ylab = "...", main = "...").
Author(s)
Some of the functions were adapted from Alec Stephenson's
R package evir, ported from Alexander McNeil's S library
EVIS (Extreme Values in S), some from Alec Stephenson's
R package ismev, based on Stuart Coles' code from his book
An Introduction to Statistical Modeling of Extreme Values, and
some were written by Diethelm Wuertz.
References
Coles, S. (2001); An Introduction to Statistical Modeling of Extreme Values, Springer.
Embrechts, P., Klueppelberg, C., Mikosch, T. (1997); Modelling Extremal Events, Springer.
Examples
## Danish fire insurance data:
data(danishClaims)
danishClaims = as.timeSeries(danishClaims)
## emdPlot 
# Show Pareto tail behaviour:
par(mfrow = c(2, 2), cex = 0.7)
emdPlot(danishClaims)
## qqparetoPlot 
# QQ-Plot of heavy-tailed Danish fire insurance data:
qqparetoPlot(danishClaims, xi = 0.7)
## mePlot 
# Sample mean excess plot of heavy-tailed Danish fire insurance data:
mePlot(danishClaims)
## ssrecordsPlot 
# Record fire insurance losses in Denmark:
ssrecordsPlot(danishClaims, subsamples = 10)
