Extremes Data Preprocessing

Description

A collection and description of functions for data preprocessing of extreme values. This includes tools to separate data beyond a threshold value, to compute blockwise data like block maxima, and to decluster point process data.

The functions are:

blockMaxima     Block maxima from a vector or a time series,
findThreshold   Upper threshold for a given number of extremes,
pointProcess    Peaks over threshold from a vector or a time series,
deCluster       Declusters clustered point process data.

Usage

blockMaxima(x, block = c("monthly", "quarterly"), doplot = FALSE)
findThreshold(x, n = floor(0.05*length(as.vector(x))), doplot = FALSE)
pointProcess(x, u = quantile(x, 0.95), doplot = FALSE)
deCluster(x, run = 20, doplot = TRUE)

Arguments

block

the block size. A numeric value is interpreted as the number of data values in each successive block. All the data is used, so the last block may contain fewer than block observations. If the data has a times attribute containing (in an object of class "POSIXct", or an object that can be converted to that class, see as.POSIXct) the times/dates of each observation, then block may instead take one of the character values "monthly" or "quarterly" listed under Usage. By default monthly blocks from daily data are assumed.

doplot

a logical value. Should the results be plotted? By default FALSE for blockMaxima, findThreshold and pointProcess, and TRUE for deCluster.

n

a numeric value or vector giving the number of extremes above the threshold. By default, n is 5% of the number of observations in x, rounded down to an integer.

run

parameter to be used in the runs method; any two consecutive threshold exceedances separated by more than this number of observations/days are considered to belong to different clusters.

u

a numeric value giving the threshold level at which the data are truncated. By default the 95% quantile of the data, u=quantile(x,0.95).

x

a numeric data vector from which findThreshold and blockMaxima determine the threshold values and block maxima values. For the function deCluster the argument x represents a numeric vector of threshold exceedances with a times attribute which should be a numeric vector containing either the indices or the times/dates of each exceedance (if times/dates, the attribute should be an object of class "POSIXct" or an object that can be converted to that class; see as.POSIXct).

Details

Computing Block Maxima:

The function blockMaxima calculates block maxima from a vector or a time series, whereas the function blocks is more general and allows for the calculation of an arbitrary function FUN on blocks.
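The numeric-block case can be illustrated in base R. This is a minimal sketch, not the package's implementation: the helper name block_max_sketch is hypothetical, and the sketch handles only a plain numeric vector with a numeric block size.

```r
# Hypothetical helper for illustration only; not the code behind blockMaxima.
block_max_sketch <- function(x, block) {
  x <- as.vector(x)
  # Assign each observation to a consecutive block of `block` values;
  # the final block may hold fewer than `block` values.
  id <- ceiling(seq_along(x) / block)
  # Maximum within each block, returned as a plain numeric vector.
  as.vector(tapply(x, id, max))
}

x <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)
block_max_sketch(x, block = 4)    # blocks: (3,1,4,1), (5,9,2,6), (5,3)
block_max_sketch(-x, block = 4)   # maxima of -x are the negated block minima
```

Applying the helper to -x, as the Examples do with -BMW, yields the negated block minima, which is how the left tail is examined.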

Finding Thresholds:

The function findThreshold finds a threshold so that a given number of extremes lie above it. When the data contain ties, a threshold is found so that at least the specified number of extremes lie above it.
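One way to obtain such a threshold is sketched below in base R. The helper name threshold_sketch is hypothetical, and the approach (stepping to the next smaller distinct value below the n-th largest observation) is an assumption made for illustration; it is not claimed to be the package's exact algorithm.

```r
# Hypothetical helper for illustration only; not the code behind findThreshold.
threshold_sketch <- function(x, n) {
  x <- sort(as.vector(x), decreasing = TRUE)
  distinct <- unique(x)                  # distinct values, largest first
  # Position of the n-th largest observation among the distinct values.
  pos <- match(x[n], distinct)
  # Step to the next smaller distinct value, so that the n-th largest
  # observation and any values tied with it lie strictly above the threshold.
  pos <- pmin(pos + 1, length(distinct))
  distinct[pos]
}

x <- c(1, 2, 2, 3, 5, 5, 5, 8, 9, 10)
u <- threshold_sketch(x, n = c(2, 4))
u
sum(x > u[1])   # number of observations strictly above the first threshold
sum(x > u[2])   # ties at 5 push this count above the requested 4
```

Because n may be a vector, the sketch returns one threshold per requested number of extremes, mirroring the call findThreshold(x, n = c(10, 50, 100)) in the Examples.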

De-Clustering Point Processes:

The function deCluster declusters clustered point process data so that the Poisson assumption is more tenable over a high threshold.
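The runs method described for the run argument can be sketched in base R as follows. The helper name decluster_sketch is hypothetical, and keeping the largest exceedance of each cluster is an assumption about what the declustered series contains; the sketch also assumes that times is numeric and sorted in increasing order.

```r
# Hypothetical helper for illustration only; not the code behind deCluster.
decluster_sketch <- function(values, times, run) {
  # A new cluster starts whenever the gap to the previous exceedance
  # is larger than `run`.
  cluster <- cumsum(c(TRUE, diff(times) > run))
  # Index of the largest exceedance in each cluster (first one if tied).
  keep <- as.vector(tapply(seq_along(values), cluster,
                           function(i) i[which.max(values[i])]))
  data.frame(time = times[keep], value = values[keep])
}

times  <- c(2, 3, 5, 40, 41, 90)             # observation indices of exceedances
values <- c(1.2, 3.4, 2.0, 5.1, 4.8, 2.7)    # the exceedances themselves
decluster_sketch(values, times, run = 20)
```

With run = 20 the six exceedances fall into three clusters (times 2 to 5, 40 to 41, and 90), and one value per cluster is retained.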

Value

blockMaxima
returns a timeSeries object or a numeric vector of block maxima data.

findThreshold
returns a numeric value or vector of suitable thresholds.

pointProcess
returns a timeSeries object or a numeric vector of peaks over a threshold.

deCluster
returns a timeSeries object or a numeric vector for the declustered point process.

Author(s)

Some of the functions were adapted from Alec Stephenson's R package evir, a port of Alexander McNeil's S library EVIS (Extreme Values in S); some from Alec Stephenson's R package ismev, which is based on Stuart Coles' code for his book An Introduction to Statistical Modeling of Extreme Values; and some were written by Diethelm Wuertz.

References

Coles, S. (2001); An Introduction to Statistical Modeling of Extreme Values, Springer.

Embrechts, P., Klueppelberg, C., Mikosch, T. (1997); Modelling Extremal Events, Springer.

Examples

 
## findThreshold -
   # Thresholds giving (at least) 10, 50 and 100 exceedances for Danish data:
   x = as.timeSeries(data(danishClaims))
   findThreshold(x, n = c(10, 50, 100))    
   
## blockMaxima -
   # Block maxima of BMW log returns, and of the negated returns (left tail):
   BMW = as.timeSeries(data(bmwRet))
   colnames(BMW) = "BMW.RET"
   head(BMW)
   x = blockMaxima( BMW, block = 65)
   head(x)
   y = blockMaxima(-BMW, block = 65)    
   head(y) 
   y = blockMaxima(-BMW, block = "monthly")    
   head(y)

   
## pointProcess -
   # Return Values above threshold in negative BMW log-return data:
   PP = pointProcess(x = -BMW, u = quantile(as.vector(x), 0.75))
   PP
   nrow(PP)
 
## deCluster -
   # Decluster the exceedances in PP using the runs method (run = 15):
   DC = deCluster(x = PP, run = 15, doplot = TRUE) 
   DC
   nrow(DC)