# DataPreprocessing: Extremes Data Preprocessing In fExtremes: Rmetrics - Modelling Extreme Events in Finance

## Description

A collection and description of functions for data preprocessing of extreme values. This includes tools to separate data beyond a threshold value, to compute blockwise data like block maxima, and to decluster point process data.

The functions are:

| Function | Description |
|---|---|
| `blockMaxima` | Block maxima from a vector or a time series |
| `findThreshold` | Upper threshold for a given number of extremes |
| `pointProcess` | Peaks over threshold from a vector or a time series |
| `deCluster` | Declusters clustered point process data |

## Usage

```
blockMaxima(x, block = c("monthly", "quarterly"), doplot = FALSE)
findThreshold(x, n = floor(0.05*length(as.vector(x))), doplot = FALSE)
pointProcess(x, u = quantile(x, 0.95), doplot = FALSE)
deCluster(x, run = 20, doplot = TRUE)
```

## Arguments

`block`
the block size. A numeric value is interpreted as the number of data values in each successive block. All the data is used, so the last block may not contain `block` observations. If the data `x` has a `times` attribute containing the times/dates of each observation (in an object of class `"POSIXct"`, or an object that can be converted to that class, see `as.POSIXct`), then `block` may instead take the character values `"month"`, `"quarter"`, `"semester"` or `"year"`. Note that the Usage signature and the examples use the values `"monthly"` and `"quarterly"`. By default monthly blocks from daily data are assumed.

`doplot`
a logical value. Should the results be plotted? According to the Usage signatures, the default is `FALSE` for `blockMaxima`, `findThreshold` and `pointProcess`, and `TRUE` for `deCluster`.

`n`
a numeric value or vector giving the number of extremes above the threshold. By default, `n` is set to an integer representing 5% of the observations in the data set `x`.

`run`
parameter to be used in the runs method; any two consecutive threshold exceedances separated by more than this number of observations/days are considered to belong to different clusters.

`u`
a numeric value giving the level at which the data are to be truncated. By default the threshold is the 95% quantile of the data, `u = quantile(x, 0.95)`.

`x`
a numeric data vector from which `findThreshold` and `blockMaxima` determine the threshold values and block maxima values. For the function `deCluster` the argument `x` represents a numeric vector of threshold exceedances with a `times` attribute, which should be a numeric vector containing either the indices or the times/dates of each exceedance (if times/dates, the attribute should be an object of class `"POSIXct"` or an object that can be converted to that class; see `as.POSIXct`).
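As a rough illustration of the defaults for `n` and `u`, the following base-R sketch evaluates both default expressions on a simulated sample and extracts the values above the threshold. It does not call fExtremes; the simulated data and the names `n_default`, `idx` and `exceedances` are assumptions made for this example only.

```r
# Simulated sample standing in for a real data vector (assumption).
set.seed(42)
x <- rnorm(1000)

# Default number of extremes: 5% of the observations, rounded down.
n_default <- floor(0.05 * length(as.vector(x)))

# Default threshold: the empirical 95% quantile of the data.
u <- quantile(x, 0.95)

# Peaks over threshold: positions and values strictly above u.
idx <- which(x > u)
exceedances <- x[idx]
```

Keeping the positions `idx` alongside the values matters for later declustering, which needs to know how far apart the exceedances are.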

## Details

Computing Block Maxima:

The function `blockMaxima` calculates block maxima from a vector or a time series, whereas the function `blocks` is more general and allows for the calculation of an arbitrary function `FUN` on blocks.
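The idea behind block maxima with a numeric block size can be sketched in base R as follows. This is a minimal illustration that does not use fExtremes; the simulated data and the names `block_id` and `block_max` are assumptions. It also shows that the last block may be shorter than `block`.

```r
# Simulated series of 250 observations (assumption, for illustration).
set.seed(1)
x <- rnorm(250)
block <- 65

# Assign each observation to a consecutive block of `block` values.
# 250 observations give three full blocks and a final block of 55 values.
block_id <- ceiling(seq_along(x) / block)

# Maximum within each block.
block_max <- tapply(x, block_id, max)

# Block minima can be obtained from the maxima of the negated series.
block_min <- -tapply(-x, block_id, max)
```

Negating the series, as in the last line, is the same device the Examples section uses to look at the left tail of a return series.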

Finding Thresholds:

The function `findThreshold` finds a threshold so that a given number of extremes lie above it. When the data contain ties, a threshold is found so that at least the specified number of extremes lie above it.
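One way to obtain a threshold with this property is sketched below in base R. The helper `find_threshold_sketch` is hypothetical and is not the package's implementation; it returns the next smaller distinct value below the `n`-th largest observation, so that at least `n` values lie strictly above the result even when the data contain ties.

```r
# Hypothetical helper (not part of fExtremes): threshold with at least
# n observations strictly above it.
find_threshold_sketch <- function(x, n) {
  sorted <- sort(as.numeric(x), decreasing = TRUE)
  levels <- unique(sorted)            # distinct values, largest first
  pos <- match(sorted[n], levels)     # level of the n-th largest value
  # Step down one distinct level (capped at the smallest level).
  levels[pmin(pos + 1, length(levels))]
}

# Small data set with ties (assumption, for illustration).
x <- c(1, 2, 2, 3, 5, 5, 7)
n <- c(1, 2, 4)
u <- find_threshold_sketch(x, n)

# Number of observations strictly above each threshold.
counts <- sapply(u, function(level) sum(x > level))
```

For `n = 2` the second largest value is tied at 5, so the sketch steps down to the threshold 3, which leaves three observations above it rather than exactly two.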

De-Clustering Point Processes:

The function `deCluster` declusters clustered point process data so that the Poisson assumption is more tenable over a high threshold.
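The runs method described for the `run` argument can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's code: the exceedance times and values are made up, and each cluster is assumed to be represented by its largest exceedance.

```r
# Made-up exceedance times (observation indices) and values (assumption).
times  <- c(3, 5, 6, 40, 42, 90)
values <- c(2.1, 3.4, 2.8, 4.0, 2.2, 5.5)
run <- 20

# A gap of more than `run` observations starts a new cluster.
gaps <- diff(times)
cluster_id <- cumsum(c(1, gaps > run))

# Within each cluster keep the position of the largest exceedance
# (assumed representative of the cluster).
keep <- vapply(
  split(seq_along(values), cluster_id),
  function(i) i[which.max(values[i])],
  integer(1)
)

declustered <- data.frame(time = times[keep], value = values[keep])
```

A larger `run` merges more exceedances into the same cluster and therefore returns fewer points, which is the trade-off to consider when choosing its value.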

## Value

`blockMaxima`
returns a timeSeries object or a numeric vector of block maxima data.

`findThreshold`
returns a numeric value or vector of suitable thresholds.

`pointProcess`
returns a timeSeries object or a numeric vector of peaks over a threshold.

`deCluster`
returns a timeSeries object or a numeric vector for the declustered point process.

## Author(s)

Some of the functions were adapted from Alec Stephenson's R package `evir`, a port of Alexander McNeil's S library `EVIS` (Extreme Values in S); some from Alec Stephenson's R package `ismev`, which is based on Stuart Coles' code from his book An Introduction to Statistical Modeling of Extreme Values; and some were written by Diethelm Wuertz.

## References

Coles, S. (2001); An Introduction to Statistical Modeling of Extreme Values, Springer.

Embrechts, P., Klueppelberg, C., Mikosch, T. (1997); Modelling Extremal Events, Springer.

## Examples

```
## findThreshold -
   # Thresholds giving (at least) 10, 50 and 100 exceedances for the Danish data:
   x = as.timeSeries(data(danishClaims))
   findThreshold(x, n = c(10, 50, 100))

## blockMaxima -
   # Block Maxima (Minima) for left tail of BMW log returns:
   BMW = as.timeSeries(data(bmwRet))
   colnames(BMW) = "BMW.RET"
   head(BMW)
   x = blockMaxima( BMW, block = 65)
   head(x)
   y = blockMaxima(-BMW, block = 65)
   head(y)
   y = blockMaxima(-BMW, block = "monthly")
   head(y)

## pointProcess -
   # Return Values above threshold in negative BMW log-return data:
   PP = pointProcess(x = -BMW, u = quantile(as.vector(x), 0.75))
   PP
   nrow(PP)

## deCluster -
   # Decluster the 200 exceedances of a particular
   DC = deCluster(x = PP, run = 15, doplot = TRUE)
   DC
   nrow(DC)
```

