regkienerLX: Regression Function for Kiener Distributions


View source: R/p_regression.R

Description

A single function that estimates the parameters of Kiener distributions K1, K2, K3 and K4 and returns the results in a list containing several data.frames ready to use for plotting. This function performs an unweighted nonlinear regression of the logit of the empirical probabilities logit(p) on the quantiles X.

Usage

regkienerLX(X, model = "K4", pdgts = c(3, 3, 1, 1, 1, 3, 2, 4, 4, 2, 2),
  maxk = 10, mink = 0.2, app = 0, probak = pprobs2, dgts = NULL,
  exfitk = NULL)

Arguments

X

vector of quantiles.

model

the model used for the regression: "K1", "K2", "K3", "K4".

pdgts

vector of length 11. Control the rounding of output parameters.

maxk

numeric. The maximum value of tail parameter k.

mink

numeric. The minimum value of tail parameter k.

app

numeric. The parameter "a" in the function ppoints.

probak

vector of probabilities used in output regk$fitk. For instance pprobs0.

dgts

rounding parameter applied globally to output regk$fitk.

exfitk

character. A vector of parameter names to subset regk$fitk. For instance exfit0.

Details

This function is designed to estimate the parameters of Kiener distributions for a given dataset. It encapsulates the four distributions described in this package. "K1" uses model lqkiener1, "K2" uses model lqkiener2, "K3" uses model lqkiener3 and "K4" uses model lqkiener4.

A typical input is a numeric vector that describes the returns of a stock. Conversion from a (possible) time series format to a sorted numeric vector is done automatically, without any check of the initial format. There is also no check for missing or non-finite values (NA, NaN, -Inf, +Inf). The empirical probability of each point in the sorted dataset is calculated with the function ppoints. The parameter app corresponds to the parameter a in ppoints but has been limited to the range (0, 0.5). The default value is 0, as large datasets are very common in finance.
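As a minimal sketch of this preprocessing step, the empirical probabilities and their logits can be reproduced with base R. The simulated vector x is a hypothetical stand-in for a real dataset, and the code illustrates the idea only; it is not the internal code of regkienerLX.

```r
# Hypothetical returns; any numeric vector without NA/NaN/Inf would do.
set.seed(1)
x  <- rlogis(500, location = 0, scale = 0.01)

xs <- sort(as.numeric(x))            # sorted plain numeric vector
p  <- ppoints(length(xs), a = 0)     # a = 0 gives p_i = i / (n + 1)
l  <- qlogis(p)                      # logit(p) = log(p / (1 - p))

head(data.frame(X = xs, P = p, L = l))
```

With a = 0 no probability equals 0 or 1, so every logit stays finite.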

A nonlinear regression of the logit of the probabilities logit(p) on the quantiles X is performed with nlsLM, using one of the functions lqkiener1234. These functions have been selected because they have an explicit form for all four types (this is unfortunately not the case for dkiener234) and return satisfactory results with ordinary least squares. The median is calculated before the regression and is passed to the regression function as a fixed value.
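The regression step can be sketched as follows, under loud assumptions: the model function lq_sym is an illustrative symmetric form chosen for this example and is not claimed to be the package's lqkiener1, the data are simulated from its inverse, and base-R nls with the "port" algorithm stands in for nlsLM so that the sketch needs no extra package.

```r
# Illustrative symmetric model (an assumption, NOT the package's lqkiener1):
# logit(p) = k * asinh((x - m) / (2 * g * k))
lq_sym <- function(x, m, g, k) k * asinh((x - m) / (2 * g * k))

# Simulate data from the inverse of that model (assumed values g = 1, k = 4).
set.seed(42)
n  <- 2000
xs <- sort(2 * 1 * 4 * sinh(rlogis(n) / 4))
l  <- qlogis(ppoints(n, a = 0))

# The median is computed first and passed as a fixed value, not estimated.
med <- median(xs)

# The bounds on k play the role of mink and maxk.
fit <- nls(l ~ lq_sym(xs, med, g, k),
           start     = c(g = 1.5, k = 3),
           algorithm = "port",
           lower     = c(g = 1e-6, k = 0.2),
           upper     = c(g = Inf,  k = 10))
coef(fit)
```

Only g and k are estimated here, which mirrors the statement that the median stays outside the regression.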

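Kiener distributions use the following parameters, some of which are redundant. See aw2k and pk2pk for the formulas and the conversion between parameters:

m is the median of the distribution.
g is the scale parameter.
a is the left tail parameter.
k is the harmonic mean of a and w and describes a global tail parameter.
w is the right tail parameter.
d is the distortion parameter.
e is the eccentricity parameter.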

Model "K1" returns results with 1+2=3 parameters and describes an (assumed) symmetric distribution. Parameters d and e are set to 0. Models "K2", "K3" and "K4" describe asymmetric distributions. They return results with 1+3=4 parameters. Model "K2" has a very clear parameter definition but unfortunately parameters a and w are highly correlated. Model "K3" has the least correlated parameters but the meaning of the distortion parameter d, usually of order 1e-3, is not simple.

Model "K4" exhibits a reasonable correlation between each parameter and should be the preferred intermediate model between "K1" and "K2" models. The eccentricity parameter e is well defined and easy to understand: e=(a-w)/(a+w), a=k/(1-e) and w=k/(1+e). It varies between -1 and +1 and can be understood as a percentage (when multiplied by 100) of eccentricity. e = -1 corresponds to w = infinity, e = +1 corresponds to a = infinity, and in both cases the model becomes a single log-logistic function with, respectively, a right / left stopping point and a left / right tail.
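The conversion formulas quoted above can be checked numerically with a few lines of R. The helper names below are hypothetical and written for this illustration; they are not the package's own conversion functions.

```r
# Formulas from the text, written as plain helper functions.
aw_to_e <- function(a, w) (a - w) / (a + w)
ke_to_a <- function(k, e) k / (1 - e)
ke_to_w <- function(k, e) k / (1 + e)

k <- 4
e <- 0.1
a <- ke_to_a(k, e)
w <- ke_to_w(k, e)

c(a = a, w = w)          # a > w when e > 0
aw_to_e(a, w)            # recovers e
2 / (1 / a + 1 / w)      # recovers k, the harmonic mean of a and w
ke_to_a(k, 1)            # e = +1 gives a = Inf
ke_to_w(k, -1)           # e = -1 gives w = Inf
```

The last two lines show the limiting cases, where one tail parameter becomes infinite.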

The lower and upper values of the tail parameter are controlled by mink and maxk. An upper value maxk = 10 is appropriate for datasets of low and medium size, less than 50,000 points. For larger datasets, the upper limit can be extended up to maxk = 20. Such a limit returns results which are very close to the logistic distribution, an alternate distribution which could be more appropriate. The lower limit mink is intended to avoid the value k=0. Recall that a value k < 2 describes a distribution with no stable variance and k < 1 describes a distribution with no stable mean.

The output is an object in a flat format of class clregk. It can be listed with the function attributes.

Value

dfrXP

data.frame. X = initial quantiles. P = empirical probabilities.

dfrXL

data.frame. X = initial quantiles. L = logit of probabilities.

dfrXR

data.frame. X = initial quantiles. R = residuals after regression.

dfrEP

data.frame. E = estimated quantiles. P = probabilities.

dfrEL

data.frame. E = estimated quantiles. L = logit of probabilities.

dfrED

data.frame. E = estimated quantiles. D = estimated density (from probabilities).

regk0

object of class nls extracted from the regression function nlsLM.

coefk0

the regression parameters in plain format. The median is out of the regression.

vcovk0

rounded variance-covariance matrix.

vcovk0m

rounded 1e+6 times variance-covariance matrix.

mcork0

rounded correlation matrix.

coefk

all parameters in plain format.

coefk1

parameters for model "K1".

coefk2

parameters for model "K2".

coefk3

parameters for model "K3".

coefk4

parameters for model "K4".

quantk

quantiles of interest.

coefr

all parameters in a rounded format.

coefr1

rounded parameters for model "K1".

coefr2

rounded parameters for model "K2".

coefr3

rounded parameters for model "K3".

coefr4

rounded parameters for model "K4".

quantr

quantiles of interest in a rounded format.

dfrQkPk

data.frame. Qk = Estimated quantiles of interest. Pk = probabilities.

dfrQkLk

data.frame. Qk = Estimated quantiles of interest. Lk = Logit of probabilities.

dfrESkPk

data.frame. ESk = Estimated Expected Shortfall. Pk = probabilities.

dfrESkLk

data.frame. ESk = Estimated Expected Shortfall. Lk = Logit of probabilities.

fitk

Parameters, quantiles, moments, VaR, ES and other parameters (not rounded). The length of fitk depends on the choice applied to probak. IMPORTANT: if you need to subset fitk, always subset it by parameter names and never by rank number, as new items may be added in the future. Use for instance exfit0, ..., exfit7.
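The advice on subsetting can be illustrated with a small sketch. The two named vectors below are hypothetical stand-ins for reg$fitk in two package versions, and wanted plays the role of a name vector such as exfit0; the actual names in fitk may differ.

```r
# Hypothetical stand-in for reg$fitk: a named numeric vector.
fitk_demo <- c(m = 0.02, g = 1.1, k = 3.5, e = 0.05)

# A later release might insert a new item: positions shift, names do not.
fitk_new  <- c(m = 0.02, g = 1.1, a = 3.7, k = 3.5, e = 0.05)

wanted <- c("m", "k")
fitk_demo[wanted]
fitk_new[wanted]         # same items as before
fitk_new[c(1, 3)]        # by rank: now returns m and a, not m and k
```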

See Also

nlsLM, laplacegaussnorm, Kiener distributions K1, K2, K3 and K4: kiener1, kiener2, kiener3, kiener4. Other estimation function: fitkienerX and its derivatives. fitk subsetting: exfit0.

Examples


require(graphics)
require(minpack.lm)
require(timeSeries)
require(FatTailsR)   # provides regkienerLX, getDSdata, kmoments, xmoments, exfit*

### Load the datasets and select one number (1-16)
DS     <- getDSdata()
j      <- 5


### and run this block
X      <- DS[[j]]
nameX  <- names(DS)[j]
reg    <- regkienerLX(X)

## Plotting
lleg   <- c("logit(0.999) = 6.9", "logit(0.99)   = 4.6", 
           "logit(0.95)   = 2.9", "logit(0.50)   = 0", 
           "logit(0.05)   = -2.9", "logit(0.01)   = -4.6", 
           "logit(0.001) = -6.9  ")
pleg   <- c( paste("m =",  reg$coefr4[1]), paste("g  =", reg$coefr4[2]), 
             paste("k  =", reg$coefr4[3]), paste("e  =", reg$coefr4[4]) )
op     <- par(mfrow=c(2,2), mgp=c(1.5,0.8,0), mar=c(3,3,2,1))
plot(X, type="l", main = nameX)
plot(reg$dfrXL, main = nameX, yaxt = "n")
axis(2, las=1, at=c(-9.2, -6.9, -4.6, -2.9, 0, 2.9, 4.6, 6.9, 9.2))
abline(h = c(-4.6, 4.6), lty = 4)
abline(v = c(reg$quantk[5], reg$quantk[9]), lty = 4)
legend("topleft", legend = lleg, cex = 0.7, inset = 0.02, bg = "#FFFFFF")
lines(reg$dfrEL, col = 2, lwd = 2)
points(reg$dfrQkLk, pch = 3, col = 2, lwd = 2, cex = 1.5)
plot(reg$dfrXP, main = nameX)
legend("topleft", legend = pleg, cex = 0.9, inset = 0.02 )
lines(reg$dfrEP, col = 2, lwd = 2)
plot(density(X), main = nameX)
lines(reg$dfrED, col = 2, lwd = 2)
round(cbind("k" = kmoments(reg$coefk, lengthx = nrow(reg$dfrXL)), "X" = xmoments(X)), 2)

## Attributes
attributes(reg)
head(reg$dfrXP)
head(reg$dfrXL)
head(reg$dfrXR)
head(reg$dfrEP)
head(reg$dfrEL)
head(reg$dfrED)
reg$regk0
reg$coefk0
reg$vcovk0
reg$vcovk0m
reg$mcork0
reg$coefk
reg$coefk1
reg$coefk2
reg$coefk3
reg$coefk4
reg$quantk
reg$coefr
reg$coefr1
reg$coefr2
reg$coefr3
reg$coefr4
reg$quantr
reg$dfrQkPk
reg$dfrQkLk
reg$dfrESkPk
reg$dfrESkLk
reg$fitk

## subset fitk
names(reg$fitk)
reg$fitk[exfit6]
reg$fitk[c(exfit1, exfit4)]
### End block

FatTailsR documentation built on March 12, 2021, 9:06 a.m.