regkienerLX: Regression Function for Kiener Distributions
In FatTailsR: Kiener Distributions and Fat Tails in Finance and Neuroscience

Description

One function to estimate the parameters of Kiener distributions K1, K2, K3 and K4 and return the results in a list containing many data.frames ready to use for plotting. This function performs an unweighted nonlinear regression of the logit of the empirical probabilities logit(p) on the quantiles X.

Usage

regkienerLX(X, model = "K4", pdgts = c(3, 3, 1, 1, 1, 3, 2, 4, 4, 2, 2),
  maxk = 10, mink = 0.2, app = 0, probak = pprobs2, dgts = NULL,
  exfitk = NULL)

Arguments

`X`	vector of quantiles.
`model`	the model used for the regression: "K1", "K2", "K3", "K4".
`pdgts`	vector of length 11. Controls the rounding of output parameters.
`maxk`	numeric. The maximum value of tail parameter `k`.
`mink`	numeric. The minimum value of tail parameter `k`.
`app`	numeric. The parameter "`a`" in the function `ppoints`.
`probak`	vector of probabilities used in output regk$fitk. For instance `pprobs0`.
`dgts`	rounding parameter applied globally to output regk$fitk.
`exfitk`	character. A vector of parameter names to subset regk$fitk. For instance `exfit0`.

Details

This function is designed to estimate the parameters of Kiener distributions for a given dataset. It encapsulates the four distributions described in this package. "K1" uses model lqkiener1, "K2" uses model lqkiener2, "K3" uses model lqkiener3 and "K4" uses model lqkiener4.

A typical input is a numeric vector that describes the returns of a stock. Conversion from a (possible) time series format to a sorted numeric vector is done automatically and without any check of the initial format. Missing and non-finite values (NA, NaN, -Inf, +Inf) are not checked either. The empirical probability of each point in the sorted dataset is calculated with the function ppoints. The parameter app corresponds to the parameter a in ppoints but has been limited to the range 0 to 0.5. The default value is 0, as large datasets are very common in finance.
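The preprocessing described above can be sketched in base R. This is a minimal illustration, not the package's internal code: the toy vector X is hypothetical, and the data.frame names simply mirror the output names dfrXP and dfrXL listed under Value.

```r
# Hypothetical toy returns; a real input would be a vector of stock returns.
X <- c(0.012, -0.034, 0.005, 0.021, -0.008, 0.040, -0.015)

# Sort the data so that each value can be read as an empirical quantile.
Xs <- sort(as.numeric(X))

# Empirical probabilities from ppoints; with a = 0 this is i / (n + 1).
app <- 0
P <- ppoints(length(Xs), a = app)

# Logit of the probabilities: log(p / (1 - p)).
L <- qlogis(P)

# Same layout as the dfrXP and dfrXL outputs (names reused for illustration).
dfrXP <- data.frame(X = Xs, P = P)
dfrXL <- data.frame(X = Xs, L = L)
print(dfrXL)
```

With app = 0 the probabilities never reach 0 or 1, so the logit stays finite for every point.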

A nonlinear regression of the logit of the probabilities logit(p) on the quantiles X is performed with nlsLM, using one of the functions lqkiener1234. These functions have been selected because they have an explicit form for all four types (this is unfortunately not the case for dkiener234) and return satisfactory results with ordinary least squares. The median is calculated before the regression and is injected as a mandatory value in the regression function.

Kiener distributions use the following parameters, some of them being redundant. See aw2k and pk2pk for the formulas and the conversion between parameters:

m (mu) is the median of the distribution.
g (gamma) is the scale parameter.
a (alpha) is the left tail parameter.
k (kappa) is the harmonic mean of a and w and describes a global tail parameter.
w (omega) is the right tail parameter.
d (delta) is the distortion parameter.
e (epsilon) is the eccentricity parameter.

The parameter vectors are composed as follows:

c(m, g, k) of length 3 for distribution "K1".
c(m, g, a, w) of length 4 for distribution "K2".
c(m, g, k, d) of length 4 for distribution "K3".
c(m, g, k, e) of length 4 for distribution "K4".
c(m, g, a, k, w, d, e) of length 7 extracted from an object of class clregk, such as the output of regkienerLX (typically "reg$coefk").

Model "K1" returns results with 1+2=3 parameters and describes an (assumed) symmetric distribution. Parameters d and e are set to 0. Models "K2", "K3" and "K4" describe asymmetric distributions. They return results with 1+3=4 parameters. Model "K2" has a very clear parameter definition but unfortunately parameters a and w are highly correlated. Model "K3" has the least correlated parameters but the meaning of the distortion parameter d, usually of order 1e-3, is not simple.

Model "K4" exhibits a reasonable correlation between each parameter and should be the preferred intermediate model between "K1" and "K2" models. The eccentricity parameter e is well defined and easy to understand: e=(a-w)/(a+w), a=k/(1-e) and w=k/(1+e). It varies between -1 and +1 and can be understood as a percentage (if multiplied by 100) of eccentricity. e = -1 corresponds to w = infinity, e = +1 corresponds to a = infinity and the model becomes a single log-logistic function with a right / left stopping point and a left / right tail.
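The relations stated above (k as the harmonic mean of a and w, e=(a-w)/(a+w), a=k/(1-e), w=k/(1+e)) can be checked with a few lines of base R. The helper names below are hypothetical and chosen for this sketch; the package's own converters are documented under aw2k and pk2pk.

```r
# Hypothetical helper names, written only from the formulas in the text.
aw_to_k <- function(a, w) 2 / (1 / a + 1 / w)   # harmonic mean of a and w
aw_to_e <- function(a, w) (a - w) / (a + w)     # eccentricity
ke_to_a <- function(k, e) k / (1 - e)           # left tail parameter
ke_to_w <- function(k, e) k / (1 + e)           # right tail parameter

# Illustrative tail parameters (not estimated from any dataset).
a <- 4
w <- 6

k <- aw_to_k(a, w)
e <- aw_to_e(a, w)
print(c(k = k, e = e))

# Round trip: (k, e) gives back the original (a, w).
print(c(a = ke_to_a(k, e), w = ke_to_w(k, e)))
```

The round trip illustrates why some parameters are described as redundant: (a, w) and (k, e) carry the same information.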

The lower and upper values of the tail parameter are controlled by mink and maxk. An upper value maxk = 10 is appropriate for datasets of low and medium size, less than 50,000 points. For larger datasets, the upper limit can be extended up to maxk = 20. Such a limit returns results which are very close to the logistic distribution, an alternative distribution which could be more appropriate. The lower limit mink is intended to avoid the value k = 0. Recall that k < 2 describes a distribution with no stable variance and k < 1 describes a distribution with no stable mean.
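As a reading aid, the thresholds mentioned above can be written as a small check. The function tail_moments and the sample values of k are hypothetical; the sketch only restates the two rules from the text and the default bounds from the Usage section, and says nothing about how the package enforces them.

```r
# Hypothetical helper: label a tail parameter k using the two rules in the text.
tail_moments <- function(k) {
  ifelse(k < 1, "no stable mean or variance",
         ifelse(k < 2, "no stable variance", "not flagged"))
}

# Default bounds from the Usage section.
mink <- 0.2
maxk <- 10

# Illustrative values of k (not estimated from any dataset).
k_values <- c(0.5, 1.5, 3.2, 12)
within_bounds <- k_values >= mink & k_values <= maxk

print(data.frame(k = k_values,
                 within_bounds = within_bounds,
                 moments = tail_moments(k_values)))
```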

The output is an object in a flat format of class clregk. It can be listed with the function attributes.

First are the data.frames with the initial data and the estimated results.
Second is the result of the regression regk0 given by nlsLM, from which a few pieces of information have been extracted and listed here.
Third are the regression parameters (without the median) in plain format (no rounding), the variance-covariance matrix, the variance-covariance matrix times 1e+6 and the correlation matrix in a rounded format. Note that regk0, coefk0, coefk0tt, vcovk0, mcork0 have a polymorphic format: their parameters change with the selected model: "K1", "K2", "K3", "K4". They should be used with care in subsequent calculations.
Fourth are the distribution parameters tailored to every model "K1", "K2", "K3", "K4" plus estimated quantiles at levels: c(0.001, 0.005, 0.01, 0.05, 0.5, 0.95, 0.99, 0.995, 0.999). They are intended for subsequent calculations.
Fifth are the same parameters presented in a more readable format thanks to the vector pdgts which controls the rounding of the parameters in the following order:
pdgts = c("m","g","a","k","w","d","e","vcovk0","vcovk0m","mcork0","quantr").
Sixth are some probabilities and the corresponding estimated quantiles and estimated Expected Shortfall stored in a data.frame format.
Last is fitk, which returns all parameters in the same format as fitkienerX, optionally subset by exfitk. IMPORTANT: if you need to subset fitk, always subset it by parameter names and never by rank number, as new items may be added in the future. Use for instance exfitk = exfit0, ..., exfit7.
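The role of pdgts in the fifth item can be illustrated with a short base R sketch. It shows one way a named vector of digits could be applied to the seven parameters; the values in coefk are made up for illustration and this is not the package's internal rounding code.

```r
# Default digits from the Usage section, named in the documented order.
pdgts <- c(3, 3, 1, 1, 1, 3, 2, 4, 4, 2, 2)
names(pdgts) <- c("m", "g", "a", "k", "w", "d", "e",
                  "vcovk0", "vcovk0m", "mcork0", "quantr")

# Hypothetical unrounded parameters standing in for reg$coefk.
coefk <- c(m = 0.00123456, g = 0.0123456, a = 4.0321, k = 4.8276,
           w = 6.0149, d = -0.041234, e = -0.19876)

# Round each parameter with the number of digits that carries its name.
coefr <- mapply(round, coefk, pdgts[names(coefk)])
print(coefr)
```

Matching the digits by name rather than by position keeps the sketch correct even if the parameters are stored in a different order.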

Value

`dfrXP`	data.frame. X = initial quantiles. P = empirical probabilities.
`dfrXL`	data.frame. X = initial quantiles. L = logit of probabilities.
`dfrXR`	data.frame. X = initial quantiles. R = residuals after regression.
`dfrEP`	data.frame. E = estimated quantiles. P = probabilities.
`dfrEL`	data.frame. E = estimated quantiles. L = logit of probabilities.
`dfrED`	data.frame. E = estimated quantiles. D = estimated density (from probabilities).
`regk0`	object of class `nls` extracted from the regression function `nlsLM`.
`coefk0`	the regression parameters in plain format. The median is excluded from the regression.
`vcovk0`	rounded variance-covariance matrix.
`vcovk0m`	rounded 1e+6 times variance-covariance matrix.
`mcork0`	rounded correlation matrix.
`coefk`	all parameters in plain format.
`coefk1`	parameters for model "K1".
`coefk2`	parameters for model "K2".
`coefk3`	parameters for model "K3".
`coefk4`	parameters for model "K4".
`quantk`	quantiles of interest.
`coefr`	all parameters in a rounded format.
`coefr1`	rounded parameters for model "K1".
`coefr2`	rounded parameters for model "K2".
`coefr3`	rounded parameters for model "K3".
`coefr4`	rounded parameters for model "K4".
`quantr`	quantiles of interest in a rounded format.
`dfrQkPk`	data.frame. Qk = Estimated quantiles of interest. Pk = probabilities.
`dfrQkLk`	data.frame. Qk = Estimated quantiles of interest. Lk = Logit of probabilities.
`dfrESkPk`	data.frame. ESk = Estimated Expected Shortfall. Pk = probabilities.
`dfrESkLk`	data.frame. ESk = Estimated Expected Shortfall. Lk = Logit of probabilities.
`fitk`	Parameters, quantiles, moments, VaR, ES and other parameters (not rounded). The length of `fitk` depends on the choice applied to `probak`. IMPORTANT: if you need to subset `fitk`, always subset it by parameter names and never by rank number, as new items may be added in the future. Use for instance `exfit0`, ..., `exfit7`.
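The warning about subsetting fitk can be illustrated with a plain named vector. Everything below is hypothetical: fitk is a stand-in with made-up names and values, wanted plays the role of a name vector such as exfit0, and the extra item n simulates a future version that adds an element.

```r
# Hypothetical stand-in for reg$fitk (names and values are illustrative).
fitk <- c(m = 0.001, g = 0.012, a = 4.0, k = 4.8, w = 6.0, d = -0.041, e = -0.2)

# Hypothetical name vector playing the role of exfit0, ..., exfit7.
wanted <- c("m", "g", "k", "e")
by_name <- fitk[wanted]

# Simulate a future version in which a new item is added in front.
fitk_new <- c(n = 1000, fitk)

# Subsetting by name still returns the same four parameters.
by_name_new <- fitk_new[wanted]

# Subsetting by the old rank numbers now picks different items.
by_rank_new <- fitk_new[c(1, 2, 4, 7)]

print(by_name_new)
print(by_rank_new)
```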

Examples

    

require(graphics)
require(minpack.lm)
require(timeSeries)

### Load the datasets and select one number (1-16)
DS     <- getDSdata()
j      <- 5


### and run this block
X      <- DS[[j]]
nameX  <- names(DS)[j]
reg    <- regkienerLX(X)

## Plotting
lleg   <- c("logit(0.999) = 6.9", "logit(0.99)   = 4.6", 
           "logit(0.95)   = 2.9", "logit(0.50)   = 0", 
           "logit(0.05)   = -2.9", "logit(0.01)   = -4.6", 
           "logit(0.001) = -6.9  ")
pleg   <- c( paste("m =",  reg$coefr4[1]), paste("g  =", reg$coefr4[2]), 
             paste("k  =", reg$coefr4[3]), paste("e  =", reg$coefr4[4]) )
op     <- par(mfrow=c(2,2), mgp=c(1.5,0.8,0), mar=c(3,3,2,1))
plot(X, type="l", main = nameX)
plot(reg$dfrXL, main = nameX, yaxt = "n")
axis(2, las=1, at=c(-9.2, -6.9, -4.6, -2.9, 0, 2.9, 4.6, 6.9, 9.2))
abline(h = c(-4.6, 4.6), lty = 4)
abline(v = c(reg$quantk[5], reg$quantk[9]), lty = 4)
legend("topleft", legend = lleg, cex = 0.7, inset = 0.02, bg = "#FFFFFF")
lines(reg$dfrEL, col = 2, lwd = 2)
points(reg$dfrQkLk, pch = 3, col = 2, lwd = 2, cex = 1.5)
plot(reg$dfrXP, main = nameX)
legend("topleft", legend = pleg, cex = 0.9, inset = 0.02 )
lines(reg$dfrEP, col = 2, lwd = 2)
plot(density(X), main = nameX)
lines(reg$dfrED, col = 2, lwd = 2)
round(cbind("k" = kmoments(reg$coefk, lengthx = nrow(reg$dfrXL)), "X" = xmoments(X)), 2)

## Attributes
attributes(reg)
head(reg$dfrXP)
head(reg$dfrXL)
head(reg$dfrXR)
head(reg$dfrEP)
head(reg$dfrEL)
head(reg$dfrED)
reg$regk0
reg$coefk0
reg$vcovk0
reg$vcovk0m
reg$mcork0
reg$coefk
reg$coefk1
reg$coefk2
reg$coefk3
reg$coefk4
reg$quantk
reg$coefr
reg$coefr1
reg$coefr2
reg$coefr3
reg$coefr4
reg$quantr
reg$dfrQkPk
reg$dfrQkLk
reg$dfrESkPk
reg$dfrESkLk
reg$fitk

## subset fitk
names(reg$fitk)
reg$fitk[exfit6]
reg$fitk[c(exfit1, exfit4)]
### End block

FatTailsR documentation built on June 8, 2025, 11:34 a.m.