regkienerLX | R Documentation |
One function to estimate the parameters of Kiener distributions K1, K2, K3 and K4 and return the results in a list with many data.frames ready to use for plotting. This function performs an unweighted nonlinear regression of the logit of the empirical probabilities logit(p) on the quantiles X.
regkienerLX(X, model = "K4", pdgts = c(3, 3, 1, 1, 1, 3, 2, 4, 4, 2, 2),
maxk = 10, mink = 0.2, app = 0, probak = pprobs2, dgts = NULL,
exfitk = NULL)
X |
vector of quantiles. |
model |
the model used for the regression: "K1", "K2", "K3", "K4". |
pdgts |
vector of length 11. Controls the rounding of output parameters. |
maxk |
numeric. The maximum value of tail parameter |
mink |
numeric. The minimum value of tail parameter |
app |
numeric. The parameter " |
probak |
vector of probabilities used in output regk$fitk.
For instance |
dgts |
rounding parameter applied globally to output regk$fitk. |
exfitk |
character. A vector of parameter names to subset regk$fitk.
For instance |
This function is designed to estimate the parameters of Kiener distributions
for a given dataset. It encapsulates the four distributions described in
this package.
"K1" uses model lqkiener1, "K2" uses model lqkiener2,
"K3" uses model lqkiener3 and "K4" uses model lqkiener4.
A typical input is a numeric vector that describes the returns of a stock.
Conversion from a (possible) time series format to a sorted numeric vector
is done automatically and without any check of the initial format.
There is also no check for missing or non-finite values: NA, NaN, -Inf, +Inf.
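Because the function performs no such checks, a caller may want to clean the input beforehand. The following is a minimal sketch in base R; the vector `returns` and its values are made up for illustration and are not part of the package.

```r
# Hypothetical input containing missing and non-finite values
returns <- c(0.010, NA, -0.020, NaN, Inf, -Inf, 0.005)

# Keep finite values only, drop any time series attributes, then sort
X_clean <- sort(as.numeric(returns[is.finite(returns)]))
X_clean
```

`is.finite()` is FALSE for NA, NaN, -Inf and +Inf, so a single filter covers all four cases listed above.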
The empirical probability of each point in the sorted dataset is calculated
with the function ppoints. The parameter app corresponds to the parameter a
in ppoints but has been limited to the range (0, 0.5). The default value is 0
as large datasets are very common in finance.
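As a small illustration of this step, the sketch below computes the empirical probabilities and their logit with base R only. The sample values are invented; `ppoints(n, a)` returns (i - a) / (n + 1 - 2a), so a = 0 gives i / (n + 1), and `qlogis()` is the base R logit.

```r
# Sorted sample (illustrative values only)
X <- sort(c(-0.031, -0.012, -0.004, 0.002, 0.009, 0.027))
n <- length(X)

# Empirical probabilities with a = 0, i.e. p_i = i / (n + 1)
P <- ppoints(n, a = 0)

# Logit of the probabilities: log(p / (1 - p))
L <- qlogis(P)

data.frame(X = X, P = P, L = L)
```

The three columns mirror, under these assumptions, the kind of content described for the dfrXP and dfrXL outputs.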
A nonlinear regression of the logit of the probabilities logit(p) on the
quantiles X is performed with nlsLM, using one of the functions lqkiener1234.
These functions have been selected because they have an explicit form for all
four types (this is unfortunately not the case for dkiener234) and return
satisfactory results with ordinary least squares. The median is calculated
before the regression and is imposed as a fixed value in the regression
function.
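The idea of this regression can be sketched with base R alone. The example below is not the package's implementation: it uses `stats::nls` instead of nlsLM, and the functions `q_sketch` and `lq_sketch` are hypothetical stand-ins that assume a symmetric quantile function of the form x = m + 2 g k sinh(logit(p) / k); the package's own scaling of g may differ.

```r
set.seed(1)

# Assumed symmetric quantile function (hypothetical stand-in, not the package's):
# x = m + 2 * g * k * sinh(logit(p) / k)
q_sketch  <- function(p, m, g, k) m + 2 * g * k * sinh(qlogis(p) / k)
# Its explicit inverse in logit form: logit(p) = k * asinh((x - m) / (2 * g * k))
lq_sketch <- function(x, m, g, k) k * asinh((x - m) / (2 * g * k))

# Simulate a sample, sort it and compute the logit of the empirical probabilities
X <- sort(q_sketch(runif(2000), m = 0, g = 1.5, k = 4))
L <- qlogis(ppoints(length(X), a = 0))

# The median is computed first and held fixed during the regression
m0 <- median(X)

# Unweighted nonlinear least squares of logit(p) on X (stats::nls, not nlsLM)
fit <- nls(L ~ lq_sketch(X, m0, g, k), start = list(g = 1, k = 3))
coef(fit)
```

Only g and k are estimated here because the median is supplied as a constant, which is the same design choice described above.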
Kiener distributions use the following parameters, some of them being redundant.
See aw2k and pk2pk for the formulas and the conversion between parameters:
m (mu) is the median of the distribution.
g (gamma) is the scale parameter.
a (alpha) is the left tail parameter.
k (kappa) is the harmonic mean of a and w and describes a global tail parameter.
w (omega) is the right tail parameter.
d (delta) is the distortion parameter.
e (epsilon) is the eccentricity parameter.
Where:
c(m, g, k) of length 3 for distribution "K1".
c(m, g, a, w) of length 4 for distribution "K2".
c(m, g, k, d) of length 4 for distribution "K3".
c(m, g, k, e) of length 4 for distribution "K4".
c(m, g, a, k, w, d, e) of length 7 extracted from an object of class clregk,
such as the output of regkienerLX (typically "reg$coefk").
Model "K1" returns results with 1+2=3 parameters and describes a
(assumed) symmetric distribution. Parameters d and e are set
to 0. Models "K2", "K3" and "K4" describe asymmetric
distributions. They return results with 1+3=4 parameters.
Model "K2" has a very clear parameter definition but unfortunately
parameters a and w are highly correlated.
Model "K3" has the least correlated parameters but the meaning of
the distortion parameter d, usually of order 1e-3, is not simple.
Model "K4" exhibits a reasonable correlation between its parameters
and should be the preferred intermediate model between "K1" and "K2" models.
The eccentricity parameter e is well defined and easy to understand:
e=(a-w)/(a+w), a=k/(1-e) and w=k/(1+e). It varies between -1 and +1
and can be understood as a percentage (if times 100) of eccentricity.
e = -1 corresponds to w = infinity, e = +1 corresponds to a = infinity,
and the model becomes a single log-logistic function with a right / left
stopping point and a left / right tail.
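The relations between a, w, k and e quoted above can be checked numerically. The helpers `aw_to_ke` and `ke_to_aw` below are hypothetical names written for this illustration; they are not the package's aw2k or pk2pk functions.

```r
# Convert (a, w) to (k, e) using the relations quoted above
aw_to_ke <- function(a, w) {
  k <- 2 / (1 / a + 1 / w)   # harmonic mean of a and w
  e <- (a - w) / (a + w)     # eccentricity, between -1 and +1
  c(k = k, e = e)
}

# Convert (k, e) back to (a, w)
ke_to_aw <- function(k, e) c(a = k / (1 - e), w = k / (1 + e))

ke <- aw_to_ke(a = 5, w = 3)
ke                                  # k = 3.75, e = 0.25
ke_to_aw(ke[["k"]], ke[["e"]])      # recovers a = 5, w = 3
```

The round trip works because 1/a + 1/w = (1 - e)/k + (1 + e)/k = 2/k, which is the harmonic mean definition of k.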
The lower and upper values of the tail parameter are controlled by mink and
maxk. An upper value maxk = 10 is appropriate for datasets of low and medium
size, less than 50,000 points. For larger datasets, the upper limit can be
extended up to maxk = 20. Such a limit returns results which are very close
to the logistic distribution, an alternate distribution which could be more
appropriate. The lower limit mink is intended to avoid the value k=0. Recall
that k < 2 describes a distribution with no stable variance and k < 1
describes a distribution with no stable mean.
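The two thresholds on k can be expressed as a simple check. The helper `tail_regime` is a hypothetical name introduced here for illustration and only encodes the two rules stated above.

```r
# Hypothetical helper: report which moments are unstable for a given k
tail_regime <- function(k) {
  data.frame(k = k,
             no_stable_variance = k < 2,
             no_stable_mean     = k < 1)
}

tail_regime(c(0.8, 1.5, 3.2, 10))
```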
The output is an object in a flat format of class clregk. It can be
listed with the function attributes.
First are the data.frames with the initial data and the estimated results.
Second is the result of the regression regk0 given by nlsLM, from which
some information has been extracted and listed here.
Third are the regression parameters (without the median) in plain format
(no rounding), the variance-covariance matrix, the variance-covariance
matrix times 1e+6 and the correlation matrix in a rounded format.
Note that regk0, coefk0, coefk0tt, vcovk0 and mcork0 have a polymorphic
format and parameters that change with the selected model: "K1", "K2",
"K3", "K4". They should be used with care in subsequent calculations.
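To illustrate how a variance-covariance matrix, its 1e+6 scaled version and a correlation matrix relate to each other, here is a base R sketch. The matrix `V` contains made-up numbers and assumes a two-parameter case (g and k); it is not taken from any real fit.

```r
# Illustrative 2 x 2 variance-covariance matrix for parameters g and k
# (made-up numbers, not taken from any real fit)
V <- matrix(c(4e-6, 3e-6,
              3e-6, 9e-6),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("g", "k"), c("g", "k")))

round(V * 1e6, 4)     # scaled matrix, easier to read
round(cov2cor(V), 2)  # correlation matrix; off-diagonal = 3 / (2 * 3) = 0.5
```

Accessing entries by name, for instance `V["g", "k"]`, rather than by position is one way to stay safe when the set of parameters changes with the model.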
Fourth are the distribution parameters tailored to every model "K1", "K2", "K3", "K4" plus estimated quantiles at levels: c(0.001, 0.005, 0.01, 0.05, 0.5, 0.95, 0.99, 0.995, 0.999). They are intended for subsequent calculations.
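For reference, the logit of these probability levels can be obtained with the base R function `qlogis()`. This short sketch only reproduces the levels listed above; the variable names are arbitrary.

```r
# Probability levels listed above and their logit, rounded to one decimal
probs  <- c(0.001, 0.005, 0.01, 0.05, 0.5, 0.95, 0.99, 0.995, 0.999)
logits <- round(qlogis(probs), 1)
setNames(logits, probs)
# -6.9 -5.3 -4.6 -2.9  0.0  2.9  4.6  5.3  6.9
```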
Fifth are the same parameters presented in a more readable format thanks
to the vector pdgts which controls the rounding of the parameters in
the following order:
pdgts = c("m","g","a","k","w","d","e","vcovk0","vcovk0m","mcork0","quantr").
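The way a rounding vector in this order could be applied is sketched below. This is an illustration under assumptions, not the package's internal code: the parameter values in `coefk` are made up, and `mapply()` is simply one convenient way to round each element with its own number of digits.

```r
# Default digits, labelled in the order given above
pdgts <- c(3, 3, 1, 1, 1, 3, 2, 4, 4, 2, 2)
names(pdgts) <- c("m", "g", "a", "k", "w", "d", "e",
                  "vcovk0", "vcovk0m", "mcork0", "quantr")

# Made-up parameter values for illustration only
coefk <- c(m = 0.012345, g = 1.23456, a = 4.44444, k = 4,
           w = 3.63636, d = 0.025, e = 0.1)

# Round each parameter with the digits registered under its name
coefr <- mapply(round, coefk, pdgts[names(coefk)])
coefr
```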
Sixth are some probabilities and the corresponding estimated quantiles and estimated Expected Shortfall stored in a data.frame format.
Last is fitk which returns all parameters in the same format
as fitkienerX, optionally subset by exfitk.
IMPORTANT: if you need to subset fitk, always subset it by parameter names
and never by rank number, as new items may be added in the future.
Use for instance exfitk = exfit0, ..., exfit7.
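The reason for this advice can be seen on a toy example. The vectors `fitk_v1` and `fitk_v2` below are made-up stand-ins for reg$fitk in two hypothetical versions, and `wanted` plays the role of a names vector such as those passed to exfitk.

```r
# Stand-in for reg$fitk: a named numeric vector (made-up values)
fitk_v1 <- c(m = 0.01, g = 1.2, k = 4.0, e = 0.1)
# A later version might insert a new item in the middle
fitk_v2 <- c(m = 0.01, g = 1.2, a = 4.4, k = 4.0, e = 0.1)

wanted <- c("m", "k")        # hypothetical selection by parameter names

fitk_v1[wanted]              # m and k
fitk_v2[wanted]              # still m and k
fitk_v2[c(1, 3)]             # by rank: now returns m and a, not m and k
```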
dfrXP |
data.frame. X = initial quantiles. P = empirical probabilities. |
dfrXL |
data.frame. X = initial quantiles. L = logit of probabilities. |
dfrXR |
data.frame. X = initial quantiles. R = residuals after regression. |
dfrEP |
data.frame. E = estimated quantiles. P = probabilities. |
dfrEL |
data.frame. E = estimated quantiles. L = logit of probabilities. |
dfrED |
data.frame. E = estimated quantiles. D = estimated density (from probabilities). |
regk0 |
object of class |
coefk0 |
the regression parameters in plain format. The median is out of the regression. |
vcovk0 |
rounded variance-covariance matrix. |
vcovk0m |
rounded 1e+6 times variance-covariance matrix. |
mcork0 |
rounded correlation matrix. |
coefk |
all parameters in plain format. |
coefk1 |
parameters for model "K1". |
coefk2 |
parameters for model "K2". |
coefk3 |
parameters for model "K3". |
coefk4 |
parameters for model "K4". |
quantk |
quantiles of interest. |
coefr |
all parameters in a rounded format. |
coefr1 |
rounded parameters for model "K1". |
coefr2 |
rounded parameters for model "K2". |
coefr3 |
rounded parameters for model "K3". |
coefr4 |
rounded parameters for model "K4". |
quantr |
quantiles of interest in a rounded format. |
dfrQkPk |
data.frame. Qk = Estimated quantiles of interest. Pk = probabilities. |
dfrQkLk |
data.frame. Qk = Estimated quantiles of interest. Lk = Logit of probabilities. |
dfrESkPk |
data.frame. ESk = Estimated Expected Shortfall. Pk = probabilities. |
dfrESkLk |
data.frame. ESk = Estimated Expected Shortfall. Lk = Logit of probabilities. |
fitk |
Parameters, quantiles, moments, VaR, ES and other parameters (not rounded).
Length of |
nlsLM, laplacegaussnorm,
Kiener distributions K1, K2, K3 and K4: kiener1, kiener2, kiener3, kiener4.
Other estimation function: fitkienerX and its derivatives.
fitk subsetting: exfit0.
require(graphics)
require(minpack.lm)
require(timeSeries)
### Load the datasets and select one number (1-16)
DS <- getDSdata()
j <- 5
### and run this block
X <- DS[[j]]
nameX <- names(DS)[j]
reg <- regkienerLX(X)
## Plotting
lleg <- c("logit(0.999) = 6.9", "logit(0.99) = 4.6",
"logit(0.95) = 2.9", "logit(0.50) = 0",
"logit(0.05) = -2.9", "logit(0.01) = -4.6",
"logit(0.001) = -6.9 ")
pleg <- c( paste("m =", reg$coefr4[1]), paste("g =", reg$coefr4[2]),
paste("k =", reg$coefr4[3]), paste("e =", reg$coefr4[4]) )
op <- par(mfrow=c(2,2), mgp=c(1.5,0.8,0), mar=c(3,3,2,1))
plot(X, type="l", main = nameX)
plot(reg$dfrXL, main = nameX, yaxt = "n")
axis(2, las=1, at=c(-9.2, -6.9, -4.6, -2.9, 0, 2.9, 4.6, 6.9, 9.2))
abline(h = c(-4.6, 4.6), lty = 4)
abline(v = c(reg$quantk[5], reg$quantk[9]), lty = 4)
legend("topleft", legend = lleg, cex = 0.7, inset = 0.02, bg = "#FFFFFF")
lines(reg$dfrEL, col = 2, lwd = 2)
points(reg$dfrQkLk, pch = 3, col = 2, lwd = 2, cex = 1.5)
plot(reg$dfrXP, main = nameX)
legend("topleft", legend = pleg, cex = 0.9, inset = 0.02 )
lines(reg$dfrEP, col = 2, lwd = 2)
plot(density(X), main = nameX)
lines(reg$dfrED, col = 2, lwd = 2)
round(cbind("k" = kmoments(reg$coefk, lengthx = nrow(reg$dfrXL)), "X" = xmoments(X)), 2)
## Attributes
attributes(reg)
head(reg$dfrXP)
head(reg$dfrXL)
head(reg$dfrXR)
head(reg$dfrEP)
head(reg$dfrEL)
head(reg$dfrED)
reg$regk0
reg$coefk0
reg$vcovk0
reg$vcovk0m
reg$mcork0
reg$coefk
reg$coefk1
reg$coefk2
reg$coefk3
reg$coefk4
reg$quantk
reg$coefr
reg$coefr1
reg$coefr2
reg$coefr3
reg$coefr4
reg$quantr
reg$dfrQkPk
reg$dfrQkLk
reg$dfrESkPk
reg$dfrESkLk
reg$fitk
## subset fitk
names(reg$fitk)
reg$fitk[exfit6]
reg$fitk[c(exfit1, exfit4)]
### End block