View source: R/o_estimation2.R
| fitkienerX | R Documentation | 
Several functions to estimate the parameters of asymmetric Kiener distributions 
and display the results in a numeric vector or in a matrix. 
Algorithm "reg" (the default) uses a nonlinear regression and handles 
difficult cases. Algorithm "estim" has been completely rewritten 
in version 1.8-0 and is now very accurate, even for k<1. Adjustment 
on extreme quantiles can be controlled very precisely.
fitkienerX(X, algo = c("r", "reg", "e", "estim"), ord = 7, maxk = 10,
  mink = 1.53, maxe = 0.5, probak = pprobs2, dgts = NULL,
  exfitk = NULL, dimnames = FALSE, ncores = 1)
paramkienerX(X, algo = c("r", "reg", "e", "estim"), ord = 7, maxk = 10,
  mink = 1.53, maxe = 0.5, dgts = 3, parnames = TRUE,
  dimnames = FALSE, ncores = 1)
paramkienerX7(X, dgts = 3, n = 10, maxk = 20, maxe = 0.9,
  parnames = TRUE, dimnames = FALSE, ncores = 1)
paramkienerX5(X, dgts = 3, i = 4, maxk = 20, maxe = 0.9,
  parnames = TRUE, dimnames = FALSE, ncores = 1)
X | numeric. Vector, matrix, array or list of quantiles.
algo | character. The algorithm used: "reg" (the default) or "estim".
ord | integer. Option for probability selection and treatment.
maxk | numeric. The maximum value of tail parameter k.
mink | numeric. The minimum value of tail parameter k.
maxe | numeric. The maximum value of absolute tail parameter
probak | numeric. Ordered vector of probabilities.
dgts | integer. The rounding of output parameters.
exfitk | character. A vector of parameter names to subset the output.
dimnames | boolean. Display dimnames.
ncores | integer. The number of cores for parallel processing of arrays.
parnames | boolean. Display parameter names.
n | integer. The 1:n and (N+i-n):N elements of
i | integer. The i-th and (N-i)-th elements of
FatTailsR package currently uses two different algorithms to estimate the parameters of Kiener distributions K1, K2, K3 and K4.
Functions fitkienerX(algo = "reg"), paramkienerX(algo = "reg") 
and regkienerLX use an unweighted  
nonlinear regression from logit(p) to X over the whole dataset.  
Depending on the size of the dataset, calculation can be slow but is usually
accurate and describes very well the last 1-10 points in the tails 
(except if there is a huge outlier). 
Functions fitkienerX(algo = "estim"), paramkienerX(algo = "estim"), 
paramkienerX5 and paramkienerX7 estimate the parameters with 
just 5 to 11 quantiles, 5 being the minimum. For averaging purposes, 
11 quantiles are proposed (see below). Computation is almost instantaneous 
and reasonably accurate. This is the recommended method for intensive computation.
A typical input is a numeric vector or a matrix that describes the returns of a stock. A matrix must be in the format DS, with DATES as rownames, STOCKS as colnames and (log-)returns as the content of the matrix. An array must be in the format DSL, with DATES as rownames, STOCKS as colnames, LAGS in the third dimension and (log-)returns as the content of the array. A list can be a list of numeric vectors but not a list of matrices, data.frames or arrays.
Conversion from a (possible) time series format to a sorted numeric vector 
is done automatically and without any check of the initial format. 
The empirical probability of each point in the sorted dataset is calculated 
with the function ppoints whose parameter a has been set to 
a = 0 as large datasets are very common in finance. 
The lowest acceptable size of a dataset is not clear at this moment. A minimum 
of 11 points has been set in "reg" algorithm and a minimum of 15 points 
has been set in "estim" algorithm. It might change in the future. 
If possible, use at least 21 points. 
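The sorting and probability assignment described above can be reproduced in base R. This is a minimal sketch, not the package's internal code; the sample x is simulated for illustration only.

```r
# Simulated returns standing in for a real dataset (illustrative only).
set.seed(1)
x <- rnorm(21)

# Sort the data and attach plotting positions with a = 0,
# which gives p_i = i / (n + 1) for i = 1, ..., n.
xs <- sort(x)
n  <- length(xs)
p  <- ppoints(n, a = 0)

# First rows of the (quantile, probability) pairs.
head(cbind(quantile = xs, probability = p))
```

With a = 0 the probabilities never reach 0 or 1, so their logit is always finite.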
Parameter algo controls the algorithm used. Default is "reg".
When algo = "reg" (or algo = "r"), a nonlinear regression is performed 
with nlsLM from the logit of the empirical probabilities 
logit(p) over the quantiles X with the function qlkiener4. 
The maximum value of the tail parameter k is controlled by maxk.
An upper value maxk = 10 is appropriate for datasets
of low and medium size, fewer than 20,000 to 50,000 points. For larger datasets, the
upper limit can be extended up to maxk = 20. When this limit is reached, 
the shape of the distribution is very similar to the logistic distribution 
(at least when e = 0) and the use of this distribution should be considered. 
Remember that a value k < 2 describes a distribution with no stable variance and 
k < 1 describes a distribution with no stable mean.
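To illustrate the idea of regressing logit(p) on the quantiles X, here is a sketch for the limiting logistic case only, where logit(p) = (x - m) / s is exactly linear in x. It uses lm on simulated data and is not the nlsLM / qlkiener4 regression used by the package; the values of m and s are arbitrary assumptions.

```r
# Simulated logistic sample (location m, scale s are arbitrary choices).
set.seed(42)
m <- 0.1
s <- 2
x <- sort(rlogis(5000, location = m, scale = s))

# Empirical probabilities of the sorted points, as in the text (a = 0).
p <- ppoints(length(x), a = 0)

# Unweighted regression of logit(p) on x over the whole dataset.
fit       <- lm(qlogis(p) ~ x)
slope     <- unname(coef(fit)["x"])
intercept <- unname(coef(fit)["(Intercept)"])

# For a logistic law: slope = 1 / s and intercept = -m / s.
s_hat <- 1 / slope
m_hat <- -intercept / slope
c(m = m_hat, s = s_hat)
```

For a Kiener distribution with finite k the relation is no longer linear, which is why a nonlinear regression is needed.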
When algo = "estim" (or algo = "e"),
5 to 11 quantiles are used to estimate the parameters. 
The minimum is 5 quantiles: the median x.50, two quantiles at medium distance 
from the median, usually x.25 and x.75, and two quantiles located close to the extremes 
of the dataset, for instance x.01 and x.99 if the dataset X has more 
than 100 points, x.0001 and x.9999 if the dataset X has more than 
10,000 points and so on if the dataset is larger. 
These quantiles are extracted with function fiveprobs. 
Small datasets must contain at least 15 different points. 
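The way the extreme probabilities follow the size of the dataset can be sketched as below. The helper five_probs_sketch is hypothetical and deliberately simplified: it only uses powers of ten, assumes 0.1 for datasets of 100 points or fewer, and does not reproduce the actual selection rule of fiveprobs.

```r
# Hypothetical helper (not part of FatTailsR): choose 5 probabilities
# from the dataset size n, with the extreme probability p1 shrinking
# by a factor 10 each time n exceeds 100, 1000, 10000, ...
five_probs_sketch <- function(n) {
  stopifnot(n >= 15)
  j  <- sum(n > 10^(2:8))      # number of thresholds exceeded by n
  p1 <- 10^(-(j + 1))          # assumption: 0.1 when n <= 100
  c(p1, 0.25, 0.50, 0.75, 1 - p1)
}

five_probs_sketch(250)      # extremes at 0.01 and 0.99
five_probs_sketch(25000)    # extremes at 0.0001 and 0.9999
```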
With the idea of averaging the results (but without any guarantee of better 
estimates), calculation has been extended to 11 probabilities 
extracted from X with the function elevenprobs, where 
p1, p2 and p3 are the most extreme probabilities of the dataset X, 
with values ending in .x01, .x025 or .x05:
p11 = c(p1, p2, p3, 0.25, 0.35, 0.50, 0.65, 0.75, 1-p3, 1-p2, 1-p1)
Selection of subsets among these 11 probabilities is controlled with the option 
ord which can take 12 different values.  
For instance, the default ord = 7 computes the parameters at probabilities 
c(p1, 0.25, 0.50, 0.75, 1-p1) and c(p2, 0.25, 0.50, 0.75, 1-p2).
Parameters d and k are averaged first and the results of these 
averages are used to compute the other parameters g, a, w, e. 
Small datasets should consider ord = 5 and 
large datasets can consider ord = 12. 
The 12 possible values of ord are: 
 1. c(p1, 0.35, 0.50, 0.65, 1-p1)
 2. c(p2, 0.35, 0.50, 0.65, 1-p2)
 3. c(p1, p2, 0.35, 0.50, 0.65, 1-p2, 1-p1)
 4. c(p1, p2, p3, 0.35, 0.50, 0.65, 1-p3, 1-p2, 1-p1)
 5. c(p1, 0.25, 0.50, 0.75, 1-p1)
 6. c(p2, 0.25, 0.50, 0.75, 1-p2)
 7. c(p1, p2, 0.25, 0.50, 0.75, 1-p2, 1-p1)
 8. c(p1, p2, p3, 0.25, 0.50, 0.75, 1-p3, 1-p2, 1-p1)
 9. c(p1, 0.25, 0.35, 0.50, 0.65, 0.75, 1-p1)
10. c(p2, 0.25, 0.35, 0.50, 0.65, 0.75, 1-p2)
11. c(p1, p2, 0.25, 0.35, 0.50, 0.65, 0.75, 1-p2, 1-p1)
12. c(p1, p2, p3, 0.25, 0.35, 0.50, 0.65, 0.75, 1-p3, 1-p2, 1-p1)
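The twelve subsets listed above, taken in order, can be written as an R list indexed by ord. The function ord_probs is a hypothetical helper for illustration, and the values 0.01, 0.025 and 0.05 for p1, p2 and p3 are assumed; in the package these probabilities depend on the dataset.

```r
# Hypothetical helper: the 12 probability subsets, indexed by ord.
ord_probs <- function(p1, p2, p3) {
  list(
    c(p1, 0.35, 0.50, 0.65, 1-p1),
    c(p2, 0.35, 0.50, 0.65, 1-p2),
    c(p1, p2, 0.35, 0.50, 0.65, 1-p2, 1-p1),
    c(p1, p2, p3, 0.35, 0.50, 0.65, 1-p3, 1-p2, 1-p1),
    c(p1, 0.25, 0.50, 0.75, 1-p1),
    c(p2, 0.25, 0.50, 0.75, 1-p2),
    c(p1, p2, 0.25, 0.50, 0.75, 1-p2, 1-p1),
    c(p1, p2, p3, 0.25, 0.50, 0.75, 1-p3, 1-p2, 1-p1),
    c(p1, 0.25, 0.35, 0.50, 0.65, 0.75, 1-p1),
    c(p2, 0.25, 0.35, 0.50, 0.65, 0.75, 1-p2),
    c(p1, p2, 0.25, 0.35, 0.50, 0.65, 0.75, 1-p2, 1-p1),
    c(p1, p2, p3, 0.25, 0.35, 0.50, 0.65, 0.75, 1-p3, 1-p2, 1-p1)
  )
}

# Assumed extreme probabilities, for illustration only.
sets <- ord_probs(p1 = 0.01, p2 = 0.025, p3 = 0.05)
sets[[7]]       # probabilities for the default ord = 7
lengths(sets)   # number of quantiles used for each ord
```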
paramkienerX5 is a simplified version of paramkienerX with  
predefined values algo = "estim", ord = 5, maxk = 10 
and direct access to internal subfunctions. 
It uses the following probabilities:
p5 = c(p1, 0.25, 0.50, 0.75, 1-p1)
paramkienerX7 is a simplified version of paramkienerX with 
predefined values algo = "estim", ord = 7, maxk = 10 
and direct access to internal subfunctions.
It uses the following probabilities:
p7 = c(p1, p2, 0.25, 0.50, 0.75, 1-p2, 1-p1)
The quantiles corresponding to the above probabilities are then extracted 
with the function quantile whose parameter type 
has been set to type = 6 as it returns the closest values 
to the true quantiles (according to our experience) for all k > 1.9. 
(Note: when k < 1.5, algorithm algo = "reg" returns better  
results). 
Both probabilities and quantiles are then transferred to estimkiener11 
for calculation.
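The quantile extraction with type = 6 can be tried in base R. This sketch uses simulated Student-t data and five assumed probabilities; it only shows the extraction step, not the estimation done by estimkiener11.

```r
# Simulated fat-tailed sample (illustrative only).
set.seed(7)
x <- rt(500, df = 4) / 100

# Five probabilities of the kind described in the text (assumed values).
probs <- c(0.01, 0.25, 0.50, 0.75, 0.99)

# type = 6 places the k-th order statistic at p = k / (n + 1),
# the same convention as ppoints(n, a = 0).
q6 <- quantile(x, probs = probs, type = 6, names = FALSE)

# R's default (type = 7) shown for comparison.
q7 <- quantile(x, probs = probs, type = 7, names = FALSE)

rbind(type6 = q6, type7 = q7)
```

In the tails, type = 6 interpolates at positions farther from the centre than type = 7, so its extreme quantiles are at least as extreme.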
probak controls the probabilities at which the model is tested with the parameter 
estimates. fitkienerX and regkienerLX share the same subroutines.
The default for fitkienerX and regkienerLX is 
pprobs2 = c(0.01, 0.025, 0.05, 0.95, 0.975, 0.99) as those values 
are usual in finance. Other sets of values are provided at pprobs0.
Rounding the results is useful to display nice results, especially 
in a matrix or in a data.frame. dgts = 13 is recommended 
as a, k, w are usually significant at 1 digit.
dgts = NULL does not perform any rounding. 
dgts = 0 to 9 rounds all parameters at the same level. 
dgts = 10 to 27 rounds the parameters at various levels for nice display.  
See roundcoefk for the details. (Note: the
rounding 10 to 27 currently works with paramkienerX, paramkienerX5,  
paramkienerX7 but not yet with fitkienerX). 
Extracting the most useful parameters from the (quite long) vector/matrix 
fitk is controlled by parameter exfitk that calls user-defined or
predefined parameter subsets like exfit0, ..., exfit7.
IMPORTANT: never subset fitk by rank number as new items may be added 
in the future and rank may vary.
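Subsetting by name rather than by rank can be illustrated on a plain named vector. The vector fitk below is a hypothetical, shortened stand-in for the real output with made-up numbers, and keep is a user-defined subset, not one of the predefined exfit0, ..., exfit7.

```r
# Hypothetical, shortened stand-in for a fitkienerX result (made-up values).
fitk <- c(ret = 0.12, m = 0.001, g = 0.01, a = 3.5, k = 3.17,
          w = 2.9, d = 0.03, e = 0.094, m1 = 0.0012, sd = 0.013)

# Robust: subset by parameter names.
keep <- c("m", "g", "k", "e")
fitk[keep]

# Fragile: the same selection by rank breaks if items are inserted
# before these positions in a future version.
fitk[c(2, 3, 5, 8)]
```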
Calculation of vectors, matrices and lists is not parallelized. Parallelization 
of code for arrays was introduced in version 1.5-0 and improved in version 1.5-1. 
ncores controls the number of cores allocated to the process (through 
parApply, which runs on Unix-like systems and Windows and requires
about 2 seconds to start). ncores = 1 means no parallelization. 
ncores = 0 is the recommended option. It uses the maximum number of cores 
available on the computer, as detected by detectCores,  
minus 1 core, which gives the best performance in most cases. 
Although appealing, this automatic selection can sometimes be dangerous. For instance, 
the instruction f(X, ncores_max) - f(X, ncores_max), a nice way to compute 
an array of 0, will request 2 * ncores_max cores and crash R. ncores = 2,..,99 
sets the number of cores manually. If the requested value is larger than the maximum 
number of cores, this value is automatically reduced (with a warning) to this maximum.
Hence, this last option provides one core more than option ncores = 0.
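The rules for ncores stated above can be summarised in a small function. resolve_ncores is a hypothetical helper written for illustration, not the package's code; the max_cores argument is an assumption added so that the behaviour can be checked without depending on the machine.

```r
# Hypothetical helper: turn the user's ncores into an effective core count.
resolve_ncores <- function(ncores, max_cores = parallel::detectCores()) {
  # detectCores() may return NA on some platforms; fall back to 1 core.
  if (is.na(max_cores)) max_cores <- 1
  # ncores = 0: all detected cores minus one (at least one).
  if (ncores == 0) return(max(1, max_cores - 1))
  # Requests above the maximum are reduced, with a warning.
  if (ncores > max_cores) {
    warning("ncores reduced to the number of available cores: ", max_cores)
    return(max_cores)
  }
  # ncores = 1 (no parallelization) or any valid manual value.
  ncores
}

resolve_ncores(0, max_cores = 8)                      # 7
resolve_ncores(1, max_cores = 8)                      # 1
suppressWarnings(resolve_ncores(99, max_cores = 8))   # 8
```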
NOTE: fitkienerLX, regkienerX, estimkiener(X,5,7) were 
introduced in v1.2-0 and replaced in version v1.4-1 by fitkienerX and 
paramkiener(X,5,7) to accommodate vectors, matrices, arrays and lists. 
We apologize to early users who need to rewrite their code.
paramkienerX: a vector (or a matrix) of parameter estimates 
c(m, g, a, k, w, d, e).
fitkienerX: a vector (or a matrix) made of several parts:
ret : the return over the period calculated with sum(x). 
Thus, assume log-returns.  
m, g, a, k, w, d, e : the parameter estimates.  
m1, sd, sk, ke : the mean, standard deviation, 
skewness and excess kurtosis computed from the parameter estimates.  
m1x, sdx, skx, kex : the mean, standard deviation,  
skewness and excess kurtosis computed from the dataset.  
lh : the length of the dataset over the period.  
q. : quantile estimated with the parameter estimates. 
VaR. : Value-at-Risk, positive in most cases.  
c. : corrective tail coefficient = (q - m) / (q_logistic_function - m). 
ltm. : left tail mean (signed ES on the left tail, usually negative).  
rtm. : right tail mean (signed ES on the right tail, usually positive). 
dtmq. : (p<=0.5 left, p>0.5 right) tail mean minus quantile. 
ES. : expected shortfall, positive in most cases. 
h. : corrective ES  = (ES - m) / (ES_logistic_function - m). 
desv. : ES - VaR, usually positive.  
l. : quantile estimated by the tangent logistic function. 
dl. : quantile - quantile_logistic_function. 
g. : quantile estimated by the Laplace-Gauss function.  
dg. : quantile - quantile_Laplace_Gauss_function. 
IMPORTANT: if you need to subset fitk, always subset it by parameter names 
and never subset it by rank number as new items may be added in the future and rank may vary. 
Use for instance exfit0, ..., exfit7.
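The item ret in the list above is computed with sum(x), which is meaningful when x holds log-returns. A short worked example with made-up prices shows why: log-returns add up to the log-return over the whole period.

```r
# Made-up price series (illustrative only).
prices <- c(100, 102, 101, 105, 103)

# Log-returns between consecutive dates.
x <- diff(log(prices))

# Return over the period, as sum(x).
ret <- sum(x)

# The sum telescopes to log(last price / first price).
all.equal(ret, log(103 / 100))

# Corresponding simple return over the period (3 percent here).
exp(ret) - 1
```

With simple returns instead of log-returns, sum(x) would only approximate the return over the period.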
P. Kiener, Fat tail analysis and package FatTailsR, 9th R/Rmetrics Workshop and Summer School, Zurich, 27 June 2015. https://www.inmodelia.com/exemples/2015-0627-Rmetrics-Kiener-en.pdf
regkienerLX, estimkiener11, 
roundcoefk, exfit6.
    
library(FatTailsR)
library(minpack.lm)
library(timeSeries)
### Load the datasets and choose j in 1:16
DS     <- getDSdata()
j      <- 5
### and run this block
probak <- c(0.01, 0.05, 0.95, 0.99)
X      <- DS[[j]] ; names(DS)[j]
elevenprobs(X)
fitkienerX(X, algo = "reg", dgts = 3, probak = probak)
fitkienerX(X, algo = "estim", ord = 5, probak = probak, dgts = 3)
paramkienerX(X)
paramkienerX5(X)
### Compare the 12 values of ord with algo = "estim" (rows 1 to 12) and algo = "reg" (row 13)
compare <- function(ord, X) { paramkienerX(X, algo = "estim", ord = ord, dgts = 13) }
rbind(t(sapply(1:12, compare, X)), paramkienerX(X, algo = "reg", dgts = 13))
### Analyze DS in one step
t(sapply(DS, paramkienerX, algo = "reg", dgts = 13))
t(sapply(DS, paramkienerX, algo = "estim", dgts = 13))
paramkienerX(DS, algo = "reg", dgts = 13)
paramkienerX(DS, algo = "estim", dgts = 13)
system.time(fitk_rDS <- fitkienerX(DS, algo = "r", probak = pprobs2, dgts = 3))
system.time(fitk_eDS <- fitkienerX(DS, algo = "e", probak = pprobs2, dgts = 3))
fitk_rDS
fitk_eDS
### Subset rDS and eDS with exfit0,..,exfit7
fitk_rDS[,exfit4]
fitk_eDS[,exfit7]
fitkienerX(DS, algo = "e", probak = pprobs2, dgts = 3, exfitk = exfit7)
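### Array (new example introduced in v1.5-1)
### Caution: the commented subtraction below runs two parallel calls at once;
### with a large ncores it can request more cores than available and crash R.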
## Not run:
arr <- array(rkiener1(3000), c(4,3,250))
paramkienerX7(arr, ncores = 2)
## paramkienerX7(arr, ncores = 2) - paramkienerX(arr, ncores = 2)
## End(Not run)
### End