# Timc: FORTRAN subroutine for computing VGGFR parameters In pvrank: Rank Correlations

## Description

Calls a fortran subroutine which performs a controlled random search to compute the parameters of the Vianelli Generalized Gaussian with finite range distribution (VGGFR).

## Usage

 1 Timc(n, mu2n, mu4n,icoef) 

## Arguments

 n  number of ranks mu2n variance of r_{h,n} mu4n kurtosis of r_{h,n} icoef an integer indicating the coefficient to be interpolated: 1=Spearman, 2=Gini, 3=Kendall, 4=r4, 5=Fisher-Yates, 6=Filliben.

## Details

The VGGFR is a flexible density to which we can resort in the case the number of ranks is larger than the threshold for which the exact null distribution of a rank correlation is known, but lower than the threshold for which the asymptotic Gaussian approximation becomes valid.

Although there are several methods for estimating the parameters of the GGFR, we follow the moment-matching method as applied by Karian and Dudewicz (2000) [Ch 2.6, p. 104] in similar estimations. The second and fourth centered moments of a GGFR distribution are:

where B(x,y) denotes the beta function between the positive real-valued numbers x and y. The second and fourth moments of the null distributions of the rank correlations are known polynomials in n.

μ_2(r_1)=\frac{1}{n-1}

μ_2(r_2)=\frac{2(2n+5)}{9n(n-1)}

μ_2(r_3)=≤ft[\frac{2}{3(n-1)}\right]≤ft[\frac{n^2+2+k_n}{n^2-k_n}\right]

μ_4(r_1)=\frac{3(25n^3-38n^2-35n+72)} {25n(n+1)(n-1)^3}

μ_4(r_2)=\frac{100n^4+328n^3-127n^2-997n-372} {1350≤ft[0.5n(n-1)\right]^3}

μ_4(r_3)=\frac{4[35n^7-(111-35k_n)n^6+(153+29k_n)n^5-(366-59k_n)n^4+304+11k_n)n^3]} {n^{k_n}(105-2k_n)(n+k_n)^3(\!n-k_n)^4(n-3+k_n)}+

\frac{-[(456-114k_n)n^2-(912-492k_n)n+(1248-933k_n)]} {n^{k_n}(105-2k_n)(n+k_n)^3(\!n-k_n)^4(n-3+k_n)}

The approximation to the null distribution of r_4 is based on an estimation of the second moment obtained through a regression strategy. In particular,

μ_2 (r_4)\approx \frac{1.00762}{(n-1)}

μ_4(r_4)\approx \frac{1.0949159471}{√{n-1}}+\frac{38.7820781157}{(n-1)^2}-\frac{208.8267798530}{(n-1)^3}+\frac{396.3338168921}{(n-1)^4}

See Tarsitano and Amerise (2016). The second and fourth moments of r_5 are given in Fieller and Pearson (1961):

μ_2(r_5)=\frac{1}{(n-1)}

μ_4(r_5)=\frac{1}{(n-1)^2}\Big[\frac{3(n-1)}{n+1}+\frac{(n-2)(n-3)}{n(n^2-1)}\Big(\frac{k_4}{k_2^2}\Big)^2 \Big]

with

k_2=\frac{∑_{i=1}^nξ(x_i|n)^2}{n-1}

k_4=\frac{n≤ft\{(n+1)∑_{i=1}^nξ(x_i|n)^4-\frac{3(n-1)}{n}\Big[∑_{i=1}^nξ(x_i|n)^2\Big]^2\right\}}{(n-1)(n-2)(n-3)}

where ξ(x_i|n) is the expected values of the i-th largest standardized deviate in a sample of size n from a Gaussian population. With regard to r_6, the GGFR approximation is provisionally implemented with the same structure as r_5 using the medians in place of the means of the Gaussian order statistics. See Amerise & Tarsitano (2016).

The estimate of the parameter vector λ=(λ_1,λ_2) is obtained by solving

C(λ)=\min[\max[g_2(λ),g_4(λ)]]

with g_2(λ)=μ_2(λ)-μ_{2,n} and g_4(λ)=μ_4(λ)-μ_{4,n} where μ_{2,n} and μ_{4,n} are the second and fourth moments of the given rank correlation.

## Value

 Lam A numeric vector giving the estimates of the two parameters of the VGGFR Eval Final value of the loss function

## Note

The GGFR density has two exponential parameters that make GGFR highly nonlinear. In addition, the presence of a beta function depending on the unknown parameters can create difficulties in numerical stability. To increase the chances of getting a global solution in reasonable computational times, TIMC executes a controlled random search (CRS) algorithm discussed, for example, in Conlon (1992) and Brachetti et al. (1997).

The CRS is a direct search technique and is purely heuristic. Given a function of two or more variables, an initial search domain is defined by specifying limits to each variable. The algorithm starts by initially storing a predetermined number of points uniformly sampled over the search space. In our version, the size of the sample of initial random points is 30. The points are obtained by using a Halton-sequence (Haltpon, 1964) to get a pseudo-random distribution of points within the search domain. The sample is then gradually contracted by replacing the current worst point in it with a better point. The stop criterion is defined in terms of the worst point C_M and the best point C_m of the array. The iterations terminate if C_M-C_m<10^{-16}.

The parameter space in which the search proceeds is 1.5≤ λ_1≤ 3.5, 0.3n≤λ_2≤ 0.6n (Spearman's r_1), 0.7n≤λ_2≤ 1.2n (Kendall's r_2) and 0.3n≤ λ_2≤ 0.9n (Gini's r_3), with kn=n\ mod\ 2. These limits were chosen via a trial-and-error process. The maximum number of function evaluations allowed is set at 100000√{Log10(n)}.

## Author(s)

Agostino Tarsitano

## References

Amerise, I. L. and Tarsitano, A. (2016). A re-edition of the probability plot correlation coefficient leading to a new rank correlation. Submitted.

Brachetti, P. et al. (1997). "A new version of the Price's algorithm for global optimization". Journal of Global Optimization, 10, 165-184.

Conlon, M. (1992). "The controlled random search procedure for function optimization". Communications in Statistics - Simulation and Computation, 21, 912–923.

Fieller, E. C. and Pearson, E. S. (1961). Tests for rank correlation coefficients: II. Biometrika, 48, 29–40.

Halton, j. H. (1964). Algorithm 247: Radical-inverse quasi- random point sequence. Communications of the ACM, 7, 701–702.

Karian, Z. A. and Dudewicz, E. J. (2000). Fitting statistical distributions. The generalized lambda distribution and generalized bootstrap methods. Boca Raton, FL: CRC Press.

Tarsitano, A. and Amerise, I. L. (2016). "Modelling of the null distribution of rank correlations". Submitted.

Vianelli, S. (1968). "Sulle curve normali di ordine r per intervalli finiti delle variabili statistiche". Annali della Facolta‘ di Economia e Commercio dell’Universita' di Palermo, 2.

Vianelli, S. (1983). "The family of normal and lognormal distributions of order r". Metron, 41, 3-10.

pvrank documentation built on May 17, 2018, 9:03 a.m.