In anre005/pinbasic: Fast and Stable Estimation of the Probability of Informed Trading (PIN)

Lin and Ke Factorization

An even more stable and accurate formulation of the likelihood function is presented in the work of @LinKe. The factorization is applicable even for heavily traded stocks. The effectiveness and stability of the likelihood is caused by two principles [see @LinKe, p. 629]:

In computing $\exp(x)\exp(y)$ (or $x\exp(y)$), the expression of $\exp(x + y)$ (or $\sgn(x)\exp(\log(|x|) + y)$) is more stable than that of $\exp(x)\exp(y)$ (or $x\exp(y)$).
In the computer arithmetic process, the absolute computing error of a function $f(x)$ increases with the absolute value of its first-order derivative.

To fortify the usefulness of these two principles the authors give the following example. Say, one intends to compute $\log\left(\exp(x)\exp(y) + \exp(z)\right)$ with $x = 800$, $y = -400$ and $z = 900$. A threshold for the inputs for exponential function lies at 710, meaning that any larger value gets the exponential function to overflow and return infinite results.

At first, $\exp(x)\exp(y)$ would lead to an overflow error due to the fact that one input for the exponential is bigger than the threshold (800 > 710). Taking the first principle into account, we can compute this expression with $\exp(x + y)$ which gives r exp(800 - 400). However, the expression $\exp(z)$ would still produce an infinite value and therefore the expression $\log\left(\exp(x + y) + \exp(z)\right)$ is still not computable. The second principle states that one should avoid large input values for the exponential and small positive input values for the logarithmic function. Hence, $m + \log\left(\exp(x + y - m) + \exp(z - m)\right)$ with $m = \max(x + y, z) = 900$ is a more stable and accurate expression. Irrelevant of the specified values for $x,y$ and $z$, one of the terms $x + y - m$ and $z - m$ is always zero, whereas the remaining term is always less than 0. This yields small input values for the exponential functions, which in turn always sum up to a value greater than 1. Thus, we have no small positive input values for the natural logarithm. Using $x,y$ and $z$ as specified before we can compute the expression $\log\left(\exp(x)\exp(y) + \exp(z)\right)$, incorporating the R function log1p, as $$ \begin{align} & m + \log\left(\exp(x + y - \max(x + y, z)) + \exp(z - m)\right) = \notag \ & 900 + \log\left(\exp(800 - 400 - \max(400, 900)) + \exp(900 - 900)\right) = \notag \ & 900 + \log\left(\exp(-500) + 1 \right) \notag \ & \text{with} \log\left(\exp(-500) + 1 \right) = r log1p(exp(400 - 900)) \notag \end{align} $$ Following the two principles mentioned by @LinKe, a stable and efficient formulation of the likelihood is given by $$ \begin{align} \label{eq:linkefactr5par} \log \likelihood\left(\thetaehoshort \mid \datasymbs \right) = &\sum\limits_{\daysym=1}^{\totaldays} \Biggl( -\intensuninfbuys - \intensuninfsells + B_{\daysym} \log \left(\intensinf + \intensuninfbuys\right) + S_{\daysym} \log \left(\intensinf + \intensuninfsells\right) + e_{\max, \daysym}\Biggr) \notag \ & + \sum\limits_{\daysym=1}^{\totaldays} \log \Biggl( \left(1-\probinfevent\right) \exp\left(e_{1,\daysym} - e_{\max, \daysym} \right) + \probinfevent \left(1-\probbadnews\right) \exp\left(e_{2,\daysym} - e_{\max, \daysym} \right) \notag \ & + \probinfevent\probbadnews \exp\left(e_{3,\daysym} - e_{\max, \daysym} \right) \Biggr), \end{align} $$ where $e_{1,\daysym} = -B_{\daysym}\log\left(1+\dfrac{\intensinf}{\intensuninfbuys}\right)-S_{\daysym}\log\left(1+\dfrac{\intensinf}{\intensuninfsells}\right)$, $e_{2,\daysym} = -\intensinf - S_{\daysym}\log\left(1 + \dfrac{\intensinf}{\intensuninfsells} \right)$, $e_{3,\daysym} = -\intensinf - B_{\daysym}\log\left(1 + \dfrac{\intensinf}{\intensuninfbuys} \right)$ and $e_{\max, \daysym} = \max\left(e_{1,\daysym}, e_{2,\daysym}, e_{3,\daysym} \right)$. Again, the constant term $-\log(B!S!)$ is dropped.

With the Lin-Ke formulation we have a stable methodology to estimate $\pintext$ even for heavily traded stocks. Besides its the stability, the Lin-Ke factorization also speeds up the likelihood computation. Since the previously presented factorization by @EHO2010 returns infeasible function values for very frequently traded stocks and EKOP is nested in EHO model, we must strongly recommend the usage of the likelihood formulation by @LinKe in combination with the extended EHO setup for optimization routines.

anre005/pinbasic documentation built on May 6, 2022, 4:40 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

anre005/pinbasic
Fast and Stable Estimation of the Probability of Informed Trading (PIN)

In anre005/pinbasic: Fast and Stable Estimation of the Probability of Informed Trading (PIN)

Lin and Ke Factorization

R Package Documentation

Browse R Packages

We want your feedback!

anre005/pinbasic Fast and Stable Estimation of the Probability of Informed Trading (PIN)

In anre005/pinbasic: Fast and Stable Estimation of the Probability of Informed Trading (PIN)

Lin and Ke Factorization

R Package Documentation

Browse R Packages

We want your feedback!

anre005/pinbasic
Fast and Stable Estimation of the Probability of Informed Trading (PIN)