Refined HAC Algorithm

A third option for generating initial values for maximization of the likelihood function in the EHO model is presented by @Ersan. The authors claim that their method in combination with Lin-Ke likelihood factorization yields unbiased estimates for the probability of informed trading and the five model parameters.^[ The term unbiased refers to computing bias and not the statistical understanding.]

Similar to the HAC algorithm hierarchical agglomerative clustering is utilized for generating starting values. However, instead of stopping the clustering once three clusters are left, the number of groups is not predetermined per se. The refined HAC algorithm is stopped when $j + 1$ clusters are left, whereat $j$ can be any positive integer which is limited only by the available number of trading days ($j \leq \totaldays - 1$).^[ In the work by @Ersan $j$ is chosen to be an integer in the range from 1 to 10.]

In contrast to the HAC algorithm by @Cluster, the daily absolute order imbalance is used to assign trading days to one of the $j + 1$ groups. The clusters are then ordered by their average absolute order imbalance and distributed to a no-event and event cluster. To achieve multiple vectors of starting values instead of a single one, the two types of groups are build as follows, $$ \begin{align} CL_i^{\nonews} &= \bigcup\limits_{k = 1}^{i} CL_k, \notag \ CL_i^{\eventsymb} &= \bigcup\limits_{k = i + 1}^{j + 1} CL_k, \notag \end{align} $$ where $i = 1, \dots, j$ and $CL_i^{\nonews}$ represents the no-event cluster and $CL_i^{\eventsymb}$ the event cluster. At this point, we are able to obtain $\probinfevent^0$ and $\intensinf^0$. The initial guess for the probability of an information event is calculated, in a very similar way to the HAC algorithm by @Cluster, as the proportion the event cluster $CL_i^{\eventsymb}$ occupies from the total number of trading days $\totaldays$. The initial intensity of informed trading equals the difference in averages of the absolute order imbalances of no-event and event group.^[ In the work by @Ersan this relation is not directly mentioned. However, at this point in the algorithm , we solely have information about the two groups of trading days and the absolute order imbalance at hand. Since the EHO model does not separate informed buy rate from informed sell rate, the difference in averages of the absolute order imbalances of no-event and event group is appropriate to capture the intensity of trading due to private information.]

In the next step, $CL_i^{\eventsymb}$ is split according to the signs of the average order imbalances of its members in a good-news and bad-news group. The remaining starting values for the two intensities $\intensuninfbuys$ and $\intensuninfsells$ and the probability $\probbadnews$ are calculated according to HAC algorithm.

Hence, we get a total of $j$ vectors of initial values. Maximum likelihood estimation (MLE) is performed for each of the $j$ vectors and the best result among all maximization runs is kept while discarding the remaining. According to the results in the work of @Ersan a value of $j = 5$ is a good compromise between accuracy and speed.



anre005/pinbasic documentation built on May 6, 2022, 4:40 a.m.