knitr::opts_chunk$set(echo = TRUE)
knitr::include_graphics(system.file("help/figures/hyper2.png", package = "hyper2"))

To cite the hyper2 package in publications, please use @hankin2017_rmd. Here I consider the likelihood function arising from a single observation of an order statistic, which is problematic in the sense that the maximum likelihood estimate for the Bradley-Terry strengths of the competitors is $(1,0,\ldots,0)$, and I investigate some numerical problems arising from the use of optim() and constrOptim(). We consider $n=10$ competitors who, without loss of generality, finish in the order $1,2,3,\ldots,10$. The likelihood function for this observation is:

\[ {\mathcal L}\left(p_1,\ldots,p_{10}\right)= \frac{p_1}{p_1+\cdots+p_{10}}\cdot \frac{p_2}{p_2+\cdots+p_{10}}\cdot \frac{p_3}{p_3+\cdots+p_{10}}\cdot \frac{p_4}{p_4+\cdots+p_{10}}\cdot \frac{p_5}{p_5+\cdots+p_{10}}\cdot \frac{p_6}{p_6+\cdots+p_{10}}\cdot \frac{p_7}{p_7+\cdots+p_{10}}\cdot \frac{p_8}{p_8+p_9+p_{10}}\cdot \frac{p_9}{p_9+p_{10}}\cdot \frac{p_{10}}{p_{10}} \]
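
Each term has the same form, so the likelihood may be written more compactly as

\[ {\mathcal L}\left(p_1,\ldots,p_{10}\right)=\prod_{i=1}^{10}\frac{p_i}{\sum_{j\geq i}p_j}. \]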

We can calculate the evaluate (that is, the maximum likelihood estimate) from the formula by inspection. It is clear from the penultimate term that $p_{10}=0$. Similarly from the preceding term it is easy to see that $p_9=0$ and so on. The evaluate is thus $p=(1,0,\ldots,0)$. Let's see if we can reproduce this using numerical methods. Note that this is a highly pathological case; the likelihood is not defined at the evaluate. Even considering limiting cases is not straightforward; consider two processes, approaching the evaluate in two different ways:

\[ \lim_{x\longrightarrow 0} {\mathcal L}(1-9x,x,x,\ldots,x)=\frac{1}{9!};\qquad \lim_{x\longrightarrow 0} {\mathcal L}(\alpha,\alpha x,\alpha x^2,\ldots,\alpha x^9)=1 \]

where the first argument list contains nine copies of $x$ and $\alpha=(1-x)/(1-x^{10})$ is a normalizing constant. Thus the likelihood at the evaluate is not defined, even as a limit (because it depends on the direction of approach). However, it is reasonable to demand consistency in the limiting process. Thus, writing the approach path as $p_i(x)$, $1\leq i\leq 10$, we require

\[ \lim_{x\longrightarrow 0}\frac{p_{i+1}(x)}{p_i(x)}=0,\qquad i=1,\ldots,9 \]

and if this is respected, then it is reasonable to say that the likelihood at the evaluate is 1, and the support 0.
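
Before turning to the package, a quick numerical check of these two limits may be helpful. The function ordlik() below is an ad hoc helper (it is not part of hyper2); it simply evaluates the order-statistic likelihood directly from a full vector of strengths:

```r
## ad hoc helper: evaluates prod_i p_i/(p_i + ... + p_n) for a strength vector p
ordlik <- function(p){ prod(p / rev(cumsum(rev(p)))) }

x <- 1e-4                          # small x, standing in for the limit x -> 0
ordlik(c(1 - 9*x, rep(x, 9)))      # first path: close to 1/9!
jj <- x^(0:9)                      # second path: strengths proportional to x^(i-1)
ordlik(jj/sum(jj))                 # close to 1
1/factorial(9)                     # 1/9! for comparison
```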

# Package idiom for likelihood functions

The hyper2 idiom for creating a likelihood function is as follows:

library("hyper2")
H <- race(letters[1:10])
H

The likelihood function is then maximized numerically using the maxp() function:

evaluate1 <- maxp(H)
(maxlike1 <- loglik(indep(evaluate1),H))
dotchart(evaluate1,pch=16)

Comparing the numerically determined maximum likelihood estimate with the exact MLE of $(1,0,\ldots,0)$ shows that the numerics have not done a particularly good job in identifying the evaluate, presumably as a result of the difficulty of honouring the constraints. However, we might hope to do slightly better by giving the system a hand and starting at a point closer to the evaluate:

## indep coordinates: the first player has strength p, the remaining n players
## (including the fillup) share 1-p equally
f <- function(p,n){c(p,rep((1-p)/n,n-1))}
sss <- f(0.99,9)  # sensible start point, close (?) to the evaluate
evaluate2 <- maxp(H,startp=sss)
maxlike2 <- loglik(indep(evaluate2),H)

Graphically:

dotchart(evaluate2,pch=16)
par(pty='s')
plot(evaluate1,evaluate2,asp=1,log='xy')
abline(0,1)

The result is almost identical to the cold start above, in the sense that the two evaluates are very close. We can compare the hot start to the cold start in terms of support:

maxlike2-maxlike1

This is almost exactly zero, but negative: the so-called "sensible start point" has produced a slightly inferior solution. This is because sss has $p_2=p_3=\cdots=p_{10}$, so it asserts that players $2,3,\ldots,10$ are of equal strength, contrary to observation.
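
Indeed, with $p_2=p_3=\cdots=p_{10}$ the remaining nine players are exchangeable, so the probability of the observed finishing order among them is $1/9!$ [compare the first limiting process above], and the support at sss can be written down exactly. As a quick check:

```r
loglik(sss, H)               # support at the start point sss itself
log(0.99) - lfactorial(9)    # log(0.99/9!), the same quantity calculated directly
```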

Possibly, a better way is to mimic the more appropriate of the two limiting processes above:

vsssp <- function(x,n){   # start point mimicking the second limiting process above
  jj <- 1/x^seq_len(n)    # strengths in geometric progression with common ratio 1/x
  indep(jj/sum(jj))       # normalize to unit sum and drop the fillup value
}

Then

evaluate3 <- maxp(H,vsssp(2,10))
maxlike3 <- loglik(indep(evaluate3),H)
dotchart(evaluate3,pch=16)

Observe that the likelihood at this point is only very slightly better than that obtained from the sss start point:

maxlike3-maxlike2

# Discussion

I am not sure how important these considerations are in practice. Firstly, the order statistic used here is extremely pathological, and perhaps it is unreasonable to expect any inferential tool to work under these circumstances.

Secondly, and perhaps more importantly, a likelihood value at a single point has no meaning in isolation; likelihood is an equivalence class of functions whose domain is the set of possible hypotheses. We can only compare likelihoods for two different hypotheses [because likelihood is only defined up to a multiplicative constant]. And for the observation considered here, the evaluate $(1,0,\ldots,0)$ is not a hypothesis anyone would want a likelihood for: the likelihood is not defined at that point, and does not even have a well-defined limit there.

To some extent this is not surprising: the numerics force a strictly positive minimum value for each $p_i$, including the fillup value $p_{10}=1-\sum_{i=1}^{9}p_i$. The likelihood maximization routine tries to maximize the ratios $p_9/p_{10},p_8/p_9,\ldots,p_1/p_2$ simultaneously, all subject to the unit-sum constraint $\sum p_i=1$. And of course, if the ratios are too high then the unit-sum constraint is violated (given that $p_{10}$ is at its minimum value).
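
To make this concrete, suppose each strength is bounded below by some small $\epsilon>0$ and write $r$ for a common value of the ratios $p_i/p_{i+1}$; then with $p_{10}=\epsilon$,

\[ \sum_{i=1}^{10}p_i=\epsilon\left(1+r+r^2+\cdots+r^9\right)=\epsilon\,\frac{r^{10}-1}{r-1}, \]

and the unit-sum constraint caps $r$ at roughly $\epsilon^{-1/9}$: the ratios cannot all be made arbitrarily large.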

# References {-}


