In RobinHankin/hyper2: The Hyperdirichlet Distribution, Mark 2

knitr::opts_chunk$set(echo = TRUE)
library("hyper2")
library("partitions")

knitr::include_graphics(system.file("help/figures/hyper2.png", package = "hyper2"))

To cite the hyper2 package in publications, please use @hankin2017_rmd. Here we will sum probabilities across a sample space and verify that the total is constant. In theory the sum of the probabilities of all members of the sample space is 1, but in practice [and certainly in hyper2 idiom], the likelihood calculation includes an indeterminate constant which means that all we can say is that the total probability is not a function of the parameters of the distribution.

Here I present some relatively informal and unstructured thoughts on this issue from a numerical and hyper2 perspective.

M <- apply(partitions::multiset(c(1,1,2,2,3)),2,function(x){letters[x]})
colnames(M) <- rep("",ncol(M))   # tidier and more compact appearance
M <- noquote(M)
M

Above, we use multiset() to enumerate the sample space of five competitors comprising two twins denoted a, two twins denoted b, and a singleton c. This document discusses the issue of whether the twins are identical twins and hence indistinguishable. Matrix M has ${5\choose2\,2\,1}=30$ columns, one per possible result. We will calculate the likelihood of each column and sum them to get the probability of observing any one of the 30 order statistics [the observations are mutually exclusive, so $\operatorname{Prob}\left(x_1\cup\cdots\cup x_n\right) = \operatorname{Prob}\left(x_1\right)+\cdots +\operatorname{Prob}\left(x_n\right)$]:

tfun <- function(help,strength,M){
  out <- 0
  for(i in seq_len(ncol(M))){
     vec <- M[,i]
     out <- out +
       loglik(strength,cheering3(v=vec,e=c(a=1,b=2,c=3),help=help),log=FALSE)
  }
  return(out)
}

print(tfun(help=(1:3)/10, strength= c(a=0.1,b=0.3,c=0.6),M))
print(tfun(help=(7:5)/22, strength= c(a=0.3,b=0.2,c=0.5),M))

Above we see that the sum of the likelihoods is constant. The value turns out to be $\frac{1}{1!2!2!}=0.25$. This is because the likelihood function (being Plackett-Luce) considers the observation to be one of the $5!=120$ ways of ordering five distinct competitors. Matrix M has only $\frac{5!}{2!2!1!}=30$ columns, each one of which corresponds to four different observations, all of which have the same probability. Much of the confusion arises from the practice of identifying a competitor with his strength; we write "$p_a$'' [or sometimes just "$a$''] to signify both a particular competitor, and his Bradley-Terry strength.

To explain the factor of $1/4$ discussed above we need some new notation. We have five competitors who I will call $\mathbf{do}$, $\mathbf{re}$, $\mathbf{me}$, $\mathbf{fa}$, and $\mathbf{sol}$. Now $p_X$ means "the Bradley Terry strength of $X$''. The setup here is that $\mathbf{do}$ and $\mathbf{re}$ are twins, and that $\mathbf{me}$ and $\mathbf{fa}$ are twins (and $\mathbf{sol}$ is a singleton). We have that the Bradley-Terry strength of $\mathbf{do}$ and $\mathbf{re}$ is a, the strength of $\mathbf{me}$ and $\mathbf{fa}$ is b, and the strength of $\mathbf{sol}$ is c.

Consider, for example, the column of M reading a c b b a. In theory, this means the competitor who came first had strength a, the competitor who came second had strength c. So the order statistic is one of the $2!2!1!=4$ following orders:

$$\mathbf{do}\succ\mathbf{sol}\succ\mathbf{me}\succ\mathbf{fa}\succ\mathbf{re}$$ $$\mathbf{do}\succ\mathbf{sol}\succ\mathbf{fa}\succ\mathbf{me}\succ\mathbf{re}$$ $$\mathbf{re}\succ\mathbf{sol}\succ\mathbf{me}\succ\mathbf{fa}\succ\mathbf{do}$$ $$\mathbf{re}\succ\mathbf{sol}\succ\mathbf{fa}\succ\mathbf{me}\succ\mathbf{do}$$

Above we see that $\mathbf{do}$ and $\mathbf{re}$ come first and last (or last and first), and $\mathbf{me}$ and $\mathbf{fa}$ come third and fourth (or fourth and third), while $\mathbf{sol}$ comes second. All four possibilities have the same probability and all four are admissible order statistics for the five solfeggio syllables.

But the hyper2 package [specifically the loglik() and cheering3() functions] consider a c b b a to mean something different. It considers it to be an observation of

$$\mathbf{do}\succ\mathbf{sol}\succ\mathbf{me}\succ\mathbf{fa}\succ\mathbf{re}$$

followed by an identification of $\mathbf{do}$ with (BT strength) a, $\mathbf{sol}$ with strength c, and so on. Any of the four possibilities above would be equally admissible, and they all have the same probability. So to get "real'' probabilities [real in the sense that summing the probabilities over the columns in matrix M], we have to multiply by four to include all the admissible order statistics.

The difference between these two interpretations does not matter from a likelihood perspective as likelihood is defined only up to a multiplicative constant.

A more exacting numerical test

rs <- rp(100,dirichlet(c(a=1,b=1,c=1))) # random strengths
colnames(rs) <- letters[1:3]
head(rs)
f <- function(s){
  jj <- s
  names(jj) <- letters[1:3]
  tfun(help=1:3,strength=jj,M=M)
}
x <- apply(rs,1,f)