In RobinHankin/hyper2: The Hyperdirichlet Distribution, Mark 2

{ # this line vandalises this file: nonfinishers are ignored and should not be!
knitr::opts_chunk$set(echo = TRUE)
library("hyper2")

Consider a race between $n+2$ competitors of strength $p_1$, $p_2$, and $n$ clones, all of strength $p=1-p_1-p_2$. Our observation is that $p_1$ places $n_1$, and $p_2$ places $n_2>n_1$. The order statistic is thus

$$ \underbrace{p\succ\cdots\succ p}{\mbox{$n_1-1$ terms}}\succ p_1\succ\underbrace{p\succ\cdots\succ p}{n_2-n_1-1}\succ p_2\succ\underbrace{p\succ\cdots\succ p}_{n-n_2+2} $$

The Plackett-Luce likelihood function would be as follows

$$ \begin{eqnarray} &{}& \underbrace{ \frac{p}{p_1+p_2+np}\cdot \frac{p}{p_1+p_2+(n-1)p} \cdots \frac{p}{p_1+p_2+(n-n_1+2)p} }{n_1-1}\cdot\&{}& \frac{p_1}{p_1+p_2+(n-n_1+1)p}\cdot\&{}& \underbrace{ \frac{p}{p_2+(n-n_1+1)p}\cdot \frac{p}{p_2+(n-n_1)p}\cdots \frac{p}{p_2+(n-n_2+3)p} }{n_2-n_1-1}\cdot\&{}& \frac{p_2}{p_2+(n-n_2+2)p}\cdot\&{}& \underbrace{ \frac{p}{(n-n_2+2)p}\cdot\frac{p}{(n-n_2)p} \cdots\frac{p}{2p}\cdot\frac{p}{p} }_{n-n_2+2} \end{eqnarray} $$

named_drivers <- function(n,...){ # twodrivers(20, ham=3, bottas=5)  # NB repeats allowed!
   a <- list(...)
   stopifnot(all(table(as.vector(unlist(a)))==1)) # stops two players having the same result
   s <- rep("other",n)
   for(i in seq_along(a)){
     s[a[[i]]] <- names(a)[i]   # ham=3 -> s[3] <- "ham"
   }
   return(ordervec2supp3(s))  
}

(H <- named_drivers(20,ham=3,bott=5))

Thus H is a likelihood function on the assumption that the other drivers are all clones of identical strength. Maximizing this:

a <- read.table("formula1_2019.txt")[1:2,1:21]
a
f <- function(i){
  jj <- as.list(a[,i])
  suppressWarnings(jj <- as.numeric(jj))
  names(jj) <- rownames(a)
  jj[!is.na(jj)]   # NA entries ignored - THESE SHOULD BE NONFINISHERS
}

H <- hyper3()
for(i in seq_len(ncol(a))){
  jj <- f(i)
  if(all(!is.na(as.numeric(unlist(jj))))){
  }
}
H

maxp(H)

Thus we narrowly fail to reject the null at $\alpha = 0.05$. Now compare with the actual 2019 season:

F1_2019 <- ordertable2supp(read.table("formula1_2019.txt",header=TRUE)[,1:21])
samep.test(F1_2019,c("Hamilton","Bottas"))

We see a much smaller p-value when using the complete dataset. It is not clear whether we would expect a larger or a smaller p-value here, but it is not surprising to see a difference. On the simplified case, the estimated strength of other is about 0.016 (under the alternative); but in the full case, most of the other drivers have strength considerably more thn this. I do not have a good way of thinking about this at this time. One other feature is that the simplified likelihood results in different MLEs:

simplified: $\widehat{p_\mbox{Ham}} = 0.49$, $\widehat{p_\mbox{Bot}}=0.31$
full: $\widehat{p_\mbox{Ham}} = 0.38$, $\widehat{p_\mbox{Bot}}=0.13$

Consider the log-contrasts, $\log(p_1/p_2)$ giving

RobinHankin/hyper2 documentation built on April 13, 2025, 9:33 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com