{ # this line vandalises this file: nonfinishers are ignored and should not be! knitr::opts_chunk$set(echo = TRUE) library("hyper2")
Consider a race between $n+2$ competitors of strength $p_1$, $p_2$, and $n$ clones, all of strength $p=1-p_1-p_2$. Our observation is that $p_1$ places $n_1$, and $p_2$ places $n_2>n_1$. The order statistic is thus
$$ \underbrace{p\succ\cdots\succ p}{\mbox{$n_1-1$ terms}}\succ p_1\succ\underbrace{p\succ\cdots\succ p}{n_2-n_1-1}\succ p_2\succ\underbrace{p\succ\cdots\succ p}_{n-n_2+2} $$
The Plackett-Luce likelihood function would be as follows
$$ \begin{eqnarray} &{}& \underbrace{ \frac{p}{p_1+p_2+np}\cdot \frac{p}{p_1+p_2+(n-1)p} \cdots \frac{p}{p_1+p_2+(n-n_1+2)p} }{n_1-1}\cdot\&{}& \frac{p_1}{p_1+p_2+(n-n_1+1)p}\cdot\&{}& \underbrace{ \frac{p}{p_2+(n-n_1+1)p}\cdot \frac{p}{p_2+(n-n_1)p}\cdots \frac{p}{p_2+(n-n_2+3)p} }{n_2-n_1-1}\cdot\&{}& \frac{p_2}{p_2+(n-n_2+2)p}\cdot\&{}& \underbrace{ \frac{p}{(n-n_2+2)p}\cdot\frac{p}{(n-n_2)p} \cdots\frac{p}{2p}\cdot\frac{p}{p} }_{n-n_2+2} \end{eqnarray} $$
named_drivers <- function(n,...){ # twodrivers(20, ham=3, bottas=5) # NB repeats allowed! a <- list(...) stopifnot(all(table(as.vector(unlist(a)))==1)) # stops two players having the same result s <- rep("other",n) for(i in seq_along(a)){ s[a[[i]]] <- names(a)[i] # ham=3 -> s[3] <- "ham" } return(ordervec2supp3(s)) }
(H <- named_drivers(20,ham=3,bott=5))
Thus H
is a likelihood function on the assumption that the other
drivers are all clones of identical strength. Maximizing this:
a <- read.table("formula1_2019.txt")[1:2,1:21] a f <- function(i){ jj <- as.list(a[,i]) suppressWarnings(jj <- as.numeric(jj)) names(jj) <- rownames(a) jj[!is.na(jj)] # NA entries ignored - THESE SHOULD BE NONFINISHERS } H <- hyper3() for(i in seq_len(ncol(a))){ jj <- f(i) if(all(!is.na(as.numeric(unlist(jj))))){ } } H
maxp(H)
Thus we narrowly fail to reject the null at $\alpha = 0.05$. Now compare with the actual 2019 season:
F1_2019 <- ordertable2supp(read.table("formula1_2019.txt",header=TRUE)[,1:21]) samep.test(F1_2019,c("Hamilton","Bottas"))
We see a much smaller p-value when using the complete dataset. It is
not clear whether we would expect a larger or a smaller p-value here,
but it is not surprising to see a difference. On the simplified case,
the estimated strength of other
is about 0.016 (under the
alternative); but in the full case, most of the other drivers have
strength considerably more thn this. I do not have a good way of
thinking about this at this time. One other feature is that the
simplified likelihood results in different MLEs:
Consider the log-contrasts, $\log(p_1/p_2)$ giving
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.