set.seed(0) knitr::opts_chunk$set(echo = TRUE) library("hyper2") library("magrittr") options("digits" = 5) options(width = 120)
(this document is largely the same as formula1.Rmd
but considers
only the top 9 drivers).
Formula 1 racing is an important and prestigious motor sport. Here I
analyse a series of racing results using the hyper
package which
implements the Plackett-Luce likelihood function.
Consider the following dataset, taken from Wikipedia:
f2017 <- read.table("formula1_2017.txt",header=TRUE) n <- 9 # top 9 f2017 <- f2017[seq_len(n),] f2017
Now process it to give the ranks:
f <- function(x){ noscore <- c("Ret", "WD", "DNS", "DSQ", "DNP", "NC") x <- as.character(x) x[x %in% noscore] <- "0" x <- as.numeric(x) wanted <- x != 0 xw <- x[wanted] xw[order(xw,decreasing=FALSE)] <- seq_along(xw) x[wanted] <- xw x } o <- apply(f2017[,seq_len(20)],2,f) rownames(o) <- rownames(f2017) (f2017 <- o)
Each row is a driver and each column (after the first) a venue. We see that Hamilton, the first row, came second in Australia, first in China, second in Bahrain, fourth in Russia, and so on. In the first column, we see the result from Australia (AUS) in which Hamilton came second, Vettel first, Bottas third, and so on. Notation used also includes six different classes of no-score such as "Ret" for retired, "WD" for withdrawn, and so on.
We can represent this data in a different form using
wikitable_to_ranktable()
:
wikitable_to_ranktable(f2017)
Looking at the first column ("c1" is read "came first") we see who
came first at each venue: Vettel in Australia (AUS), Hamilton in China
(CHN), Vettel in Bahrain (BHR), and so on. Now create a likelihood
function H
:
H <- hyper2(pnames=rownames(o)) for(i in seq_len(ncol(o))){ H <- H + ordervec2supp(o[,i]) } head(H)
Above, we see a small part of the Placket-Luce likelihood function for
the 2017 results, just the top 9 drivers. We can find the maximum
likelihood estimator by using the maxp()
function:
mL2017 <- maxp(H) mL2017 dotchart(mL2017,pch=16,main='2017 Formula 1') pie(mL2017,main='2017 Formula 1')
We see Hamilton being the strongest, followed by Vettel and Bottas who are approximately equal.
Here we write $H_0\colon p_1=p_2=\cdots =p_n=\frac{1}{n}$.
equalp.test(H)
highly signficant!
We can consider Hamilton and ask various questions about his playing strength. One natural null hypothesis would be that Hamilton's strength is equal to the mean strength, or $\frac{1}{9}$. (we attempt to reject the null $H_0$ against the alternative of unconstrained $p$). Noting that the evaluate for Hamilton's strength $\widehat{p_\mathrm{Hamilton}}$ is about 0.28, we ask how much support is lost, relative to the support at the unconstrained evaluate.
specificp.test(H,"Hamilton") # default value is 1/9=11%
(note that the constrained maximum is on the boundary of the admissible region with $p_\mathrm{Hamilton}=\frac{1}{9}$).
(details <- read.table("formula1_conditions.txt",header=TRUE)) drivers <- pnames(formula1)
At this point, we introduce a new player: Ham_wet
, for "Hamilton in
the wet", and we can test the hypothesis that the Hamilton ==
Ham_wet
. We will cycle through all the venues and, if it is wet
(according to dataframe conditions
) we will credit Hamilton's
placing to Ham_wet
.
details <- read.table("formula1_conditions.txt",header=TRUE) alldrivers <- c("Ham_dry","Ham_wet",rownames(f2017)[-1]) (L2017 <- hyper2(pnames=alldrivers)) # uninformative for(i in seq_len(ncol(f2017))){ venue <- colnames(f2017)[i] conditions <- details[which(details$venue == venue),4] o <- f2017[,i] names(o) <- rownames(f2017) if(conditions == "wet"){ names(o)[names(o) == "Hamilton"] <- "Ham_wet" } else if(conditions == "dry"){ names(o)[names(o) == "Hamilton"] <- "Ham_dry" } else { stop("neither wet nor dry?") } print(o) L2017 <- L2017 + ordervec2supp(o) # NB zero -> "Did not finish" } # for(i) loop closes L2017
We can now test the hypothesis that Hamilton's performance in the wet is the same as his performance in the dry:
samep.test(L2017,c("Ham_wet","Ham_dry"))
details <- read.table("formula1_conditions.txt",header=TRUE) H <- hyper2(pnames=c(rownames(f2017),"S")) noscore <- c("Ret", "WD", "DNS", "DSQ", "DNP", "NC") for(i in seq_len(ncol(f2017))){ y <- f2017[,i] cond <- details$conditions[i] jj <- ordervec2supp(y) if(cond == "dry"){ pnames(jj) <- pnames(H) # do nothing! } else if (cond == "wet"){ jj <- pwa(jj,"Hamilton") } else { stop("neither wet nor dry?") } # if wet closes H <- H + jj } # i loop closes head(H) maxp(H)
Now we can test the hypothesis that Hamilton's wet strength S
is zero:
pnames(H) specificp.gt.test(H,"S",0)
see this
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.