set.seed(0)
knitr::opts_chunk$set(echo = TRUE)
library("hyper2")
library("magrittr")
options("digits" = 5)
knitr::include_graphics(system.file("help/figures/hyper2.png", package = "hyper2"))

To cite the hyper2 package in publications, please use @hankin2017_rmd. This short document analyses the rowing dataset: sculls in the 2016 Summer Olympics. It is not really standalone and is here as part of a consistent dataset generation mechanism used in the hyper2 package. See the hyper2 vignette for a more formal discussion.

In Olympic sculling, up to six individual competitors each race a small boat called a scull over a course of length $2\,\mathrm{km}$; the object is to cross the finishing line first. Note that actual timings are irrelevant, given the model, as the sufficient statistic is the order in which competitors cross the finishing line. The 2016 Summer Olympics is a case in point: the gold and silver medallists finished within about 5 milliseconds of each other, corresponding to a lead of about $2.5\,\mathrm{cm}$, yet only the finishing order enters the analysis.

The package dataset is the file rowing.txt; here are its first three lines:

readLines("rowing.txt")[1:3]

We can process this file using standard R functions.

rowing_table <- sapply(readLines("rowing.txt"),strsplit," ")
names(rowing_table) <- NULL  # names are pointless here
rowing_table[1:3]
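As a minimal illustration of the parsing step, the sketch below applies the same idea to two made-up result lines held in memory rather than read from a file. The competitor names `a` to `d` and the object names `toy_lines` and `toy_table` are hypothetical and do not come from the package.

```r
# Two hypothetical races, one per line, winner first (not the Olympic data)
toy_lines <- c("a b c", "b a c d")

# strsplit() is vectorised: it returns one character vector per input line
toy_table <- strsplit(toy_lines, " ")

winners     <- sapply(toy_table, `[`, 1)  # first finisher of each race
field_sizes <- lengths(toy_table)         # number of competitors in each race
```

Each list element is one race, so races with different numbers of competitors sit comfortably side by side.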

The first line shows that fournier came first, cabrera second, and so on. We can convert this dataset into a Plackett-Luce likelihood function as follows:

rowing <- Reduce("+",sapply(rowing_table,race))
summary(rowing)
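To see what a Plackett-Luce likelihood looks like for a single race, suppose the finishing order is $a_1, a_2, \ldots, a_n$ and competitor $i$ has strength $p_i$. The standard Plackett-Luce probability of that order is $\prod_{i=1}^{n-1} p_{a_i} / \sum_{j=i}^{n} p_{a_j}$: at each stage, the next finisher is chosen from those still racing with probability proportional to strength. The base R sketch below evaluates the logarithm of this expression; the function name `pl_loglik` and the strengths used are assumptions made for illustration, not part of hyper2.

```r
# Plackett-Luce log-likelihood of one finishing order (illustrative sketch).
# p:     named vector of strengths (hypothetical values below)
# order: competitor names, winner first
pl_loglik <- function(p, order) {
  s <- p[order]                         # strengths in finishing order
  still_racing <- rev(cumsum(rev(s)))   # total strength of those not yet finished
  sum(log(s) - log(still_racing))       # final term is zero: the last place is certain
}

p_equal <- c(a = 0.25, b = 0.25, c = 0.25, d = 0.25)
ll <- pl_loglik(p_equal, c("c", "a", "d", "b"))
```

With equal strengths each of the $4! = 24$ possible orders is equally likely, so `ll` should agree with `-log(24)`.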

Object rowing is a log-likelihood function on the strengths of the 32 competitors. Our first step is to find the evaluate, that is, the maximum likelihood estimate of the strengths:

rowing_maxp <- maxp(rowing)
rowing_maxp

and we may display it using a dot chart:

dotchart(rowing_maxp)
dotchart(log10(rowing_maxp))
consistency(rowing)
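To make the idea of a maximum likelihood estimate of strengths concrete, the sketch below maximises a Plackett-Luce log-likelihood for four made-up races among three hypothetical competitors. It is not a description of how `maxp()` works internally: it swaps in a generic approach, base R's `optim()` on a softmax parametrisation, and the names `pl_loglik`, `negloglik` and `p_hat` are assumptions introduced here.

```r
# Hypothetical finishing orders, winner first (not the Olympic data)
races <- list(c("a", "b", "c"), c("a", "c", "b"),
              c("b", "a", "c"), c("c", "a", "b"))
competitors <- sort(unique(unlist(races)))

# Plackett-Luce log-likelihood of one finishing order
pl_loglik <- function(p, order) {
  s <- p[order]
  sum(log(s) - log(rev(cumsum(rev(s)))))
}

# Map unconstrained parameters to strengths that are positive and sum to one;
# the first competitor is pinned at zero on the log scale for identifiability
to_strengths <- function(theta) {
  p <- exp(c(0, theta))
  setNames(p / sum(p), competitors)
}

# Negative log-likelihood summed over all races
negloglik <- function(theta) {
  p <- to_strengths(theta)
  -sum(sapply(races, function(r) pl_loglik(p, r)))
}

fit   <- optim(rep(0, length(competitors) - 1), negloglik, method = "BFGS")
p_hat <- to_strengths(fit$par)
```

In this toy example `a` wins twice and is never last, so `p_hat` should rank `a` highest, with `b` and `c` roughly equal by symmetry. A generic optimiser like this is only suitable when the maximum lies in the interior of the parameter space, which is not the case for the full rowing dataset.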

Figure \@ref(fig:showrowingmaxppie) shows that many competitors have very small strength and Figure \@ref(fig:showrowingmaxppielog) shows that many competitors have zero strength to within numerical precision.

Looking at the original dataset we can see why: gambour came last in every race except race 7 (won by boudina), in which he beat only yaklovlev. But yaklovlev came last in every race except race 20, where he beat only gambour. Thus the maximum likelihood estimate for the combined strength of gambour and yaklovlev is zero, because these two competitors always lose to everyone else. It is thus reasonable to eliminate gambour and yaklovlev from the dataset. We may apply this reasoning recursively and end up with the following dataset:

readLines("rowing_minimal.txt")  # display the contents of the reduced dataset

We have also removed drysdale: he won every race he took part in, so his maximum likelihood strength would be 1. We may then analyse the reduced dataset:

rowing_minimal_table <- sapply(readLines("rowing_minimal.txt"),strsplit," ")
names(rowing_minimal_table) <- NULL  
rowing_minimal  <- Reduce("+",sapply(rowing_minimal_table,race))
summary(rowing_minimal)

This is a much more manageable dataset, much easier to study and less cluttered, as competitors with zero estimated strength are not present.
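The elimination argument used to arrive at the reduced dataset can be sketched in base R. The idea, stated here as an assumption rather than as the procedure actually used to build rowing_minimal.txt, is to record who finished ahead of whom, take the transitive closure of that relation, and flag any competitor who is beaten (directly or through a chain) by someone they can never beat back. The races and names below are made up, with `y` and `z` playing the role of a pair who only ever beat each other.

```r
# Hypothetical finishing orders, winner first (not the Olympic data)
races <- list(c("a", "b", "c", "y", "z"),
              c("b", "a", "c", "z", "y"),
              c("c", "a", "b", "y", "z"))
competitors <- sort(unique(unlist(races)))
n <- length(competitors)

# beats[i, j] is TRUE if i finished ahead of j in at least one race
beats <- matrix(FALSE, n, n, dimnames = list(competitors, competitors))
for (r in races) {
  for (k in seq_len(length(r) - 1)) {
    beats[r[k], r[(k + 1):length(r)]] <- TRUE
  }
}

# Transitive closure (Warshall's algorithm):
# reach[i, j] is TRUE if i beat j directly or through a chain of results
reach <- beats
for (k in competitors) {
  reach <- reach | outer(reach[, k], reach[k, ], "&")
}

# A competitor is flagged if someone reaches them but they cannot reach back
is_dominated <- sapply(competitors, function(i) any(reach[, i] & !reach[i, ]))
dominated <- competitors[is_dominated]
```

Here `dominated` picks out `y` and `z`, mirroring the reasoning applied to gambour and yaklovlev. Reversing the test (`reach[i, ] & !reach[, i]`) would flag competitors who are never beaten back, which is the situation described for drysdale.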

rowing_minimal_maxp <- maxp(rowing_minimal)
rowing_minimal_maxp
dotchart(rowing_minimal_maxp)
dotchart(log10(rowing_minimal_maxp))

# Package dataset {-}

The following line creates rowing.rda, which resides in the data/ directory of the package.

save(rowing,rowing_minimal,rowing_maxp,rowing_minimal_maxp,rowing_table,rowing_minimal_table,file="rowing.rda")

# References {-}



RobinHankin/hyper2 documentation built on May 6, 2024, 4:25 p.m.