set.seed(0) knitr::opts_chunk$set(echo = TRUE) library("hyper2") library("magrittr") options("digits" = 5)
knitr::include_graphics(system.file("help/figures/hyper2.png", package = "hyper2"))
To cite the hyper2
package in publications, please use @hankin2017_rmd.
This short document analyses the rowing
dataset: sculls in the 2016
olympics. It is not really standalone and is here as part of a
consistent dataset generation mechanism used in the hyper2
package.
See the hyper2
vignette for a more formal discussion. In Olympic
sculling, up to six individual competitors race a small boat called a
scull over a course of length $2\,\mathrm{km}$; the object is to cross
the finishing line first. Note that actual timings are irrelevant,
given the model, as the sufficient statistic is the order in which
competitors cross the finishing line. The 2016 Summer Olympics is a
case in point: the gold and silver medallists finished less than $\sim
5$ milliseconds apart, corresponding to a lead of $\sim
2.5\,\mathrm{cm}$. The package dataset is file rowing.txt
, here are
the first three lines:
readLines("rowing.txt")[1:3]
We can process this file using standard R functions.
rowing_table <- sapply(readLines("rowing.txt"),strsplit," ") names(rowing_table) <- NULL # names are pointless here rowing_table[1:3]
The first line shows that fournier
came first, cabrera
second, and
so on. We can convert this dataset into a Plackett-Luce likelihood
function as follows:
rowing <- Reduce("+",sapply(rowing_table,race)) summary(rowing)
Object rowing
is a loglikelihood function on the strengths of the 32
competitors. Our first step is to find the evaluate:
rowing_maxp <- maxp(rowing) rowing_maxp
and we may display it using a dot chart:
dotchart(rowing_maxp)
dotchart(log10(rowing_maxp))
consistency(rowing)
Figure \@ref(fig:showrowingmaxppie) shows that many competitors have
very small strength and figure \@ref(fig:showrowingmaxppielog) shows
that many competitors have zero strength to numerical precision.
Looking at the original dataset we can see why: observe that gambour
came last in every race except for race number 7 (won by boudina
).
In race 7, he beat only yaklovlev
. But we see that yaklovlev
came
last in every race except race 20, where he beat only gambour
. Thus
the maximum likelihood estimate for gambour + yaklovlev
would be
zero, because these two competitors always lose to everyone else. It
is thus reasonable to eliminate gambour
and yaklovlev
from the
dataset. We may apply this reasoning recursively and end up with the
following dataset:
show("rowing_minimal.txt")
(also, we have removed drysdale
, as he won everything and his ML
strength would be 1). We may then analyse this:
rowing_minimal_table <- sapply(readLines("rowing_minimal.txt"),strsplit," ") names(rowing_minimal_table) <- NULL rowing_minimal <- Reduce("+",sapply(rowing_minimal_table,race)) summary(rowing_minimal)
This is a much more manageable dataset, much easier to study and less cluttered, as competitors with zero estimated strength are not present.
rowing_minimal_maxp <- maxp(rowing_minimal) rowing_minimal_maxp
dotchart(rowing_minimal_maxp) dotchart(log10(rowing_minimal_maxp))
Following lines create rowing.rda
, residing in the data/
directory
of the package.
save(rowing,rowing_minimal,rowing_maxp,rowing_minimal_maxp,rowing_table,rowing_minimal_table,file="rowing.rda")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.