set.seed(0) knitr::opts_chunk$set(echo = TRUE) library("hyper2") library("magrittr") options("digits" = 5)
knitr::include_graphics(system.file("help/figures/hyper2.png", package = "hyper2"))
To cite the hyper2
package in publications, please use @hankin2017_rmd.
This document analyses Bradley-Terry strengths of the competitors in
the Ladies' Figure Skating at the 2002 Winter Olympics. In the
package, dataframe skating_table
is copied from Lock and Lock (see
skating.Rd for more details). It also creates file data/skating.rda
which contains R objects skating
, skating_table
, and
skating_maxp
.
set.seed(0) skating_table <- read.table("skating.txt",header=TRUE) skating_table
Object skating_table
is an order table. It is structured so that
each competitor is a row, and each judge is a column. Function
rank_likelihood()
considers each row to be a race [or a judge], so
we need to take a transpose. The following R idiom is plausible but
incorrect:
skating_incorrect <- rank_likelihood(t(skating_table)) # incorrect
The above is incorrect because we are applying a rank technique to an order table. Correct analysis follows. One good thing to do is to present the dataset as an order table;
ordertable_to_ranktable(skating_table)
In the above, the column names should be read "came first" (c1
),
"came second" (c2
) and so on to "came 23rd" (c23
). From the first
column, we see that Hughes came first 5 times [according to judges
1,5,7,8,9] and Slutskya came first four times [judges 2,3,4,6]. From
the last column we see that all judges except J3
awarded Luca last
place. File man/rrank.Rd
has a detailed discussion of the
differences between rank and order.
We can convert an order table to a support function:
head(ordertable2supp(skating_table))
Further, we might ask how many judges ranked each competitor first, second and so on:
oo <- sapply(seq_len(23),function(i){rowSums(skating_table==i)}) colnames(oo) <- rep(" ",23) # nicer print oo
In the above, each row is a competitor and each column corresponds to a ranking. Thus the first column corresponds to "first place" and shows that five judges ranked Hughes first and four judges ranked Slutskaya first. The second column corresponds to "second place" and shows that one judge gave Hughes second place, two judges placed Slutskaya second, five gave Kawn second, and one gave Cohen second. The first row corresponds to Hughes and we see that, of $5+1+1+2=9$ judges, 5 placed Hughes first, one placed her second, one placed her third, and two placed her fourth.
The formal ordering used in competition is, according to Lock and Lock, given by the median ordinal:
apply(skating_table,1,median)
but with ties broken using a complicated hierarchy, the first of which is the "size of the majority":
rowSums(sweep(skating_table,1,seq_len(23))>=0)
which would suggest that Slutskaya beats Kwan (5-4), Butyrskaya draws
with Robinson (7-7), and so on. The rows of skating_table
are in
the order given by this system.
Now some data visualization. First the MLE for the strengths:
skating <- ordertable2supp(skating_table) options("use_alabama" = TRUE) skating_maxp <- maxp(skating,startp= c(5.0493913251e-06, 0.0053037393083, 0.090240695999, 0.00036893464900, 1.1397045995e-05, 8.3424027249e-05, 9.8129903007e-06, 0.00026960143514, 0.29831585124, 0.001376333090, 2.9434081237e-06, 0.26550514587, 0.00051946513092, 1.7e-06, 0.00080745068338, 0.00029358481485, 0.00048852120227, 0.0053075302458, 0.0025877438737, 0.30889220253, 1.2869403350e-05, 0.0184563624305147)) # values from previous repeated runs of maxp() m <- skating_maxp # for ease of typing m
equalp.test(skating)
dotchart(m) dotchart(log(m))
Looking at the dotcharts, it seems that the medallists---Hughes (gold), Slutskya (silver), Kwan (bronze)---were considerably higher in strength than the rest of the field. Here I will test the hypothesis that the medallists were in fact the strongest three competitors. Technically you need to optimize over the union of the possibilities that one of the three medallists did not come in the top three; but this is hard. We will do something much easier but numerically equivalent: optimize over the union of outcomes where either Cohen or Suguri (who placed fourth and fifth respectively) had higher strength than any of the medallists.
f <- function(smaller,bigger){ jj <- rep(0,22) names(jj) <- pnames(skating)[-23] # no fillup jj[smaller] <- -1 jj[bigger ] <- +1 return(jj) } problem_constraints <- NULL problem_constraints %<>% rbind(f("hughes" ,"cohen" )) problem_constraints %<>% rbind(f("slutskaya","cohen" )) problem_constraints %<>% rbind(f("kwan" ,"cohen" )) problem_constraints %<>% rbind(f("hughes" ,"suguri")) problem_constraints %<>% rbind(f("slutskaya","suguri")) problem_constraints %<>% rbind(f("kwan" ,"suguri")) rownames(problem_constraints) <- c("hug < coh", "slu < coh", "kwa < coh", "hug < sug", "slu < sug", "kwa < sug" ) colnames(problem_constraints) %<>% substr(1,3) problem_constraints small <- 0.01 big <- 0.1 start_vector <- rep(small,22) names(start_vector) <- pnames(skating)[-23] # no fillup start_vector["cohen"] <- big start_vector["suguri"] <- big
out <- rep(0,nrow(problem_constraints)) fullout <- list() mle_skaters <- NULL for(i in seq_len(nrow(problem_constraints))){ jj <- maxp(skating, startp=start_vector, give=TRUE,fcm=problem_constraints[i,], fcv=0,n=10,SMALL=1e-5) fullout[[i]] <- jj out[i] <- jj$value mle_skaters <- cbind(mle_skaters,jj$par) } out out-max(out) colnames(mle_skaters) <- rownames(problem_constraints) mle_skaters
Now compare these values with the unconstrained maximum likelihoods:
mgv <- maxp(skating, give=TRUE, n=10)$value mgv mgv - out mgv - max(out)
that is, a little over 7 units of support, clearly exceeding the two
units suggested by Edwards. We have strong evidence to suggest that
the medallists were indeed the strongest three competitors. Observe
that the maximum likelihood among the six alternative hypotheses is
that of number 3, in which the maximization was constrained to obey
Kwan < Cohen
. If we have to change the medallists, then
exchanging the order of Kwan and Cohen incurs the lowest likelihood
penalty. If, instead, we ask whether there is evidence that Suguri
[who was much weaker than Cohen] should not have been a medallist, we
find
mgv - max(out[4:6])
Much stronger.
plot(out,ylab="log likelihood",axes=FALSE) axis(2) axis(side=1,at=1:6,srt=45,labels=c( "Hug<Coh", "Slu<Coh", "Kwa<Coh", "Hug<Sug", "Slu<Sug", "Kwa<Sug" ))
In the plot above, the vertical axis shows the support. The six
points on the x-axis correspond to the six rows of
problem_constraints
; names have been abbreviated to the first three
letters. Thus the first three points are maximum likelihoods for
Hughes < Cohen
, Slutskya < Cohen
, and Kwan < Cohen
respectively.
We may test the hypothesis that Slutskaya == Kwan == Hughes:
samep.test(skating,c("slutskaya","kwan","hughes"))
maximum likelihood estimated strengths furnish an ordering for the competitors. We can compare ranking by strengths with the official point-tallying method:
rL <- sort(skating_maxp,decreasing=TRUE) rL[] <- seq_along(rL) rO <- seq_len(nrow(skating_table)) names(rO) <- rownames(skating_table) ordertransplot(rO,rL,xlab="offical rank",ylab="likelihood rank",main="Ladies free skating, 2002 Winter Olympics")
In figure \@ref(fig:rLrO), there is close agreement between the two methods in the large, but differences in detail. For example, the official ordering is hughes first, then slutskaya second, then kwan third; likelihood winner is slutskya, followed by hughes and then kwan. We can gain some understanding of this result by looking at the raw judging results for the three medallists:
head(skating_table,2)
and then calculating a table of the results:
hughes <- unlist(skating_table[1,]) slutskaya <- unlist(skating_table[2,]) table(hughes) table(slutskaya)
Looking at the above, we see that Hughes has five judges who gave her
"1"s, while Slutskaya has only 4, which is why she was considered to
be the top. It seems that Hughes's two "4"s (compared with
Slutskaya's one) have cost her more likelihood-based strength than her
extra "1" gave her; note also that the two judges who placed Hughes
fourth (viz J2
and J4
) placed Slutskaya first. Also:
table(hughes < slutskaya)
(note that equality is disallowed) showing that five judges preferred Hughes to Slutskaya and four preferred Slutsakaya to Hughes.
skating_table o <- ordertable_to_ranktable(skating_table) o
class(o) <- "matrix" o
Select just the top seven:
o <- o[,1:7] o
and now replace ranks with internal ranks (that is, rank among the top seven):
o <- t(apply(o,1,rank)) colnames(o) <- rownames(skating_table)[1:7] o
Now coece to a ranktable
:
class(o) <- "ranktable" o
Thence to an order table:
o <- ranktable_to_ordertable(o) o
Now create a likelihood function:
o <- ordertable2supp(o) o
and have some fun with it:
(o <- maxp(o)) pie(o)
Following lines create skating.rda
, residing in the data/
directory of the package.
save(skating_table,skating,skating_maxp,file="skating.rda")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.