skating

set.seed(0)
knitr::opts_chunk$set(echo = TRUE)
library("hyper2")
library("magrittr")
options("digits" = 5)

Ladies' figure skating in the 2002 Winter Olympics

This document analyses strengths of the competitors in the Ladies' Figure Skating at the 2002 Winter Olympics. In the package, dataframe skating_table is copied from Lock and Lock (see skating.Rd for more details). It also creates file data/skating.rda which contains R objects skating, skating_table, and skating_maxp.

set.seed(0)
skating_table <- read.table("skating.txt",header=TRUE)
skating_table

Object skating_table is an order table. It is structured so that each competitor is a row, and each judge is a column. Function rank_likelihood() considers each row to be a race [or a judge], so we need to take a transpose. The following R idiom is plausible but incorrect:

skating_incorrect <- rank_likelihood(t(skating_table)) # incorrect

The above is incorrect because we are applying a rank technique to an order table. Correct analysis follows. One good thing to do is to present the dataset as an order table;

ordertable_to_ranktable(skating_table)

In the above, the column names should be read "came first" (c1), "came second" (c2) and so on to "came 23rd" (c23). From the first column, we see that Hughes came first 5 times [according to judges 1,5,7,8,9] and Slutskya came first four times [judges 2,3,4,6]. From the last column we see that all judges except J3 awarded Luca last place. File man/rrank.Rd has a detailed discussion of the differences between rank and order.

We can convert an order table to a support function:

head(ordertable2supp(skating_table))

Further, we might ask how many judges ranked each competitor first, second and so on:

oo <- sapply(seq_len(23),function(i){rowSums(skating_table==i)})
colnames(oo) <- rep(" ",23)  # nicer print
oo

In the above, each row is a competitor and each column corresponds to a ranking. Thus the first column corresponds to "first place" and shows that five judges ranked Hughes first and four judges ranked Slutskaya first. The second column corresponds to "second place" and shows that one judge gave Hughes second place, two judges placed Slutskaya second, five gave Kawn second, and one gave Cohen second. The first row corresponds to Hughes and we see that, of $5+1+1+2=9$ judges, 5 placed Hughes first, one placed her second, one placed her third, and two placed her fourth.

The formal ordering used in competition is, according to Lock and Lock, given by the median ordinal:

apply(skating_table,1,median)

but with ties broken using a complicated hierarchy, the first of which is the "size of the majority":

rowSums(sweep(skating_table,1,seq_len(23))>=0)

which would suggest that Slutskaya beats Kwan (5-4), Butyrskaya draws with Robinson (7-7), and so on. The rows of skating_table are in the order given by this system.

Data visualization

Now some data visualization. First the MLE for the strengths:

skating <- ordertable2supp(skating_table)
skating_maxp <- maxp(skating)
m <- skating_maxp # for ease of  typing
dotchart(m)
dotchart(log(m))

Evidence for medal positions

Looking at the dotcharts, it seems that the medallists---Hughes (gold), Slutskya (silver), Kwan (bronze)---were considerably higher in strength than the rest of the field. Here I will test the hypothesis that the medallists were in fact the strongest three competitors. Technically you need to optimize over the union of the possibilities that one of the three medallists did not come in the top three; but this is hard. We will do something much easier but numerically equivalent: optimize over the union of outcomes where either Cohen or Suguri (who placed fourth and fifth respectively) had higher strength than any of the medallists.

jj <-
    matrix(c(
        -1, 0, 0,   1,0,  # Hughes   < Cohen
         0,-1, 0,   1,0,  # Slutskya < Cohen
         0, 0,-1,   1,0,  # Kwan     < Cohen

        -1, 0, 0,   0,1,  # Hughes   < Suguri
         0,-1, 0,   0,1,  # Slutskya < Suguri
         0, 0,-1,   0,1), # Kwan     < Suguri
        byrow=TRUE,ncol=5)

problem_constraints <-  # fill with zeros for other competitors
    cbind(jj,matrix(0,nrow(jj),size(skating)-ncol(jj)-1))

small <- 1e-4  # need a sensible start value satisfying the constraints
start <- c(rep(2*small,3),rep(3*small,2),rep(small,17))

out <- rep(0,nrow(problem_constraints))
fullout <- list()

for(i in seq_len(nrow(problem_constraints))){
  jj <- maxp(skating, startp=start, give=TRUE,fcm=problem_constraints[i,], fcv=0,n=1,SMALL=1e-5)
   fullout[[i]] <- jj
   out[i] <- jj$value
}
out

Now compare these values with the unconstrained maximum likelihoods:

mgv <- maxp(skating, give=TRUE, n=10)$value
mgv  - out
mgv  - max(out)

that is, a little over 1 unit of support, falling short of the two units suggested by Edwards. Observe that the maximum likelihood among the six alternative hypotheses is that of number 3, in which the maximization was constrained to obey Kwan < Cohen.

If, instead, we ask whether there is evidence that Suguri should not have been a medallist, we find

mgv  - max(out[4:6])

We can be more specific with the likelihoods:

plot(out,ylab="log likelihood",axes=FALSE)
axis(2)
axis(side=1,at=1:6,srt=45,labels=c(
"Hug<Coh", "Slu<Coh", "Kwa<Coh",
"Hug<Sug", "Slu<Sug", "Kwa<Sug"
))

In the plot above, the vertical axis shows the support. The six points on the x-axis correspond to the six rows of problem_constraints; names have been abbreviated to the first three letters. Thus the first three points are maximum likelihoods for Hughes < Cohen, Slutskya < Cohen, and Kwan < Cohen respectively.

Ranking methodologies.

maximum likelihood estimated strengths furnish an ordering for the competitors. We can compare ranking by strengths with the official point-tallying method:

 rL <- sort(skating_maxp,decreasing=TRUE)
 rL[] <- seq_along(rL)
 rO <- seq_len(nrow(skating_table))
 names(rO) <- rownames(skating_table)
 ordertransplot(rO,rL,xlab="offical rank",ylab="likelihood rank",main="Ladies free skating, 2002 Winter Olympics")

In figure \@ref(fig:rLrO), there is close agreement between the two methods in the large, but differences in detail. For example, the official ordering is hughes first, then slutskaya second, then kwan third; likelihood winner is slutskya, followed by hughes and then kwan. We can gain some understanding of this result by looking at the raw judging results for the three medallists:

head(skating_table,2)

and then calculating a table of the results:

hughes    <- unlist(skating_table[1,])
slutskaya <- unlist(skating_table[2,])
table(hughes)
table(slutskaya)

Looking at the above, we see that Hughes has five judges who gave her "1"s, while Slutskaya has only 4, which is why she was considered to be the top. It seems that Hughes's two "4"s (compared with Slutskaya's one) have cost her more likelihood-based strength than her extra "1" gave her; note also that the two judges who placed Highes fourth (viz J2 and J4) placed Slutskaya first. Also:

table(hughes < slutskaya)

(note that equality is disallowed) showing that five judges preferred Hughes to Slutskaya and four preferred Slutsakaya to Hughes.

Just the top seven competitors

skating_table
o <- ordertable_to_ranktable(skating_table)
o
class(o) <- "matrix"
o

Select just the top seven:

o <- o[,1:7]
o

and now replace ranks with internal ranks (that is, rank among the top seven):

o <- t(apply(o,1,rank))
colnames(o) <- rownames(skating_table)[1:7]
o

Now coece to a ranktable:

class(o) <- "ranktable"
o

Thence to an order table:

o <- ranktable_to_ordertable(o)
o

Now create a likelihood function:

o <- ordertable2supp(o)
o

and have some fun with it:

(o <- maxp(o))
pie(o)

Package dataset {-}

Following lines create skating.rda, residing in the data/ directory of the package.

save(skating_table,skating,skating_maxp,file="skating.rda")


Try the hyper2 package in your browser

Any scripts or data that you put into this service are public.

hyper2 documentation built on March 4, 2021, 9:09 a.m.