In RobinHankin/hyper2: The Hyperdirichlet Distribution, Mark 2

set.seed(0)
knitr::opts_chunk$set(echo = TRUE)
library("hyper2")
library("magrittr")
options("digits" = 5)

knitr::include_graphics(system.file("help/figures/hyper2.png", package = "hyper2"))

(takes about twenty minutes to run without cache)

To cite the hyper2 package in publications, please use @hankin2017_rmd). This document analyses Bradley-Terry strengths of the competitors in the Ladies' Figure Skating at the 2002 Winter Olympics. In the package, dataframe skating_table is copied from Lock and Lock (see skating.Rd for more details). It also creates file data/skating.rda which contains R objects skating, skating_table, and skating_maxp.

set.seed(0)
skating_table <- ordertable(read.table("skating.txt", header=TRUE))
skating_table

Object skating_table is an order table. It is structured so that each competitor is a row, and each judge is a column. One good thing to do is to present the dataset as an order table;

as.ranktable(skating_table)

In the above, the column names should be read "came first" (c1), "came second" (c2) and so on to "came 23rd" (c23). From the first column, we see that Hughes came first 5 times [according to judges 1,5,7,8,9] and Slutskya came first four times [judges 2,3,4,6]. From the last column we see that all judges except J3 awarded Luca last place. File man/rrank.Rd has a detailed discussion of the differences between rank and order.

We can convert an order table to a support function:

summary(suppfun(skating_table))

Further, we might ask how many judges ranked each competitor first, second and so on:

oo <- sapply(seq_len(23),function(i){rowSums(skating_table==i)})
colnames(oo) <- rep(" ",23)  # nicer print
oo

In the above, each row is a competitor and each column corresponds to a ranking. Thus the first column corresponds to "first place" and shows that five judges ranked Hughes first and four judges ranked Slutskaya first. The second column corresponds to "second place" and shows that one judge gave Hughes second place, two judges placed Slutskaya second, five gave Kawn second, and one gave Cohen second. The first row corresponds to Hughes and we see that, of $5+1+1+2=9$ judges, 5 placed Hughes first, one placed her second, one placed her third, and two placed her fourth.

The formal ordering used in competition is, according to Lock and Lock, given by the median ordinal:

apply(skating_table,1,median)

but with ties broken using a complicated hierarchy, the first of which is the "size of the majority":

rowSums(sweep(skating_table,1,seq_len(23))>=0)

which would suggest that Slutskaya beats Kwan (5-4), Butyrskaya draws with Robinson (7-7), and so on. The rows of skating_table are in the order given by this system.

Data visualization

Now some data visualization. First the MLE for the strengths:

skating <- suppfun(skating_table)
options("use_alabama" = TRUE)
skating_maxp <- maxp(skating,startp=
c(5.0493913251e-06, 0.0053037393083, 0.090240695999, 
0.00036893464900, 1.1397045995e-05, 8.3424027249e-05, 
9.8129903007e-06, 0.00026960143514, 0.29831585124, 
0.001376333090, 2.9434081237e-06, 0.26550514587, 
0.00051946513092, 1.7e-06, 0.00080745068338, 
0.00029358481485, 0.00048852120227, 0.0053075302458, 
0.0025877438737, 0.30889220253, 1.2869403350e-05, 
0.0184563624305147))    # values from previous repeated runs of maxp()
m <- skating_maxp # for ease of  typing
m

equalp.test(skating)

dotchart(m)
dotchart(log(m))

Evidence for medal positions

Looking at the dotcharts, it seems that the medallists---Hughes (gold), Slutskya (silver), Kwan (bronze)---were considerably higher in strength than the rest of the field. Here I will test the hypothesis that the medallists were in fact the strongest three competitors. Technically you need to optimize over the union of the possibilities that one of the three medallists did not come in the top three; but this is hard. We will do something much easier but numerically equivalent: optimize over the union of outcomes where either Cohen or Suguri (who placed fourth and fifth respectively) had higher strength than any of the medallists.

f <- function(smaller,bigger){
  jj <- rep(0,22)
  names(jj) <- pnames(skating)[-23]  #  no fillup
  jj[smaller] <- -1
  jj[bigger ] <- +1
  return(jj)
}   

problem_constraints <-  NULL
problem_constraints %<>% rbind(f("hughes"   ,"cohen" ))
problem_constraints %<>% rbind(f("slutskaya","cohen" ))
problem_constraints %<>% rbind(f("kwan"     ,"cohen" ))
problem_constraints %<>% rbind(f("hughes"   ,"suguri"))
problem_constraints %<>% rbind(f("slutskaya","suguri"))
problem_constraints %<>% rbind(f("kwan"     ,"suguri"))

rownames(problem_constraints) <-
c("hug < coh", "slu < coh", "kwa < coh", "hug < sug", "slu < sug", "kwa < sug" )
colnames(problem_constraints) %<>% substr(1,3)
problem_constraints

small <- 0.01
big <- 0.1
start_vector <- rep(small,22)
names(start_vector) <- pnames(skating)[-23]  #  no fillup
start_vector["cohen"] <- big
start_vector["suguri"] <- big

out <- rep(0,nrow(problem_constraints))
fullout <- list()

mle_skaters <- NULL

for(i in seq_len(nrow(problem_constraints))){
  jj <- maxp(skating, startp=start_vector, give=TRUE,fcm=problem_constraints[i,], fcv=0,n=10,SMALL=1e-5)
   fullout[[i]] <- jj
   out[i] <- jj$value
   mle_skaters <- cbind(mle_skaters,jj$par)

}
out
out-max(out)
colnames(mle_skaters) <- rownames(problem_constraints)
mle_skaters

Now compare these values with the unconstrained maximum likelihoods:

options(use_alabama = TRUE)
mgv <- maxp(skating, give=TRUE, n=1)$value
mgv
mgv  - out
mgv  - max(out)

that is, a little over 7 units of support, clearly exceeding the two units suggested by Edwards. We have strong evidence to suggest that the medallists were indeed the strongest three competitors. Observe that the maximum likelihood among the six alternative hypotheses is that of number 3, in which the maximization was constrained to obey Kwan < Cohen. If we have to change the medallists, then exchanging the order of Kwan and Cohen incurs the lowest likelihood penalty. If, instead, we ask whether there is evidence that Suguri [who was much weaker than Cohen] should not have been a medallist, we find

mgv  - max(out[4:6])

Much stronger.

plot(out,ylab="log likelihood",axes=FALSE)
axis(2)
axis(side=1,at=1:6,srt=45,labels=c(
"Hug<Coh", "Slu<Coh", "Kwa<Coh",
"Hug<Sug", "Slu<Sug", "Kwa<Sug"
))

In the plot above, the vertical axis shows the support. The six points on the x-axis correspond to the six rows of problem_constraints; names have been abbreviated to the first three letters. Thus the first three points are maximum likelihoods for Hughes < Cohen, Slutskya < Cohen, and Kwan < Cohen respectively.

We may test the hypothesis that Slutskaya == Kwan == Hughes:

samep.test(skating,c("slutskaya","kwan","hughes"))

Ranking methodologies.

maximum likelihood estimated strengths furnish an ordering for the competitors. We can compare ranking by strengths with the official point-tallying method:

 rL <- sort(skating_maxp,decreasing=TRUE)
 rL[] <- seq_along(rL)
 rO <- seq_len(nrow(skating_table))
 names(rO) <- rownames(skating_table)
 ordertransplot(rO,rL,xlab="offical rank",ylab="likelihood rank",main="Ladies free skating, 2002 Winter Olympics")

In figure \@ref(fig:rLrO), there is close agreement between the two methods in the large, but differences in detail. For example, the official ordering is hughes first, then slutskaya second, then kwan third; likelihood winner is slutskya, followed by hughes and then kwan. We can gain some understanding of this result by looking at the raw judging results for the three medallists:

skating_table[1:3]

and then calculating a table of the results:

hughes    <- unclass(skating_table)[1,]
slutskaya <- unclass(skating_table)[2,]
table(hughes)
table(slutskaya)

Looking at the above, we see that Hughes has five judges who gave her "1"s, while Slutskaya has only 4, which is why she was considered to be the top. It seems that Hughes's two "4"s (compared with Slutskaya's one) have cost her more likelihood-based strength than her extra "1" gave her; note also that the two judges who placed Hughes fourth (viz J2 and J4) placed Slutskaya first. Also:

table(hughes < slutskaya)

(note that equality is disallowed) showing that five judges preferred Hughes to Slutskaya and four preferred Slutsakaya to Hughes.

Just the top seven competitors

Now we consider only the top seven competitors (that is, the top seven according to Judge 1).

(o <- skating_table[1:7])

(o <- as.ranktable(o))

Now create a likelihood function:

o <- suppfun(o)
o

and have some fun with it:

(o <- maxp(o))
pie(o)

Package dataset {-}

Following lines create skating.rda, residing in the data/ directory of the package.

save(skating_table, skating, skating_maxp, file="skating.rda")

References {-}

RobinHankin/hyper2 documentation built on April 13, 2025, 9:33 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

RobinHankin/hyper2
The Hyperdirichlet Distribution, Mark 2

In RobinHankin/hyper2: The Hyperdirichlet Distribution, Mark 2

Data visualization

Evidence for medal positions

Ranking methodologies.

Just the top seven competitors

Package dataset {-}

References {-}

R Package Documentation

Browse R Packages

We want your feedback!

RobinHankin/hyper2 The Hyperdirichlet Distribution, Mark 2

In RobinHankin/hyper2: The Hyperdirichlet Distribution, Mark 2

Data visualization

Evidence for medal positions

Ranking methodologies.

Just the top seven competitors

Package dataset {-}

References {-}

R Package Documentation

Browse R Packages

We want your feedback!

RobinHankin/hyper2
The Hyperdirichlet Distribution, Mark 2