In RobinHankin/hyper2: The Hyperdirichlet Distribution, Mark 2

knitr::opts_chunk$set(echo = TRUE)
library("hyper2")
library("magrittr")
options("digits" = 5)
abbreviated <- TRUE  # change to FALSE for full names

knitr::include_graphics(system.file("help/figures/hyper2.png", package = "hyper2"))

(takes about 30 minutes to run without cache, most of the time taken processing the maximi chunk)

To cite the hyper2 package in publications, please use @hankin2017_rmd. The file eurovision.txt, used below, is copied from "Eurovision Song Contest 2009," Wikipedia, accessed May 13, 2018. It refers to semi-final 1. More documentation is given in eurovision.Rd [type help(euro2009) at the R prompt].

First we specify the matrix as appearing in the Wikipedia page:

eurovision_table <- as.matrix(read.table("eurovision.txt"))
eurovision_table

Each row corresponds to a contestant and each column to a judge. Note that the matrix is not square: the last two columns correspond to Germany and the UK, who were judges but not competitors. I have removed the first column from the Wikipedia table (which gave the total of the points), replaced self-voting entries with NA, and replaced blanks with 0. Thus the first row corresponds to votes cast for ME, Montenegro. We see that BY (Belarus) awarded them three points, AM (Armenia) five points, and so on. The first column corresponds to the points awarded by Monetenegro. We see that they placed gave Belarus (BY) two points, Armenia (AM) four points, and so on. Their favourite was Bosnia and Herzegovina (BA) to whom they gave twelve points. The points system used was:

points <- c(12,10,8,7,6,5,4,3,2,1)

Variable points gives the number of points awarded, under Eurovision rules, to voters' first, second, third, etc choice. Points for any competitor are added and the winner is the competitor with the most points. However, in the hyper2 model, the numerical values themselves do not affect the likelihood function; only the order of the voters' preferences matters. The following R idiom translates wiki_matrix into a form suitable for analysis with hyper2:

preference <- eurovision_table*0  
for(i in seq_along(points)){ preference[eurovision_table == points[i]] <- i }

countries <- data.frame(
    fullname =
        c("Montenegro", "Czech rep", "Belgium", "Belarus", "Sweden",
          "Armenia", "Andorra", "Switzerland", "Turkey", "Israel",
          "Bulgaria", "Iceland", "Macedonia", "Romania", "Finland",
          "Portugal", "Malta", "Bosnia Herz", "Germany", "UK"),
    abbreviation = c("ME","CZ","BE", "BY", "SW", "AM", "AD", "CH",
                     "TR", "IL", "BG", "IS", "MK", "RO", "FI", "PT",
                     "MT", "BA", "DE", "UK")
)

if(abbreviated){ 
    jj <- countries$abbreviation
} else { 
    jj <- countries$fullname
}

competitors <- as.character(jj[1:18])
colnames(preference) <- jj      # voters; 20 countries (18 + DE + UK)
rownames(preference) <- competitors  

rownames(eurovision_table) <- jj[1:18]
colnames(eurovision_table) <- jj

In the above, matrix preference records voters' first, second, third, etc choice. A zero entry means no points (nul punkte!) and NA means that voter was forbidden from voting for that player (countries cannot vote for themselves). The competitors were the first 18 countries (the last two countries, Germany and the UK, voted but did not compete).

preference

Now, take the first column of preference. This represents the order of preferences of Montenegro (ME). Their favourite was [last row] Bosnia & Herzegovina (BA), who they gave rank 1 to (that is, 12 points). Their second favourite was Macedonia (MK), their third was Turkey (TR), and so on. They were not allowed to vote for themselves, which is why the first row is NA. So the order was: BA (first), MK (second) , TR (third), IS (fourth), RO (fifth), IL (sixth), AM (seventh), FI (eigthth), BY (ninth), MT (tenth). The other countries (CZ, BE, SW, CH, BG, PT) came joint last and did not receive any points from Montenegro. Note the final two rows corresponding to votes cast by Germany and the UK, who did not compete and therefore cast votes for all competitors.

We need to convert matrix preference into a likelihood function, here eurovision:

eurovision <- hyper2()

for(i in seq_len(ncol(preference))){   # cycle through the rows; each row is a voter
    d <- preference[,i,drop=TRUE]
    eurovision <- eurovision + suppfun(d[!is.na(d)])
} # i loop closes

eurovision_maxp <- maxp(eurovision)
eurovision_maxp

pie(eurovision_maxp)

Observe the small estimated strength for Czechoslovakia (CZ) at about 1e-06. This is consistent with the second row of preference which shows that noone gave them any points.

consistency(eurovision)

Now we can check null of equal strengths:

equalp.test(eurovision)

Thus the difference is about 122 units of support, surely significant.

points_scored <- rowSums(eurovision_table,na.rm=TRUE)  # points
ox <- rank(-points_scored)
oy <- rank(-eurovision_maxp)   # m = MLE
oyp <- ordertrans(oy,names(ox))

par(pty='s') # square plot
plot(ox,oyp,asp=1,pty='s',xlim=c(0,18),ylim=c(0,18),pch=16,
xlab="official order",ylab="my order",main="Eurovision 2009")

par(xpd=TRUE) # allow country names to appear outside plotting region
for(i in seq_along(ox)){text(ox[i],oyp[i],names(ox)[i],pos=4,col='gray') }
par(xpd=FALSE) # stop diagonal line from protruding beyond plotting region
abline(0,1)

Testing nonindependence using `hyper3` idiom

First we use hyper2:

a <- read.table("euro_ranks.txt",header=TRUE)
jj <- table(a$country)
wanted_countries <- names(jj[jj >= 10])
a <- a[a$country %in% wanted_countries,]
H <- hyper2()
for(y in unique(a$year)){
   H <- H + race((a$country)[a$year == y])
}
summary(H)

Find the evaluate:

mH <- maxp(H)

dotchart(mH)

Now use hyper3 and follow RB_BB_LF() in plackett_luce_monster.Rmd. The blocs follow those of Price 2020.

scandinavia <- c("Denmark","Finland","Iceland","Norway","Sweden")

balkans  <- c("Italy","Serbia","Albania","BosHerz") # red
eastern  <- c("Russia","Ukraine","Moldova","Armenia","Azerbaijan", "Malta","Israel") # purple
black    <- c("Hungary","Romania","Greece","Cyprus") # orange
iberia   <- c("Spain","Germany") # blue
northern <- c("Denmark", "Estonia", "Finland", "France", "Iceland", "Lithuania", "Netherlands", "Norway", "Sweden", "UK") # green

l <- list(balkans=balkans,eastern=eastern,black=black,iberia=iberia,northern=northern)

ab <- c(balkans,eastern,black,iberia,northern)
stopifnot(all(ab %in% wanted_countries))
stopifnot(all(wanted_countries %in% ab))


list2ec <- function(l){
  jj <- stack(setNames(l,seq_along(l)))
  out <- as.integer(c(jj$ind))
  names(out) <- jj$values
  return(out)
}

bloc <- function(lam){
  out <- hyper3()
  e <- list2ec(l)
  for(y in unique(a$year)){
     v <- a$country[a$year==y]
     ec <- e[names(e) %in% v]
     out <- out + cheering3(v=v, e=ec,h=rep(lam,length(unique(e))))
   } 
  return(out)
}

o <- function(lam){  # maximize function o()
   H <- bloc(lam)
   return(maxp(H,give=TRUE,n=1)$likes)
}

```r
lam <- seq(from=0.6,to=2,len=20)
L <- sapply(lam,o)

plot(lam,L-max(L),type="b",pch=16)
abline(h=c(0,-2))
abline(v=1)

Package dataset {-}

Following lines create eurovision.rda, residing in the data/ directory of the package.

save(eurovision_table,eurovision_maxp,eurovision,file="eurovision.rda")

References {-}

A. Price 2020. "Identifying voting blocs in the Eurovision Song Contest". Towards Data Science, https://medium.com/towards-data-science/identifying-voting-blocs-in-the-eurovision-song-contest-4792065fc337

RobinHankin/hyper2 documentation built on April 13, 2025, 9:33 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com