set.seed(0) knitr::opts_chunk$set(echo = TRUE) library("hyper2") library("magrittr") options("digits" = 5)
Surfing is a popular and growing activity in terms of both participation and viewing [@warshaw2010]. Competitive surfing involves surfers competing against one or two other surfers in heats lasting 30-50 minutes [@booth1995]. Points are awarded on a 10-point scale: each of five judges awards points to a surfer riding a particular wave. The World Surf League (WSL) is the main governing body of surfing competitions [@wsl]. The WSL conducts a world tour in which the best ranked 34 surfers compete; in addition, at each tour venue, two "wildcard" surfers also enter the competition, who are ignored here. The 2019 WSL tour saw competitive events held at eleven different surf locations.
The typical format is as follows. Up to four surfers---the competitors---are in the water simultaneously, watched by up to five judges. Subject to surfing etiquette, competitors are free to catch a wave at their discretion within the heat's time window, typically 20-30 minutes. The judges rate each wave ride and award points to the surfer based on a range of subjective criteria such as stylish execution and transition of manoeuvres, but also credit elements such as the difficulty and novelty of the performance. The top score and bottom score among the five judges are removed and the remaining three judges' scores are averaged to give the final score for the surfer for that wave. Each surfer's aggregate score is given by the average of the two highest-scoring waves. The winner is the surfer with the highest aggregate score.
Such scoring systems are designed to account for the random nature of wave quality while reflecting competitors' abilities fairly. Differences in wave quality between successive days mean that direct comparison of scores between one day and another are not informative about competitors' abilities as they are strongly dependent on details of wave quality at the time of the competition. However, for a particular heat, the order statistic---that is, which competitor scored most highly, second highest, and third---is informative about competitors' abilities: the wave environment is common to each surfer.
We note in passing that wct8
, hosted at Lemoore had a different
format from the other venues in the championship, being held in an
artificial wave pool in which every wave was essentially identical (up
to a mirror reflection). Statistically, there were seven heats of
either 2, 4, 8, or 16 surfers. We show below how to incorporate the
information present in such observations with the remainder of the
championship using a consistent and intuitive statistical model.
@farley2013 analyse surfing results using one-way ANOVA techniques and report comparisons between the top 10 and bottom 10 surfers in the 2013 World Championship Tour and conclude that the top-ranked athletes are more consistent than the lower-ranked group. They report that "To date, only a limited number of research studies have reported on the results of competitive surfing" [p44], and the analysis presented here is hoped to change that.
Other statistical research into competitive surfing appears to focus on the scoring of aerial manoeuvres compared with other manoeuvres in events. @lundgren2014, for example, report that aerial manoeuvres scored higher than other stunts but had a lower completion rate, a theme we return to below.
The Bradley-Terry model [@bradley1954] assigns non-negative strengths $p_1,\ldots, p_n$ to each of $n$ competitors in such a way that the probability of $i$ beating $j\neq i$ in pairwise competition is $\frac{p_i}{p_i+p_j}$; it is conventional to normalize so that $\sum p_i=1$. Further, we use a generalization due to [@luce1959], in which the probability of competitor~$i$ winning in a field of $\left\lbrace 1,\ldots, n\right\rbrace$ is $\frac{p_i}{p_1+\cdots +p_n}$. Noting that there is information in the whole of the finishing order, and not just the first across the line, we can follow [@plackett1975] and consider the runner-up to be the winner among the remaining competitors, and so on down the finishing order. Without loss of generality, if the order of finishing were $1,2,3,4,5$, then a suitable Plackett-Luce likelihood function would be
\begin{equation}\label{competitors_1_to_5_likelihood} \frac{p_1}{p_1+p_2+p_3+p_4+p_5}\cdot \frac{p_2}{p_2+p_3+p_4+p_5}\cdot \frac{p_3}{p_3+p_4+p_5}\cdot \frac{p_4}{p_4+p_5}\cdot \frac{p_5}{p_5} \end{equation}
and this would be a forward ranking Plackett-Luce model in the terminology of [@johnson2020]. We now use a technique due to Hankin (2010, 2017) and introduce fictional (reified) entities whose nonzero Bradley-Terry strength helps certain competitors or sets of competitors under certain conditions. The original example was the home-ground advantage in football. If players (teams) $1,2$ with strengths $p_1,p_2$ compete, and if our observation were $a$ home wins and $b$ away wins for team $1$, and $c$ home wins and $d$ away wins for team~$2$, then a suitable likelihood function would be
[ \left(\frac{p_1+p_H}{p_1+p_2+p_H}\right)^a \left(\frac{p_1}{p_1+p_2+p_H}\right)^b \left(\frac{p_2+p_H}{p_1+p_2+p_H}\right)^c \left(\frac{p_2+p_H}{p_1+p_2+p_H}\right)^d, ]
\noindent where $p_H$ is a quantification of the beneficial home ground effect. Similar techniques have been used to account for the first-move advantage in chess, and here we apply it to assess the Brazilian preference for reef-breaking waves.
surfing_table <- read.table("wct_table2019.txt",header=TRUE) nationality <- function(surfers){surfing_table$nation[rownames(surfing_table) %in% surfers]}
surfing_venuetypes <- read.table("venuetypes2019.txt",header=TRUE) surfing_venuetypes beachtype <- function(venue){surfing_venuetypes$beachtype[which(surfing_venuetypes$comp==venue)]} beachtype("wct05") beachtype("wct06") options(width=120) surfing_table wildcards <- rownames(surfing_table)[-(1:36)] wildcards QS <- rownames(surfing_table)[c(23:33,35:36)] QS
Some quick analysis of the wildcard surfers. First, how did the wildcards perform?
wilds <- as.matrix(surfing_table[-(1:36),2:12]) wilds[wilds == "INJ"] <- NA wilds[wilds == "-"] <- NA table(wilds)
Above we see that 10 wildcards placed "33" [meaning last place], 16 placed "17" [second to last] three placed "9" and two placed "5". [the placings are treated as character strings which is why they are not in numerical orer]. Next, how many venues did each wildcard participate at?
table(apply(wilds,1,function(x){sum(!is.na(x))}))
We see that 20 wildcards appeared at 1 venue, 4 at two venues, and one (Willcox) at three venues.q
o <- strsplit(readLines("wct2019.txt")," ") jj <- unlist(lapply(o,paste,collapse=" ")) head(jj) tail(jj)
The first line shows that, at one of the heats in wct01
, Colapinto
came first, Bailey
second, and Wright
came third. The last line
shows that, at wct11
, Ferreira
came first and Medina
came
second.
Following idiom creates a table:
table(unlist(lapply(strsplit(jj, " "),length)))
This shows that object jj
has 312 lines of 2 surfers, 160 lines with
3 surfers, 2 lines with 4 surfers, 2 lines with 8 surfers, and one
line with 16 surfers.
We can convert the dataset into a Plackett-Luce likelihood
function but first have to remove the wildcards and also player
Vries
whose maximum likelihood strength is zero.
remove <- c(wildcards, QS, "Vries") wanted <- which(!(rownames(surfing_table) %in% remove)) # used at directlogplotofstrengths surfing <- hyper2() surfing_pairs <- hyper2() for(v in o){ r <- v[-1] # first entry is venue numberretained <- sum(!(r %in% remove)) if(numberretained > 1){ r <- r[!(r %in% remove)] surfing <- surfing + race(r) if(numberretained == 2){ r <- r[!(r %in% remove)] surfing_pairs <- surfing_pairs + race(r) } } } summary(surfing) summary(surfing_pairs)
summary(surfing_pairs)
Object surfing
is a loglikelihood function on the strengths of the
competitors. Our first step is to find the evaluate:
surfing_maxp <- maxp(surfing) surfing_maxp sort(surfing_maxp)
Now compare surfing
with surfing_pairs
. First calculate
surfing_pairs_maxp <- maxp(surfing_pairs)
then plot
surfing_pairs_maxp surfing_maxp par(pty='s') plot(surfing_pairs_maxp,surfing_maxp,asp=1,pch=16) abline(0,1) plot(surfing_pairs_maxp,surfing_maxp,asp=1,pch=16,log='xy',xlim=c(0.01,0.2),ylim=c(0.01,0.2)) abline(0,1) plot(log(surfing_pairs_maxp/surfing_maxp),pch=16) abline(0,0)
and the second step is to establish its consistency:
consistency(surfing) consistency(surfing_pairs)
e_all <- eigen(hessian(surfing,indep(surfing_maxp)),T,T)$values e_pairs <- eigen(hessian(surfing_pairs,indep(surfing_pairs_maxp)),T,T)$values
e_all e_pairs f <- function(x){prod(sort(x)[-1])} f(e_all) f(e_pairs) f(e_all)/f(e_pairs)
and we may display it using a dot chart:
`mydotchart` <- function (x, labels = NULL, groups = NULL, gdata = NULL, offset = 1/8, ann = par("ann"), xaxt = par("xaxt"), frame.plot = TRUE, log = "", cex = par("cex"), pt.cex = cex, pch = 21, gpch = 21, bg = par("bg"), color = par("fg"), gcolor = par("fg"), lcolor = "gray", xlim = range(x[is.finite(x)]), main = NULL, xlab = NULL, ylab = NULL, ...) { opar <- par("mai", "mar", "mgp", "cex", "yaxs") on.exit(par(opar)) par(cex = cex, yaxs = "i") if (!is.numeric(x)) stop("'x' must be a numeric vector or matrix") n <- length(x) if (is.matrix(x)) { if (is.null(labels)) labels <- rownames(x) if (is.null(labels)) labels <- as.character(seq_len(nrow(x))) labels <- rep_len(labels, n) if (is.null(groups)) groups <- col(x, as.factor = TRUE) glabels <- levels(groups) } else { if (is.null(labels)) labels <- names(x) glabels <- if (!is.null(groups)) levels(groups) if (!is.vector(x)) { warning("'x' is neither a vector nor a matrix: using as.numeric(x)") x <- as.numeric(x) } } linch <- if (!is.null(labels)) max(strwidth(labels, "inch"), na.rm = TRUE) else 0 if (is.null(glabels)) { ginch <- 0 goffset <- 0 } else { ginch <- max(strwidth(glabels, "inch"), na.rm = TRUE) goffset <- offset } nmai <- opar[["mai"]] if (ann) nm.2 <- nmai[2L] if (!(is.null(labels) && is.null(glabels))) { yi <- if (is.null(ylab) || !ann) 0 else offset nm.2 <- nmai[4L] + max(yi + linch + goffset, ginch) + 1/16 if (nmai[2L] < nm.2) { nmai[2L] <- nm.2 par(mai = nmai) } } if (is.null(groups)) { o <- seq_len(n) y <- o ylim <- c(0, n + 1) } else { o <- sort.list(as.numeric(groups), decreasing = TRUE) x <- x[o] groups <- groups[o] color <- rep_len(color, length(groups))[o] lcolor <- rep_len(lcolor, length(groups))[o] pch <- rep_len(pch, length(groups))[o] of.1 <- cumsum(c(0, diff(as.numeric(groups)) != 0)) y <- seq_len(n) + 2 * of.1 ylim <- range(0, y + 2) } # plot.window(xlim = xlim, ylim = ylim, log = log) lheight <- par("csi") if (!is.null(labels)) { loffset <- (linch + 0.1)/lheight mtext(labels[o], side = 2, line = loffset, at = y, adj = 0, col = color, las = 2, cex = cex, ...) } abline(h = y, lty = "dotted", col = lcolor) points(x, y, pch = pch, col = color, bg = bg, cex = pt.cex/cex) if (!is.null(groups)) { gpos <- rev(cumsum(rev(tapply(groups, groups, length)) + 2) - 1) ginch <- max(strwidth(glabels, "inch"), na.rm = TRUE) goffset <- (max(linch + offset, ginch, na.rm = TRUE) + 1/16)/lheight mtext(glabels, side = 2, line = goffset, at = gpos, adj = 0, col = gcolor, las = 2, cex = cex, ...) if (!is.null(gdata)) { abline(h = gpos, lty = "dotted") points(gdata, gpos, pch = gpch, col = gcolor, bg = bg, cex = pt.cex/cex, ...) } } axis(1, xaxt = xaxt) if (frame.plot) box() if (ann) { title(main = main, xlab = xlab, ...) mgp <- par("mgp") par(mgp = c(max(mgp[1], nm.2/lheight - 1.5), mgp[-1])) title(ylab = ylab, ...) } invisible() }
dotchart(surfing_maxp) dotchart(log10(surfing_maxp)) pie(surfing_maxp) pdf(file="showsurfingmaxppie.pdf") pie(surfing_maxp) dev.off() jj <- log(surfing_maxp) kk <- log(surfing_pairs_maxp) names(jj) <- NULL names(kk) <- NULL l <- c(-4.6, -1.5) dotchart(log(surfing_maxp),pch=NA,xlim=l) mydotchart(jj,col="red" ,pch=16,xlim=l,xaxt='n',frame.plot=FALSE) mydotchart(kk,col="blue",pch=16,xlim=l,xaxt='n',frame.plot=FALSE) pdf(file="showsurfingmaxpdotchart.pdf") jj <- log10(surfing_maxp) dotchart(jj,pch=16,xlab="Log of Bradley-Terry strength of competitors (base 10)") dev.off()
Test the null of equal strengths:
equalp.test(surfing)
jjO <- surfing_table$points[wanted] names(jjO) <- rownames(surfing_table)[wanted] jjL <- surfing_maxp jjO <- ordertrans(jjO) jjL <- ordertrans(jjL) plot(jjO,jjL,pch=16,log='xy',xlab='points',ylab='maximum likelihood strength') fit <- lm(log10(jjL)~log10(jjO)) summary(fit) for(i in seq_along(jjO)){ text(jjO[i],jjL[i],names(jjL)[i],col='gray',pos=4,cex=0.5) } abline(fit) pdf(file="directlogplotofstrengths.pdf") par(xpd=NA) plot(jjO,jjL,pch=16,log='xy',xlab='points',ylab='maximum likelihood strength',cex=1.5) fit <- lm(log10(jjL)~log10(jjO)) for(i in seq_along(jjO)){ text(jjO[i],jjL[i],names(jjL)[i],col='gray',pos=4,cex=0.8) } par(xpd=FALSE) abline(fit) dev.off()
pdf(file="directplotofstrengths.pdf") par(xpd=NA) plot(jjO,jjL,pch=16,log='y',xlab='points',ylab='maximum likelihood strength',cex=1.5) fit <- lm(log10(jjL)~(jjO)) summary(fit) for(i in seq_along(jjO)){ text(jjO[i],jjL[i],names(jjL)[i],col='gray',pos=4,cex=0.8) } par(xpd=FALSE) abline(fit) dev.off()
Now compare the finishing order based on points vs the finishing order on likelhood:
rL <- sort(surfing_maxp,decreasing=TRUE) rL[] <- seq_along(rL) # likelihood-based rank rO <- seq_len(nrow(surfing_table[wanted,])) names(rO) <- rownames(surfing_table)[wanted] rO rL sort(names(rO)) sort(names(rO)) sort(names(rO))==sort(names(rO)) ordertransplot(rO,rL,xlab="offical rank",ylab="likelihood rank",main="") pdf(file="ordertransplotplot.pdf") ordertransplot(rO,rL,xlab="offical rank",ylab="likelihood rank",main="") dev.off()
samep.test(surfing, c("Ferreira","Medina")) samep.test(surfing, c("Ferreira","Florence")) samep.test(surfing, c("Florence","Medina")) samep.test(surfing, c("Florence","Medina","Ferreira")) samep.test(surfing, c("Medina","Wilson"))
We will test the null that Medina's performance at the Surf Ranch wa the same as his performance in the remainder of the tourament.
surfing_wct08 <- hyper2() surfing_elsewhere <- hyper2() for(v in o){ venue <- v[1] r <- v[-1] # first entry is venue r <- r[!(r %in% remove)] if(length(r)>1){ if(venue == "wct08"){ rmod <- r rmod[rmod == "Medina"] <- "Medina_wct08" surfing_wct08 <- surfing_wct08 + race(rmod) } else { surfing_elsewhere <- surfing_elsewhere + race(r) } } } # v loop closes both_together <- surfing_wct08 + surfing_elsewhere
Now test the null that Medina's performance is identical at wct08 and elsewhere:
samep.test(both_together,c("Medina","Medina_wct08"))
One suggestion is that Brazilians do worse on reef break and better on point and beach break beaches. We now assess the null that Brazilians perform equally well on all types of beaches. To that end, we create a reified surfer, "Brazilian_beachtype", whose strength helps Brazilian surfers when surfing on their (putative) preferred beachtype.
surfing2 <- hyper2() for(v in o){ venue <- v[1] # first entry is venue r <- v[-1] numberretained <- sum(!(r %in% remove)) if(numberretained > 1){ r <- r[!(r %in% remove)] jj <- race(r) surfers <- pnames(jj) if(sum(nationality(surfers)=="BRA")==1){ if((beachtype(venue)=="point") | (beachtype(venue) == "beach")){ jj %<>% pwa(surfers[nationality(surfers)=="BRA"],"Brazilian_beachtype") } } surfing2 %<>% sum(jj) } } summary(surfing2)
(surfing2_maxp <- maxp(surfing2)) specificp.gt.test(surfing2,"Brazilian_beachtype",0)
p <- seq(from=0.001,to=0.01,by=0.0005) l <- profsupp(surfing2,"Brazilian_beachtype",p,relative=FALSE) l0 <- loglik(indep(surfing_maxp),surfing)
plot(c(0,p),c(l0,l)-max(l),type='b',xlab="Brazilian beachtype",ylab="support")
p_wide <- seq(from=0.002,to=0.04,by=0.002) l_wide <- profsupp(surfing2,"Brazilian_beachtype",p_wide,relative=FALSE)
plot(c(0,p_wide),c(l0,l_wide)-max(l),type='b',xlab="Brazilian beachtype wide",ylab="support") abline(h= -2,lty=2) pdf(file="plotwidep.pdf") plot(c(0,p_wide),c(l0,l_wide)-max(l),type='b',xlab="Bradley-Terry strength of the Brazilian beachtype reified entity",ylab="Support") abline(h= -2,lty=2) dev.off()
all_Brazilian_surfers <- rownames(surfing_table)[surfing_table$nation == "BRA"] all_Brazilian_surfers pnames(surfing2) pnames(surfing2) %in% all_Brazilian_surfers table(pnames(surfing2) %in% all_Brazilian_surfers) Brazilian_surfers <- pnames(surfing2)[pnames(surfing2) %in% all_Brazilian_surfers] Brazilian_surfers
f <- function(B,...){ jj <- specificp.test(surfing2, "Brazilian_beachtype", B) mp <- jj$null_estimate n <- sum(names(mp) %in% all_Brazilian_surfers) GLO <- log(mp["Brazilian_beachtype"]^n/prod(mp[which(names(mp) %in% all_Brazilian_surfers)])) out <- c(B,GLO, support=jj$null_support) names(out) <- c("B","GLO","support") return(out) }
braz <- c(0.0003,0.0005,0.0008, 0.001,0.0015,0.002,0.003,0.004,0.005,0.007,0.01,0.015,0.02,0.03,0.05) M <- do.call("rbind",lapply(braz,f)) M[,3] <- M[,3]-max(M[,3]) M
plot(M[,2],M[,3]-max(M[,3]),type='b',xlab="Brazilian beachtype (log-odds)",ylab="support") plot(M[,2],M[,3]-max(M[,3]),type='b',ylim=c(-0.3,0),xlab="Brazilian beachtype (log-odds)",ylab="support")
scomp <- hyper2() scomp3 <- hyper2() # scomp3 is the likelihood function for just the races with 3 competitors, 2 of one nationality and one odd-man-out. for(v in o){ r <- v[-1] # first entry is venue; "r" for "race" numberretained <- sum(!(r %in% remove)) if(numberretained > 1){ r <- r[!(r %in% remove)] if(numberretained ==3){ nationalities <- c(nationality(r[1]),nationality(r[2]),nationality(r[3])) if(length(unique(nationalities)) == 2){ if(sum(nationalities %in% nationality(r[1])) == 1){ # winner is the odd one out Hjj <- pwa(race(r),pwa=r[1]) print("odd one out is number 1") print(100*surfing_maxp[r]) } else if(sum(nationalities %in% nationality(r[2]) ) == 1){ # second best is the odd one out Hjj <- pwa(race(r),pwa=r[2]) print("odd one out is number 2") print(100*surfing_maxp[r]) } else if(sum(nationalities %in% nationality(r[3])) == 1){ # third best is the odd one out Hjj <- pwa(race(r),pwa=r[3]) print("odd one out is number 3") print(100*surfing_maxp[r]) } else { stop("this cannot happen") } print(Hjj) print("---") scomp <- scomp + Hjj # add compatriot-monster modified likelihood scomp3 <- scomp3 + Hjj } else { # 3 in race, but no compatriot monster... scomp <- scomp + race(r) # ... so use regular Plackett-Luce for 3-competitor race } } else { # if(numberretained==3) closes... scomp <- scomp + race(r) # numberretained !=3; use regular Plackett-Luce } } # if(numberretained>1) closes } # for(v in o) loop closes
maxs <- maxp(scomp) maxs
pie(maxs)
specificp.gt.test(scomp,"S",0,n=100,maxtry=100)
Now profile likelihood:
p <- c(0.0002,seq(from=0.001,to=0.02,len=10)) l <- profsupp(scomp,"S",p,relative=FALSE)
plot it:
plot(p,l,type='b',xlab="compatriot support",ylab="support",main="profile support for S, scomp")
scomp3 maxp(scomp3) specificp.gt.test(scomp3,'S',0,n=100)
Now profile likelihood:
p <- c(0.0002,seq(from=0.001,to=0.02,len=10)) l <- profsupp(scomp3,"S",p,relative=FALSE,n=100)
plot it:
plot(p,l,type='b',xlab="compatriot support",ylab="support",main="profile support for S, scomp3")
We can do a much more basic test on just the entries with three competitors comprising two of one nationality and one of a different nationality:
## Dora Ferreira Slater ## 2.5741 12.5929 3.8933 (3) ## Smith Buchan Freestone ## 5.2764 2.0552 1.8673 (1) ## Slater Andino Callinan ## 3.8933 5.9190 1.8408 (3) ## Crisanto Wilson Ibelli ## 2.1644 2.7100 2.2479 (2) ## Moniz Wright Freestone ## 3.0250 3.6502 1.8673 (1) ## Carmichael Freestone Flores ## 1.8855 1.8673 2.0480 (3) ## Bourez Wright Freestone ## 1.9068 3.6502 1.8673 (1) ## Colapinto Flores Bourez ## 2.9871 2.0480 1.9068 (1) ## Ferreira Morais Ibelli ## 12.5929 1.0817 2.2479 (2) ## Freestone Flores Bourez ## 1.8673 2.0480 1.9068 (1) ## Dora Ferreira Morais ## 2.5741 12.5929 1.0817 (1) M <- matrix(c( 2.5741,12.5929,3.8933,3 , 5.2764, 2.0552,1.8673,1 , 3.8933, 5.9190,1.8408,3 , 2.1644, 2.7100,2.2479,2 , 3.0250, 3.6502,1.8673,1 , 1.8855, 1.8673,2.0480,3 , 1.9068, 3.6502,1.8673,1 , 2.9871, 2.0480,1.9068,1 ,12.5929, 1.0817,2.2479,2 , 1.8673, 2.0480,1.9068,1 , 2.5741,12.5929,1.0817,1),byrow=TRUE, ncol=4) for(i in seq_len(nrow(M))){ M[i,1:3] <- M[i,1:3]/sum(M[i,1:3])} print(M) # rows 1-3 of M are strengths of the first, second, third # competitors. Row 4 is the place of the odd-man-out. Thus # on row 1, the odd-man-out came third, and on row 2, the # odd-man-out came first. p <- M[cbind(1:11, M[,4])] print(p) ## element i of p is the probability of the odd-man-out ## winning, given the maximum likelihood estimate. ww <-table(replicate(1e5,sum(rbinom(11,1,p)))) print(ww) sum(ww[7:10])/sum(ww) # pvalue for observation of six wins by the odd-man-out
Following lines create surfing.rda
, residing in the data/
directory
of the package.
save(surfing,surfing_maxp,surfing_venuetypes,surfing_table,file="surfing.rda")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.