defaultW <- getOption("warn") options(warn = -1) library("hyper2",quietly=TRUE) options(warn = defaultW)
knitr::include_graphics(system.file("help/figures/hyper2.png", package = "hyper2"))
To cite the hyper2
package in publications, please use @hankin2017_rmd
(takes about fifteen minutes to run without cache). Here I analyse a
small synthetic dataset for home ground advantage, then go on to
discuss a dataset presented by Agresti. But first, the smallest
nontrivial example is a 2x2 case:
M <- matrix(c( NA, 1+5i, 3+11i, NA), nrow=2,ncol=2,byrow=TRUE) teams <- LETTERS[seq_len(2)] dimnames(M) <- list("@home" = teams,"@away"=teams) dimnames(M) <- list("@home" = teams,"@away"=teams) M home_away(M)
Above we see (from M[1,2]
) that A played at home $1+5=6$ times,
winning 1 and losing 5; and B played at home $3+11=14$ times, winning
3 and losing 11. A slightly more complicated example:
home_games_won <- matrix(c( NA, 16, 12, 11, 19, NA, 19, 16, 17, 12, NA, 11, 11, 12, 12, NA), nrow=4,ncol=4,byrow=TRUE) away_games_won <- matrix(c( NA, 05, 02, 02, 9, NA, 10, 02, 3, 04, NA, 07, 8, 06, 04, NA), nrow=4,ncol=4,byrow=TRUE) teams <- LETTERS[1:4] dimnames(home_games_won) <- list("@home" = teams,"@away"=teams) dimnames(away_games_won) <- list("@home" = teams,"@away"=teams) home_games_won away_games_won
Thus home_games_won[1,2] == 16
means A played at home against B and
won 16 times; home_games_won[2,1] == 19
means B played at home
against A and won 19 times. Also, away_games_won[1,2] == 5
means A
played away against B and won 5 times, and away_games_won[2,1] == 2
means B played away against A and won 2 times. Alternatively, A
played B 16+19+5+2=45 times; A played at home 16+2=18 times and won 16
times and lost 2 times; and B played at home 19+5=24 times and won 19
times and lost 5 times. Alternatively we may use complex numbers to
represent the same dataset:
home_games_won + 1i*away_games_won
We will create a hyper2 object with the teams and home ground advantage term:
H <- home_away(home_games_won,away_games_won) H options("use_alabama" = FALSE) # to avoid the stupid wmmin bug specificp.gt.test(H,"home",0) options("use_alabama" = TRUE) # to avoid the stupid wmmin bug
We see strong evidence to support the contention that home advantage is real. Further, we may test the hypothesis that all the teams have the same strength, after accounting for the home team advantage:
samep.test(H,teams)
Above, we see no evidence for a difference in team strengths. Visually:
(mt <- maxp(H))
A profile likelihood diagram:
home_strength <- seq(from=0.15,to=0.45,length=13) plot(home_strength,profsupp(H,"home",home_strength),type="b",pch=16, xlab="home strength",ylab="support",main="profile likelihood for home advantage") abline(h=c(0,-2))
The figure give a support interval between about 0.3 and 0.5 for home ground advantage.
Agresti considers a dataset of seven baseball teams who play one another repeatedly. Each game is played at the home ground of one team, the other playing away. The dataset is as follows:
baseball_table <- matrix(c( NA, 4+3i, 4+2i, 4+3i, 6+1i, 4+2i, 6+0i, 3+3i, NA , 4+2i, 4+3i, 6+0i, 6+1i, 4+3i, 2+5i, 4+3i, NA , 2+4i, 4+3i, 4+2i, 6+0i, 3+3i, 5+1i, 2+5i, NA , 4+3i, 4+2i, 6+1i, 5+1i, 2+5i, 3+3i, 4+2i, NA , 5+2i, 6+0i, 2+5i, 3+3i, 3+4i, 4+3i, 4+2i, NA , 2+4i, 2+5i, 1+5i, 1+6i, 2+4i, 1+6i, 3+4i, NA),7,7) teams <- c("Milwaukee", "Detroit", "Toronto", "New York", "Boston","Cleveland","Baltimore") rownames(baseball_table) <- teams colnames(baseball_table) <- teams M <- baseball_table # saves typing
Above, we represent home team wins with the real part of the matrix, the imaginary component being away wins. Now process it:
baseball <- home_away(M) baseball
Maximum likelihood:
baseball_maxp <- maxp(baseball) baseball_maxp
Visually:
dotchart(baseball_maxp,pch=16) pie(baseball_maxp)
Above, we see that the teams are all of roughly equal strength, and the home advantage is small. However, we may easily assess the null that the home strength is zero:
specificp.gt.test(baseball,"home",0)
home_strength <- seq(from=0.005,to=0.1,length=20) plot(home_strength,profsupp(baseball,"home",home_strength),type="b",pch=16, xlab="home strength",ylab="support",main="profile likelihood for home advantage") abline(h=c(0,-2))
Further, we may test the hypothesis that the baseball teams have the same strength:
samep.test(baseball,teams)
Above, we see no evidence for the teams having differential strengths.
As an interesting aside, suppose we have some additional data from games in which (for some reason), the home advantage was not operating.
M2 <- matrix(c(0,1,2,0),2,2) dimnames(M2) <- list(winner=teams[1:2],loser=teams[1:2]) M2
Thus we have additional data from three games in which Milwaukee won twice and lost once. This additional information may easily be incorporated into our likelihood function, at least on the assumption of independence:
baseball <- baseball + pairwise(M2)
We may assess the difference that the new information makes:
ordertransplot(baseball_maxp[1:7],maxp(baseball)[1:7],xlab="old",ylab="new",plotlims=c(0.01,0.16))
Above, we see very little difference in the two evaluates, except for Detroit and Milwaukee, as expected.
Considering the simplest case of two teams, a and b, with results:
Matching the probabilities for $a$ at home gives us $H=a(\lambda-1)$ and matching the probabilities for $a$ away gives $H=b(\lambda-1)$. These cannot both be true unless $a=b$ or $\lambda=1$. But we can, in principle, test which one is a better model.
We may apply hyper3
idiom to the dataset, using the likelihood
function of Davidson and Beaver (1977). First a formal test:
baseball_table f <- function(l){ maxp(home_away3(baseball_table,l),give=TRUE,n=1)$likes} o <- optimize(f,c(0.8,1.6),maximum=TRUE) o maxL <- f(o$maximum) W <- maxL - f(1) # f(1) being the likelihood of the null W pchisq(2*W,df=1,lower.tail=FALSE)
The pvalue is significant. Now plot a profile likelihood (support) curve:
jj1 <- seq(from=log(o$maximum),to=log(1.9),len=7) jj2 <- seq(from=log(o$maximum),to=log(0.8),len=7) l <- sort(unique(exp(c(jj1,jj2)))) lam <- sort(c(l,o$maximum)) L <- sapply(lam,f) - maxL
plot(log(lam),L,type="b") abline(h=c(0,-2)) abline(v=0) plot(lam,L,type="b") abline(h=c(0,-2))
Following lines create baseball.rda
, residing in the data/
directory of the package.
save(baseball_table, baseball, baseball_maxp, file="baseball.rda")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.