defaultW <- getOption("warn") 
options(warn = -1) 
library("hyper2",quietly=TRUE)
options(warn = defaultW)
knitr::include_graphics(system.file("help/figures/hyper2.png", package = "hyper2"))

To cite the hyper2 package in publications, please use @hankin2017_rmd (takes about fifteen minutes to run without cache). Here I analyse a small synthetic dataset for home ground advantage, then go on to discuss a dataset presented by Agresti. But first, the smallest nontrivial example is a 2x2 case:

M <- matrix(c(
    NA, 1+5i, 3+11i, NA),
    nrow=2,ncol=2,byrow=TRUE)

teams <- LETTERS[seq_len(2)]
dimnames(M) <- list("@home" = teams,"@away"=teams)
dimnames(M) <- list("@home" = teams,"@away"=teams)
M
home_away(M)

Above we see (from M[1,2]) that A played at home $1+5=6$ times, winning 1 and losing 5; and B played at home $3+11=14$ times, winning 3 and losing 11. A slightly more complicated example:

home_games_won <- matrix(c(
    NA, 16, 12, 11,
    19, NA, 19, 16,
    17, 12, NA, 11,
    11, 12, 12, NA),
    nrow=4,ncol=4,byrow=TRUE)

away_games_won <- matrix(c(
    NA, 05, 02, 02,
     9, NA, 10, 02,
     3, 04, NA, 07,
     8, 06, 04, NA),
    nrow=4,ncol=4,byrow=TRUE)

teams <- LETTERS[1:4]
dimnames(home_games_won) <- list("@home" = teams,"@away"=teams)
dimnames(away_games_won) <- list("@home" = teams,"@away"=teams)

home_games_won
away_games_won

Thus home_games_won[1,2] == 16 means A played at home against B and won 16 times; home_games_won[2,1] == 19 means B played at home against A and won 19 times. Also, away_games_won[1,2] == 5 means A played away against B and won 5 times, and away_games_won[2,1] == 2 means B played away against A and won 2 times. Alternatively, A played B 16+19+5+2=45 times; A played at home 16+2=18 times and won 16 times and lost 2 times; and B played at home 19+5=24 times and won 19 times and lost 5 times. Alternatively we may use complex numbers to represent the same dataset:

home_games_won + 1i*away_games_won

We will create a hyper2 object with the teams and home ground advantage term:

H <- home_away(home_games_won,away_games_won)
H
options("use_alabama" = FALSE) # to avoid the stupid wmmin bug
specificp.gt.test(H,"home",0)
options("use_alabama" = TRUE) # to avoid the stupid wmmin bug

We see strong evidence to support the contention that home advantage is real. Further, we may test the hypothesis that all the teams have the same strength, after accounting for the home team advantage:

samep.test(H,teams)

Above, we see no evidence for a difference in team strengths. Visually:

(mt <- maxp(H))

A profile likelihood diagram:

home_strength <- seq(from=0.15,to=0.45,length=13)
plot(home_strength,profsupp(H,"home",home_strength),type="b",pch=16,
     xlab="home strength",ylab="support",main="profile likelihood for home advantage")
abline(h=c(0,-2))

The figure give a support interval between about 0.3 and 0.5 for home ground advantage.

Example in Agresti

Agresti considers a dataset of seven baseball teams who play one another repeatedly. Each game is played at the home ground of one team, the other playing away. The dataset is as follows:

baseball_table <- matrix(c(
NA,   4+3i, 4+2i, 4+3i, 6+1i, 4+2i, 6+0i,
3+3i, NA  , 4+2i, 4+3i, 6+0i, 6+1i, 4+3i,
2+5i, 4+3i, NA  , 2+4i, 4+3i, 4+2i, 6+0i,
3+3i, 5+1i, 2+5i, NA  , 4+3i, 4+2i, 6+1i,
5+1i, 2+5i, 3+3i, 4+2i, NA  , 5+2i, 6+0i,
2+5i, 3+3i, 3+4i, 4+3i, 4+2i, NA  , 2+4i,
2+5i, 1+5i, 1+6i, 2+4i, 1+6i, 3+4i, NA),7,7)
teams <- c("Milwaukee", "Detroit", "Toronto", "New York", "Boston","Cleveland","Baltimore")
rownames(baseball_table) <- teams
colnames(baseball_table) <- teams
M <- baseball_table # saves typing

Above, we represent home team wins with the real part of the matrix, the imaginary component being away wins. Now process it:

baseball <- home_away(M)
baseball

Maximum likelihood:

baseball_maxp <- maxp(baseball)
baseball_maxp

Visually:

dotchart(baseball_maxp,pch=16)
pie(baseball_maxp)

Above, we see that the teams are all of roughly equal strength, and the home advantage is small. However, we may easily assess the null that the home strength is zero:

specificp.gt.test(baseball,"home",0)
home_strength <- seq(from=0.005,to=0.1,length=20)
plot(home_strength,profsupp(baseball,"home",home_strength),type="b",pch=16,
     xlab="home strength",ylab="support",main="profile likelihood for home advantage")
abline(h=c(0,-2))

Further, we may test the hypothesis that the baseball teams have the same strength:

samep.test(baseball,teams)

Above, we see no evidence for the teams having differential strengths.

As an interesting aside, suppose we have some additional data from games in which (for some reason), the home advantage was not operating.

M2 <- matrix(c(0,1,2,0),2,2)
dimnames(M2) <- list(winner=teams[1:2],loser=teams[1:2])
M2

Thus we have additional data from three games in which Milwaukee won twice and lost once. This additional information may easily be incorporated into our likelihood function, at least on the assumption of independence:

baseball <- baseball + pairwise(M2)

We may assess the difference that the new information makes:

ordertransplot(baseball_maxp[1:7],maxp(baseball)[1:7],xlab="old",ylab="new",plotlims=c(0.01,0.16))

Above, we see very little difference in the two evaluates, except for Detroit and Milwaukee, as expected.

Some further thoughts on the home ground monster and Agresti's analysis.

Considering the simplest case of two teams, a and b, with results:

Matching the probabilities for $a$ at home gives us $H=a(\lambda-1)$ and matching the probabilities for $a$ away gives $H=b(\lambda-1)$. These cannot both be true unless $a=b$ or $\lambda=1$. But we can, in principle, test which one is a better model.

hyper3 idiom

We may apply hyper3 idiom to the dataset, using the likelihood function of Davidson and Beaver (1977). First a formal test:

baseball_table
f <- function(l){ maxp(home_away3(baseball_table,l),give=TRUE,n=1)$likes}
o <- optimize(f,c(0.8,1.6),maximum=TRUE)
o
maxL <- f(o$maximum)
W <- maxL - f(1)    # f(1) being the likelihood of the null
W
pchisq(2*W,df=1,lower.tail=FALSE)

The pvalue is significant. Now plot a profile likelihood (support) curve:

jj1  <- seq(from=log(o$maximum),to=log(1.9),len=7)
jj2  <- seq(from=log(o$maximum),to=log(0.8),len=7)
l <- sort(unique(exp(c(jj1,jj2))))
lam  <- sort(c(l,o$maximum))
L <- sapply(lam,f) - maxL
plot(log(lam),L,type="b")
abline(h=c(0,-2))
abline(v=0)

plot(lam,L,type="b")
abline(h=c(0,-2))

References

Package dataset {-}

Following lines create baseball.rda, residing in the data/ directory of the package.

save(baseball_table, baseball, baseball_maxp, 
  file="baseball.rda")


RobinHankin/hyper2 documentation built on May 6, 2024, 4:25 p.m.